Apache Cassandra is “an open source,
distributed, decentralized, elastically scalable, highly available,
fault-tolerant, tuneably consistent, column-oriented database that bases
its distribution design on Amazon’s Dynamo and its data model on Google’s
Bigtable” (source: “Cassandra: The Definitive Guide,” O’Reilly Media, 2010,
p. 14).
Cassandra is built to store large amounts of data across many machines
arranged in a ring; in other words, it scales horizontally rather than
vertically.
Data Model
Cassandra is based on a key-value model and it is organized
according to the following concepts:
- Column: a key-value pair (a column name and its value).
- Column Family: a container of rows. Each row is a set of key-value
pairs (columns in Cassandra’s terminology), sorted by their keys (the
column names), and is referenced by a row key.
- Super Column: the value of a key-value pair can itself be a sequence of
key-value pairs. In this case, the outer column is called a
super column.
- Columns and Super Columns can equally be used within Column Families.
- Columns or Super Columns are stored ordered by name within their
Column Families.
For a better understanding of Cassandra’s data model, refer to this
article by Maxim Grinev.
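The concepts above can be pictured with plain nested dictionaries. The following is only an in-memory sketch in Python, not Cassandra code: the row key, column names, and values are made-up sample data, and the helper names columns_in_order and super_columns_in_order are hypothetical.

```python
# Toy picture of the data model: nothing here talks to Cassandra.

# Standard column family: row key -> {column name -> value}.
standard_cf = {
    "jsmith": {"last": "Smith", "first": "John", "age": "38"},
}

# Super column family: row key -> {super column name -> {column name -> value}}.
# The address values are invented sample data.
super_cf = {
    "jsmith": {
        "name": {"last": "Smith", "first": "John"},
        "address": {"city": "Austin", "zip": "78701"},
    },
}


def columns_in_order(row):
    """Return (name, value) pairs ordered by column name."""
    return sorted(row.items())


def super_columns_in_order(row):
    """Order super columns by name, and the columns inside each one as well."""
    return [(name, columns_in_order(sub)) for name, sub in sorted(row.items())]


if __name__ == "__main__":
    print(columns_in_order(standard_cf["jsmith"]))
    print(super_columns_in_order(super_cf["jsmith"]))
```

Sorting by name here stands in for the comparator that Cassandra applies to column names; a real cluster does this ordering on the server side.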
Installation
- Download the latest Cassandra version from here (I got version 1.2.6).
- Extract the archive (I extracted it to C:\apache-cassandra-1.2.6).
- If you don’t have Java installed on your machine, go and get it
installed.
- Add the environment variables:
  - Right-click the My Computer icon on your desktop or Start menu and
  choose Properties.
  - Click the Advanced tab (or Advanced System Settings), then click
  Environment Variables.
  - Under System Variables, click New (adjust for your own
  directories):
    - Variable Name: CASSANDRA_HOME
    - Variable Value: C:\apache-cassandra-1.2.6
    - Click OK.
  - Under System Variables, click New again (adjust for your own
  directories):
    - Variable Name: JAVA_HOME
    - Variable Value: C:\Program Files\Java\jre7
    - Click OK.
- Now, open a command window and navigate to the bin directory inside
your Cassandra directory (C:\apache-cassandra-1.2.6\bin in my case).
- Launch Cassandra by executing the command cassandra -f (the
“-f” causes it to run in the foreground). You will see lots of
messages scroll by. If everything goes fine, the log ends without
errors and the server keeps running in the foreground.
Now we have a running Cassandra server that is expecting incoming
connections on port 9160.
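If the server does not seem to start, a quick sanity check can help. The following Python sketch (not part of Cassandra) checks that the two environment variables from the steps above are set and probes the port mentioned above. The helper names missing_variables and port_is_open are hypothetical, and the host 127.0.0.1 is an assumption for a local install.

```python
import os
import socket


def missing_variables(env, names=("CASSANDRA_HOME", "JAVA_HOME")):
    """Return the names that are unset or empty in the given environment mapping."""
    return [name for name in names if not env.get(name)]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Not set:", ", ".join(missing))
    # 9160 is the port mentioned above; 127.0.0.1 assumes a local server.
    print("Port 9160 reachable:", port_is_open("127.0.0.1", 9160))
```

A successful TCP connection only shows that something is listening on the port; it does not prove that the listener is Cassandra.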
Once Cassandra is up and running on your machine, we can connect to the
running instance using the Cassandra command-line interface, launched by
running “cassandra-cli.bat”, from the Cassandra “bin” directory.
Commands
- show api version; to show the current api version.
- describe cluster; to show a description of the current cluster.
- create keyspace TestKS; to create a keyspace; its name has to be
unique.
- use TestKS; to switch to keyspace TestKS.
- create column family TestCF; to create a column family TestCF
within the current keyspace.
- No other schema definition is required; the column family is a
collection of name/value pairs.
- set TestCF[ascii('TestKey')][ascii('column1')]=ascii('TestValue');
to insert the value TestValue into the column named column1 of the
row TestKey within the column family TestCF. You can add with ttl
= x at the end of the set command to make the column delete itself
x seconds after insertion.
- By default Cassandra treats data as byte arrays, but you can
convert to other types such as long, int, integer, …
Also, timeuuid() generates a new UUID. For full information
about the set command, type help set;
- get TestCF[ascii('TestKey')]; to retrieve the columns stored under
the row key TestKey within the column family TestCF.
- del TestCF[ascii('TestKey')][ascii('column2')]; rows and
columns can be deleted with the del (delete) command by specifying
the row key and, optionally, the column name.
- list TestCF; to list the data inside a column family.
- drop column family TestCF; removes a column family.
- drop keyspace TestKS; removes a keyspace.
Note: you can insert into super columns much like inserting into normal
columns. They can be read with get, written with set, and deleted with
del. The super column version of these commands uses an extra ['xxx']
to represent the extra sub-index level.
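To make the semantics of set, get, del, and the with ttl option more concrete, here is a small in-memory model in Python. It is only a sketch of the behaviour described above and does not connect to Cassandra; the class ToyColumnFamily and its clock parameter are hypothetical names introduced for illustration.

```python
import time


class ToyColumnFamily:
    """In-memory stand-in for a column family; it illustrates semantics only."""

    def __init__(self, clock=time.time):
        # row key -> {column name -> (value, expiry timestamp or None)}
        self._rows = {}
        self._clock = clock

    def set(self, row_key, column, value, ttl=None):
        """Write a column; with ttl (seconds) it expires that long after the write."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._rows.setdefault(row_key, {})[column] = (value, expires_at)

    def get(self, row_key):
        """Return the live columns of a row, ordered by column name."""
        now = self._clock()
        row = self._rows.get(row_key, {})
        live = {
            name: value
            for name, (value, expires_at) in row.items()
            if expires_at is None or now < expires_at
        }
        return dict(sorted(live.items()))

    def delete(self, row_key, column=None):
        """Delete one column, or the whole row when no column is given."""
        if column is None:
            self._rows.pop(row_key, None)
        else:
            self._rows.get(row_key, {}).pop(column, None)


if __name__ == "__main__":
    cf = ToyColumnFamily()
    cf.set("TestKey", "column1", "TestValue", ttl=10)
    cf.set("TestKey", "column2", "AnotherValue")
    print(cf.get("TestKey"))
    cf.delete("TestKey", "column2")
    print(cf.get("TestKey"))
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting for real time to pass. A super column family would add one more dictionary level between the row key and the column name.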
- assume TestCF comparator as ascii; tells the CLI how to decode
column names so that the results of get and list requests display
readably inside the command-line interface. It can be used in the
same way for the validator (column values) and keys (row keys). By
default, columns with no metadata are displayed in hex format,
because row keys, column names, and column values are byte arrays.
After using assume, the column names and values are displayed as
text rather than as hex.
- Type enforcement: Cassandra is designed to store and retrieve
simple byte arrays, but it also has support for built-in
types.
When creating or updating a column family, the user can supply
column metadata that instructs the Cassandra CLI client on how to
display data and helps the server enforce types during insertion
operations.
create column family User with comparator = UTF8Type;
update column family User with
column_metadata =
[
{column_name: first, validation_class: AsciiType},
{column_name: last, validation_class: AsciiType},
{column_name: age, validation_class: IntegerType,
index_type: KEYS}
];
- Querying data:
set User[ascii('jsmith')][ascii('first')] = ascii('John');
set User[ascii('jsmith')][ascii('last')] = ascii('Smith');
set User[ascii('jsmith')][ascii('age')] = '38';
get User where age = '38';
- Update is the same as set.
set User[ascii('jsmith')][ascii('first')] = ascii('Jack');
- quit; to exit the CLI.
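The difference between the default hex display and the output after assume, described in the command list above, comes down to how a byte array is rendered. The Python sketch below illustrates that idea only; it is not how the CLI is implemented, and the function name display and the type labels it accepts are assumptions made for this example.

```python
def display(raw, assumed_type=None):
    """Render a stored byte array: hex by default, decoded when a type is assumed."""
    if assumed_type is None:
        return raw.hex()
    if assumed_type == "ascii":
        return raw.decode("ascii")
    if assumed_type == "utf8":
        return raw.decode("utf-8")
    if assumed_type == "long":
        # Interpret the bytes as a big-endian signed integer.
        return str(int.from_bytes(raw, byteorder="big", signed=True))
    raise ValueError("unknown type: " + assumed_type)


if __name__ == "__main__":
    stored = "TestValue".encode("ascii")
    print(display(stored))           # hex form of the raw bytes
    print(display(stored, "ascii"))  # readable text once a type is assumed
```

The stored bytes are the same in both cases; only the client-side rendering changes.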
In this post we have only touched the tip of the Cassandra iceberg. In
later posts we will dig deeper and see how to write .NET programs
against it.