Max's Output

A Quick Introduction to the Cassandra Data Model

Further reading: for an in-depth introduction, see Understanding the Cassandra Data Model at datastax.com.

For newcomers, the Cassandra data model can look like a mess. Even experienced database developers spend quite a bit of time learning it. There are great articles on the Web that explain the model: read WTF is a SuperColumn? An Intro to the Cassandra Data Model and my favorite one, Installing and using Apache Cassandra With Java. This blog post is my attempt to explain the Cassandra model to those who would like to understand the key ideas in 15 minutes or less.

In a nutshell, the Cassandra data model can be described as follows:

1) Cassandra is based on a key-value model

A database consists of column families. A column family is a set of key-value pairs. I know the terminology is confusing, but so far it is just a basic key-value model. Drawing an analogy with relational databases, you can think of a column family as a table and a key-value pair as a record in that table.
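As a minimal sketch of point 1, with Python dictionaries standing in for Cassandra's storage (this is not Cassandra's API), a database can be modelled as a map of named column families, each mapping a record key to a value. The column family name `Users` and the key `user42` are hypothetical.

```python
from typing import Dict

# A column family maps a record key to a value; at this level it is
# nothing more than a key-value store.
ColumnFamily = Dict[str, str]

# A database is a set of named column families ("tables" in the analogy).
# "Users" is a hypothetical column family name used for illustration.
database: Dict[str, ColumnFamily] = {"Users": {}}

# Storing and fetching a record is a plain put/get by key.
database["Users"]["user42"] = "some value"
print(database["Users"]["user42"])  # prints: some value
```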

2) Cassandra extends the basic key-value model with two levels of nesting

At the first level, the value of a record is in turn a sequence of key-value pairs. These nested key-value pairs are called columns, where the key is the name of the column. In other words, a record in a column family has a key and consists of columns. This level of nesting is mandatory: a record must contain at least one column (so in point 1 above, the value of a record was an intermediate notion, since the value is actually a sequence of columns).
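Refining the previous picture, the plain-Python sketch below makes a record's value a map from column names to column values and rejects a record with no columns, mirroring the rule just stated. The helper `insert_record`, the column family `users` and its column names are hypothetical illustrations, not a Cassandra client API.

```python
from typing import Dict

# First level of nesting: a record's value is a set of columns,
# i.e. column name -> column value.
Record = Dict[str, str]
ColumnFamily = Dict[str, Record]


def insert_record(cf: ColumnFamily, key: str, columns: Record) -> None:
    """Store columns under a record key (hypothetical helper).

    A record must contain at least one column, so an empty column set
    is rejected. New columns are merged into an existing record.
    """
    if not columns:
        raise ValueError("a record must contain at least one column")
    cf.setdefault(key, {}).update(columns)


users: ColumnFamily = {}
insert_record(users, "user42", {"Email": "max@example.com", "Name": "Max"})
print(users["user42"]["Email"])  # prints: max@example.com
```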

At the second level, which is optional, the value of a nested key-value pair can be a sequence of key-value pairs as well. When the second level of nesting is present, the outer key-value pairs are called super columns, with the key being the name of the super column, and the inner key-value pairs are called columns.
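With the optional second level, a record becomes a map of super columns, each of which is a map of columns. The sketch below uses a hypothetical `addresses` column family (not from the original post) purely to show the shape of the nesting.

```python
from typing import Dict

# Second level of nesting:
#   record key -> super column name -> column name -> column value
Columns = Dict[str, str]
SuperRecord = Dict[str, Columns]
SuperColumnFamily = Dict[str, SuperRecord]

# Hypothetical example data.
addresses: SuperColumnFamily = {
    "user42": {                      # record key
        "home": {                    # super column name
            "street": "1 Main St",   # column name -> column value
            "city": "Springfield",
        },
        "work": {                    # another super column
            "street": "2 Office Rd",
            "city": "Shelbyville",
        },
    }
}

print(addresses["user42"]["work"]["city"])  # prints: Shelbyville
```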

3) The names of both columns and super columns can be used in two ways: as names or as values (usually reference values).

First, names can play the role of attribute names. For example, the name of a column in a record about a User can be Email. That is how we are used to thinking about columns in relational databases.

Second, names can also be used to store values! For example, the column names in a record that represents a Blog can be the identifiers of the blog's posts, and the corresponding column values are the posts themselves. You can really use column (or super column) names to store values because (a) theoretically there is no limit on the number of columns (or super columns) in a given record and (b) names are byte arrays, so you can encode any value in them.
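The contrast between the two uses can be sketched in Python as follows. The post identifiers and texts are made up for illustration, and the 8-byte big-endian encoding is just one assumed way of packing a number into a byte-array name.

```python
from typing import Dict

# Use 1: column names act as attribute names (the relational habit).
user_record: Dict[str, str] = {"Email": "max@example.com"}

# Use 2: column names store values. In a record for one blog, each column
# name is a post identifier and each column value is the post itself.
# The record simply gains one column per post.
blog_record: Dict[str, str] = {}
for post_id, text in [("post-001", "Hello"), ("post-002", "Cassandra notes")]:
    blog_record[post_id] = text

# Names are byte arrays, so any value can be encoded in them; here an
# integer ID is packed into 8 big-endian bytes and decoded again.
name_bytes = (12345).to_bytes(8, "big")
print(int.from_bytes(name_bytes, "big"))  # prints: 12345
```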

4) Columns and super columns are stored ordered by name.

You can specify the sorting behavior by defining how Cassandra treats the names of (super) columns (recall that a name is just a byte array). A name can be treated as BytesType, LongType, AsciiType, UTF8Type, LexicalUUIDType or TimeUUIDType.
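To see why the name type matters, here is a simplified Python model (not Cassandra's implementation) of two of the types above: BytesType compares raw bytes lexicographically, while LongType reads the 8 bytes as a signed big-endian 64-bit integer. The helper names `bytes_type`, `long_type` and `sorted_names` are hypothetical.

```python
from typing import Callable, Dict, List


def bytes_type(name: bytes) -> bytes:
    # BytesType: plain lexicographic comparison of the raw bytes.
    return name


def long_type(name: bytes) -> int:
    # LongType: the 8 bytes are a big-endian signed 64-bit integer.
    return int.from_bytes(name, "big", signed=True)


COMPARATORS: Dict[str, Callable[[bytes], object]] = {
    "BytesType": bytes_type,
    "LongType": long_type,
}


def sorted_names(names: List[bytes], comparator: str) -> List[bytes]:
    # Order column names the way the chosen comparator would.
    return sorted(names, key=COMPARATORS[comparator])


# The same three byte-array names: 5, -1 and 300 encoded as 8-byte longs.
names = [n.to_bytes(8, "big", signed=True) for n in (5, -1, 300)]
for comp in ("BytesType", "LongType"):
    print(comp, [long_type(n) for n in sorted_names(names, comp)])
# BytesType [5, 300, -1]   (-1 is 0xFF...FF, the "largest" byte string)
# LongType [-1, 5, 300]
```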

So now you know everything you need to know. Let’s consider a classic :) example, a Twitter database, to demonstrate these points.

The column family Tweets contains records representing tweets. The key of a record is of TimeUUID type and is generated when the tweet is received (we will use this feature in the User_Timelines column family below). The records consist of columns (no super columns here). The columns simply represent attributes of tweets, so this is very similar to how one would store tweets in a relational database.
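A rough Python sketch of the Tweets column family: each record is keyed by a version-1 (time-based) UUID from the standard `uuid` module, generated when the tweet arrives, and its columns are plain attributes. Only `User_ID` comes from the post; the `Text` column and the `add_tweet` helper are assumptions for illustration.

```python
import uuid
from typing import Dict

# Tweets: record key = time-based UUID, columns = tweet attributes.
tweets: Dict[uuid.UUID, Dict[str, str]] = {}


def add_tweet(user_id: str, text: str) -> uuid.UUID:
    """Hypothetical helper: store a tweet under a fresh time UUID."""
    tweet_id = uuid.uuid1()  # version-1 UUID embeds a 60-bit timestamp
    tweets[tweet_id] = {"User_ID": user_id, "Text": text}
    return tweet_id


tid = add_tweet("user42", "Reading about Cassandra")
print(tid.version, tweets[tid]["User_ID"])  # prints: 1 user42
```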

The next example is User_Timelines (i.e. tweets posted by a user). Records are keyed by user IDs (referenced by the User_ID columns in the Tweets column family). User_Timelines demonstrates how column names can be used to store values, tweet IDs in this case. The type of the column names is defined as TimeUUID, which means that tweet IDs are kept ordered by the time of posting. That is very useful, as we usually want to show the last N tweets for a user. The values of all columns are set to an empty byte array (denoted “-”) since they are not used.
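The "last N tweets" query can be sketched in Python by ordering column names by the timestamp embedded in each time UUID, which is roughly what a TimeUUID comparator provides. To keep the example deterministic, the hypothetical helper `time_uuid` builds version-1 UUIDs from fixed timestamps instead of calling `uuid.uuid1()`; `add_to_timeline` and `last_n_tweets` are illustrative names too.

```python
import uuid
from typing import Dict, List


def time_uuid(timestamp: int) -> uuid.UUID:
    """Build a version-1 UUID with an explicit 60-bit timestamp.

    Clock sequence and node are arbitrary constants for this example.
    """
    return uuid.UUID(fields=(
        timestamp & 0xFFFFFFFF,                 # time_low
        (timestamp >> 32) & 0xFFFF,             # time_mid
        ((timestamp >> 48) & 0x0FFF) | 0x1000,  # time_hi plus version 1
        0x80,                                   # clock_seq_hi plus RFC 4122 variant
        0x00,                                   # clock_seq_low
        0x123456789ABC,                         # node
    ))


# User_Timelines: record key = user ID, column name = tweet ID (time UUID),
# column value = empty byte string because it is not used.
user_timelines: Dict[str, Dict[uuid.UUID, bytes]] = {}


def add_to_timeline(user_id: str, tweet_id: uuid.UUID) -> None:
    user_timelines.setdefault(user_id, {})[tweet_id] = b""


def last_n_tweets(user_id: str, n: int) -> List[uuid.UUID]:
    # Emulate TimeUUID ordering by sorting on the embedded timestamp
    # (UUID.time), newest first. Sorting UUID objects directly compares
    # their 128-bit integer values, which is not chronological for version 1.
    columns = user_timelines.get(user_id, {})
    return sorted(columns, key=lambda u: u.time, reverse=True)[:n]


for ts in (300, 100, 500, 200, 400):  # tweets arriving out of order
    add_to_timeline("user42", time_uuid(ts))

print([u.time for u in last_n_tweets("user42", 2)])  # prints: [500, 400]
```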

To demonstrate super columns, let us assume that we want to collect statistics about the URLs posted by each user. For that we need to group all the tweets posted by a user by the URLs contained in those tweets. This can be stored using super columns in a column family called User_URLs.

In User_URLs, the names of the super columns are used to store URLs, and the names of the nested columns are the corresponding tweet IDs.
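Here is a Python sketch of how User_URLs could be filled and then used for per-URL statistics. The regular expression is a deliberately naive URL matcher, tweet IDs are short strings rather than time UUIDs for brevity, and `index_tweet` and `url_stats` are hypothetical helpers.

```python
import re
from typing import Dict, List, Tuple

# User_URLs (super column family):
#   record key        -> user ID
#   super column name -> a URL posted by that user
#   column name       -> ID of a tweet containing the URL (value unused)
user_urls: Dict[str, Dict[str, Dict[str, bytes]]] = {}

URL_PATTERN = re.compile(r"https?://\S+")  # naive: stops at whitespace only


def index_tweet(user_id: str, tweet_id: str, text: str) -> None:
    # Add the tweet ID as a column under every URL found in the text.
    for url in URL_PATTERN.findall(text):
        user_urls.setdefault(user_id, {}).setdefault(url, {})[tweet_id] = b""


def url_stats(user_id: str) -> List[Tuple[str, int]]:
    # Number of tweets per URL, listing URLs in name order (super column
    # names are kept sorted, as described in point 4).
    supers = user_urls.get(user_id, {})
    return [(url, len(cols)) for url, cols in sorted(supers.items())]


index_tweet("user42", "t1", "see http://a.example and http://b.example")
index_tweet("user42", "t2", "again http://a.example")
print(url_stats("user42"))
# [('http://a.example', 2), ('http://b.example', 1)]
```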

Important note: currently Cassandra automatically supports indexes for column names but does not support indexes for the names of super columns. In our example, this means that you cannot efficiently retrieve or update tweet IDs by URL.

[Update: The above note is incorrect! It is subcolumn names that are not indexed inside super columns. Super column names are always indexed. This is great news, as it enables data denormalization to speed up queries. For more on this, see the first comment by Jonathan Ellis below. I cover denormalization use cases in my next post.]

Let me know if I missed anything or something is unclear.

Written by maxgrinev

July 9, 2010 at 9:52 pm

Posted in Uncategorized

Responses

  1. Hi Maxim,

    This is an excellent post!

    A couple clarifications:

    - if you don’t have anything relevant to store in the column value, leaving it an empty byte array is fine

    - it is _subcolumn_ names that are not indexed inside supercolumns, meaning that you should only store data inside supercolumns that you plan to access together. In your example the supercolumns would be fine. Top level columns (the supercolumn’s name) are always indexed.

    - The most common use case for supercolumns is for denormalizing data from another columnfamily, e.g. in user_timelines you could make the tweet id a supercolumn name, with subcolumns of the actual tweet field names + values. This makes reads still more efficient, since you don’t have to perform joins manually via multiget at read time.

    Jonathan Ellis

    July 13, 2010 at 3:25 am

  3. Since each record of User_URLs has a collection of super columns, does that make it a super column family? Or am I misunderstanding the distinction between column families and super column families?

    Carlos Macasaet

    August 16, 2010 at 5:31 am

  4. Great post, Maxim! Do you have direct experience with using it in an enterprise setting?

    Michael

    September 8, 2010 at 2:19 pm

    • Not yet in enterprise settings. We are successfully using this approach for our Web/social applications.

      maxgrinev

      September 8, 2010 at 3:19 pm

  8. Great post! I’ve been trying to understand this model for a few weeks now. Getting closer… much closer.

  10. Hi, I found your examples to be very helpful. What I’d really like to see is the way you created the schema and maybe even a few data insert examples that you’d do in cassandra-cli. So basically the CF creation syntax, and then a few inserts/gets for each example. Thanks!

    Brett Nemeroff

    May 4, 2012 at 3:32 pm

  14. How do I auto-increment the row key value in a Cassandra database?

    Tejinder Singh

    November 6, 2012 at 12:18 pm

    • You don’t. (That would require expensive cross-node coordination and hurt performance a great deal.) Use UUIDs instead.

      Jonathan Ellis

      November 7, 2012 at 6:04 pm

  15. This link at the top: WTF is a SuperColumn? An Intro to the Cassandra Data Model
    takes me to tumblr. I set up an account, but then I get: Access Denied. Do I need to join something else?

    Michael Halpin

    December 20, 2012 at 8:30 pm

  26. Thank you for your clearly written explanations. The graphics you are using are also very helpful for getting a better overview. Actually, I am still having a hard time with the super columns, so I think I’ll read through your article a second time, later. Whenever my colleagues mentioned the Cassandra data model to me, I thought it would take much more time to get used to. But actually I think that it’s not that complicated once you have learnt the basic structure. This article definitely helped me.

    Mike

    December 21, 2013 at 12:31 am
