Max's Output

Do You Really Need SQL to Do It All in Cassandra?

with 84 comments

NoSQL database systems are designed for scalability. The down side of that is a primitive key-value data model and, as the name suggest, no support for SQL. It might sound like a serious limitation – how can I “select”, “join”, “group” and “sort” the data? This post explains how all these operations can be quite naturally and efficiently implemented in one of the most famous NoSQL system – Cassandra.

To understand this post you need to know the Cassandra data model. You can find a quick introduction in my previous post. The power of the Cassandra data model is that it extends a basic key-value store with efficient data nesting (via columns and super columns). It means that you can read/update a column (or a super column) without retrieving the whole record. Below I describe how we can exploit data nesting to support various query operations.

Let’s consider a basic example: departments and employees with one-to-many relationships respectively.  So we have two column families: Emps and Deps. In Emps employee IDs are used as keys and there are Name, Birthdate, and City columns. In Deps keys are department IDs and the single column is Name.

1) Select
For example: select * from Emps where Birthdate = '25/04/1975'
To support this query we need to add one more column family named Birthdate_Emps in which key is a date and column names are IDs of those employees that were born on the date. The values are not used here and can be an empty byte array (denoted “-”). Every time when a new employee is inserted/deleted into/from Emps we need to update Birthdate_Emps. To execute the query we just need to retrieve all the columns for the key '25/04/1975' from Birthdate_Emps.

Notice that Birthdate_Emps is essentially an index that allows us to execute the query very efficiently. And this index is scalable as it is distributed across Cassandra nodes. You can go even further to speed up the query by redundantly storing information about employees (i.e. employee’s columns from Emps) in Birthdate_Emps. In this case employee IDs becomes names of super columns that contain corresponding employee columns.

2) Join
For example: select * from Emps e, Deps d where e.dep_id = d.dep_id
What does join essentially mean? It constructs records that represent relationship between entities. Such relationships can be easily (and even more naturally) represented via nesting. To do that add column family Dep_Emps in which key is a department ID and column names are IDs of the corresponding employees.

3) Group By
For example: select count(*) from Emps group by City
From implementation viewpoint Group By is very similar to select/indexing described above. You just need to add a column family City_Emps with cities as keys and employee IDs as column names. In this case you will count the number of employees on retrieval. Or you can have a single column named count which value is the pre-calculated number of employees in the city.

4) Order By
To keep data sorted in Cassandra you can use two mechanisms: (a) records can be sorted by keys using OrderPreservingPartitioner with range queries (more on this in Cassandra: RandomPartitioner vs OrderPreservingPartitioner). To keep nested data sorted you can use automatically supported ordering for column names.

To support all these operations we store redundant data optimized for each particular query. It has two implications:
1) You must know queries in advance (i.e. no support for ad-hoc queries). However, typically in Web applications and enterprise OLTP applications queries are well known in advance, few in number, and do not change often. Read Mike Stonebraker convincingly talking about that.  BTW, Constraint Tree Schema, described in the latter paper, also exploits nesting to organize data for predefined queries.

2) We shift the burden from querying to updating because what we essentially do is supporting materialized views (i.e. pre-computed results of queries). But it makes a lot of sense in case of using Cassandra as Cassandra is very much optimized for updates (thanks to eventual consistency and “log-structured” storage borrowed from Google BigTable). So we can use fast updates to speed up query execution. Moreover, use-cases typical for social applications are proven to be only scalable with push-on-change model (i.e. preliminary data propagation via updates with simple queries – the approach taken in this post) in comparison with pull-on-demand model (i.e. data are stored normalized and combined by queries on demand – classical relational approach). On push-on-change versus pull-on-demand read WHY ARE FACEBOOK, DIGG, AND TWITTER SO HARD TO SCALE?

About these ads

Written by maxgrinev

July 12, 2010 at 12:56 am

Posted in Cassandra

84 Responses

Subscribe to comments with RSS.

  1. Great post Max.

    The examples are very useful and clear. What happens when you need something not so trivial, say that you need to do not just a join on e.dep_id = d.dep_id, but on some other values. What does one do then? Could you please give an example of that, so one can see how one “scales” the modeling further? Thanks.

    Otis Gospodnetic

    July 16, 2010 at 6:26 pm

    • Thank you!

      I just did not want to make the post too long. Answering your question, the approach seems to scale quite well by combining the above techniques. For example, suppose that you want to query select from Deps d, Emps e where d.dep_id=e.dep_id and dep.located=’Bern’. You can create a column family that maps cities to the IDs of the employees who work in the city. In general the aproach should work fine for ‘equal’ predicates. It is not a perfect solution for ‘range’ predicates as range scan over keys is not quite efficient.


      July 22, 2010 at 10:21 am

      • Thanks for your posts Max. I’m starting to get a better grasp of how to structure data in Cassandra based on the logical (normalised) data model and the queries that will be performed. However, I’m still not sure about how to structure the data for queries against multiple attributes.

        In your example, you created indices for employee birthdays and employee cities. What if I wanted to find all the employees born in 1975 currently in San Francisco? Can such an index be built or would the query need to be performed against the emps column family?


        You can build sort of compound index. A key in this index is the concatenation of the year of birth and the city of current location and each record contains all employees born in this year and currently located in this city.

        Carlos Macasaet

        August 16, 2010 at 6:18 am

    • That’s a knowing answer to a diffcilut question


      August 9, 2012 at 7:43 pm

  2. […] Similar to indexes, materialized views store data redundantly. So they also can be supported via triggers. Find general guidelines on managing redundant data (indexes and materialized views) in Cassandra in my post “Do You Really Need SQL to Do It All in Cassandra?”. […]

  3. Hi,

    I have a requirement of using Cassandra in my application. In my application there is one table with lot of data and most of my application uses that table. Due to lot of data,performance of the application is decreasing when i use that table is in Oracle.

    So, I have decided to use the Cassandra database for that one table and all other tables in oracle. Lot of business logic is dependent on that table.

    No my question is, Can I use the Cassandra for a table which has lot of business logic.

    I am unable to implement lot of where clauses for Cassandra database.

    Is there any supporting tool to use Cassandra in an efficient way?

    Please let me know…
    i am in urgency..

    Thanks in advance

    By Mallik


    August 18, 2010 at 8:47 am

  4. […] Do You Really Need SQL to Do It All in Cassandra? […]

  5. What about:

    select * from Emps where Birthdate >= ’25/04/1975′;

    We have not been able to adopt Cassandra because the notion of this query is nearly impossible (without a fixed data set) with the data model. Maybe I am overlooking something, wouldnt be the first time.

    As far as duplicating data, what impact does this have on the databases disk usage footprint?

    Basically, I am in a situation where I am collecting a decent amount of data, in a structure that can change, but require the ability of some type of dynamic filter like I mentioned up top… Any suggestions would be great, because I really like the idea of switching from MySql to some NoSql solution, because it would solve most of my dynamic schema issues…

    Justin Smith

    September 27, 2010 at 10:36 pm

  6. Good information Max.

    As you explained for where condition of Birthdate, is we are having two columnfamilies (Emps, Birthdates_Emps) in the Keyspace. If so can explain in brief how storage-conf.xml entry for this.

    Thanks in Advance.


    September 28, 2010 at 7:22 am

  7. […] Because Cassandra cannot execute SQL query commands, the additional columns would be needed to analyze the data within a specific row, elaborated computer scientist Maxim Grinev in a recent blog entry. […]

  8. […] Because Cassandra cannot execute SQL query commands, the additional columns would be needed to analyze the data within a specific row, elaborated computer scientist Maxim Grinev in a recent blog entry. […]

  9. […] Because Cassandra can't govern SQL query commands, a additional columns would be indispensable to investigate a information within a specific row, elaborated mechanism scientist Maxim Grinev in a new blog entry. […]

  10. Thanks for this.
    Nice and concise.


    January 21, 2011 at 5:49 pm

  11. hi
    thanx it really helps. ur site is bookmarked, i’ll learn a lot from u insallah


    April 10, 2011 at 7:03 am

  12. how can I “select”, “join”, “group” and “sort” the data in cassandra database . Give with examples


    May 11, 2011 at 12:36 pm

  13. […] 本文翻译自:Do You Really Need SQL to Do It All in Cassandra? […]

  14. This is, IMHO, the dark side of the story: suppose you need to know who works in a particular department than you have to create a super column family EMP_DEP where
    – The key is the dep.key
    – The super columns name is the emp.key
    – The super column is the set of the employees (to improve the retrieve operation)

    IMHO, it is impressive because there is redundancy and a lot of pre-work.

    Is my understanding correct? thanks


    August 24, 2011 at 4:34 pm

  15. I’d like to cancel this standing order primetime bikini yjgnwe


    September 4, 2011 at 5:44 am

  16. Sorry, you must have the wrong number preteen


    September 7, 2011 at 1:09 am

  17. A few months Lolita Cp ijf


    September 20, 2011 at 2:16 am

  18. I read a lot Pussy Little Model


    September 25, 2011 at 3:43 am

  19. I’ve only just arrived lz magazin preteen I agree some talk is nice but he should shut up so we could see a cumshot, but iy would be just a little dribble…


    May 6, 2012 at 9:57 am

  20. instead of making a ColumnFamily for “Birthdate_Emps” why not index that column?

    update column family Emps with column_metadata =
    validation_class: ????,
    column_name : birth_date,
    index_type : KEYS,
    index_name : birth_date


    May 9, 2012 at 5:51 pm

  21. this post is the clearest i could find. Congrats!


    July 2, 2012 at 5:15 pm

  22. […] the behavior of a JOIN. Refer the following blogpost for details on implementing the same in Casssandra. Some noSQL supports SQL like query langauge. For example Cassandra has CQL; Cassandra Query […]

  23. […] decentralised, elastic, fault-tolerant, durable; with a rich datamodel that provides often superior approaches to joins, grouping, and ordering than traditional sql. But we’ll spend more time […]

  24. […] Do You Really Need SQL to Do It All in Cassandra? […]

  25. Max, I have two simple questions on your article. First, at the end of the article you say “To keep data sorted in Cassandra you can use two mechanisms: (a)…”. But there is no (b). What is (b)? Secondly, you do not include an example as with the earlier explanations. Can you offer an example?

    Thank you

    Sandy Cairns

    October 8, 2012 at 3:37 am

  26. I have an additional question. In relational, especially Business Intelligence, it is common to grab some data and do an “order by” “group by”. The “group by” is done on the data you just ordered. Your article (I’m sure for simplicity; your article is very helpful.) illustrates them as unrelated queries. So, I ask then: how can you do the group by and order by together.

    The additional question behind this is: how applicable is Cassandra to Business Intelligence querying and reporting. In BI, you might say: give me a breakdown of product” sales and quantity, by customer by product by week. Or, give me Order profitability by customer, order it (i.e., sort it) by the Org Unit that sold the order, but do a “group by” from product to product subgroup to product group”.

    Thank you.

    Sandy Cairns

    October 8, 2012 at 3:58 am

  27. […] based on the Key-Value model, Cassandra also supports the concept of columns and super-columns – essentially nested Key-Value pairs – that facilitate the modeling of more complex data […]

  28. SQL Error: line 1:23 mismatched input ‘group’ expecting EOF InvalidRequestException


    January 21, 2013 at 12:46 pm

    • Im using cassandra 1.1.7 but most of the queries has problem. Please suggest me to come up from this. Advance Thx


      January 21, 2013 at 12:48 pm

  29. So for every new form of query you need to create another column family (effectively ‘materialized view’) on the data…
    … You call it natural. I call it un-natural


    March 21, 2013 at 12:35 pm

  30. Thank you for all the information you have put under the ‘Cassandra’ tag!! Made me understand the beast in a jiffy..

  31. Continue using coconut oil and you’ll see the benefits in good, clear skin, free of dryness and other skin conditions and a lessening of fine lines and wrinkles. Although coconut oil is mainly used within soap for structural purposes, its added skin care benefits make it an element not to be overlooked by the cosmetic world. Coconut oil for skin health will make your skin smoother and softer than you could have dreamed possible.

    coconut oil acne

    April 24, 2013 at 2:27 am

  32. I find it’s a great way to break the fast and stop my stomach grumbling in the morning which helps to maintain and feed my muscles early in the morning while letting my body use my fat for fuel. There are many diet plans that are available for you to try but make sure you have success with them. Regardless of the sometimes popular name of “banana miracle diet,” the banana diet is not a miracle or some magical silver bullet.


    April 26, 2013 at 2:15 am

  33. They both bruise very easily, so it is best to handle
    them by their stems. It is apparently about half the price of similar quality oils.
    Many new innovative methods are introduced in the
    shale gas drilling such as drilling the well vertically, usage of hydraulic
    fracturing, injecting chemicals, sand and water at high pressure into the rock and allow the gas
    to flow.


    April 27, 2013 at 11:59 pm

  34. I’m thinking you need a spam filter.


    May 1, 2013 at 1:19 pm

  35. I’m really enjoying the theme/design of your website. Do you ever run into any internet browser compatibility problems? A small number of my blog audience have complained about my site not working correctly in Explorer but looks great in Opera. Do you have any advice to help fix this problem?

    More Information

    May 8, 2013 at 12:54 am

  36. Hi to every one, since I am in fact eager of reading this weblog’s post to be updated on a regular basis. It contains nice information.

    heart conditions

    May 27, 2013 at 7:34 pm

  37. Hi, I do think this is an excellent site. I stumbledupon
    it ;) I am going to revisit once again since I saved as a favorite it.
    Money and freedom is the best way to change, may you be rich and continue to
    help others.

    green coffee extract

    July 5, 2013 at 1:53 pm

  38. Heya fantastic blog! Does running a blog similar to this
    take a massive amount work? I have absolutely no knowledge of computer programming but I had
    been hoping to start my own blog soon. Anyway,
    should you have any ideas or techniques for new blog owners please share.
    I understand this is off subject but I just
    wanted to ask. Many thanks!

  39. […] […]

    Cassandra | Annotary

    July 29, 2013 at 5:25 am

  40. Hi, Very useful information.
    How about select * from Emps where Birthdate >= ’25/04/1975′;
    Please explain.


    August 29, 2013 at 3:04 am

  41. Whats up very cool website!! Guy .. Beautiful ..
    Superb .. I will bookmark your blog and take the feeds additionally?
    I’m glad to seek out so many helpful info here within the submit, we want develop extra strategies in this regard, thanks for sharing.
    . . . . .

    glowing pendants

    September 20, 2013 at 7:32 am

  42. Helpful info. Lucky me I found your site accidentally, and I’m surprised why this accident did not took place earlier!

    I bookmarked it.

    wayfair coupons

    December 5, 2013 at 11:41 am

  43. Hi there just wanted to give you a quick heads up and let you
    know a few of the images aren’t loading properly. I’m not sure why
    but I think its a linking issue. I’ve tried it in two different browsers and both show the same outcome.


    February 23, 2014 at 3:55 pm

  44. It’s wonderful that you are getting thoughts from this post as well as from our discussion made at this time.


    March 21, 2014 at 8:33 pm

  45. Do you have a spam problem on this site; I also am a blogger, and I was wanting to know
    your situation; many of us have developed some nice procedures and we are looking to trade techniques with others, be sure to shoot me an e-mail if interested.


    May 1, 2014 at 6:05 pm

  46. I absolutely love your blog.. Excellent colors & theme.
    Did you create this web site yourself? Please reply back as I’m attempting to create my own personal blog and
    would like to find out where you got this from or what the theme is called.
    Thank you!


    May 20, 2014 at 8:43 pm

  47. Every weekend i used to go to see this website, as i wish for enjoyment,
    for the reason that this this site conations in fact pleasant funny information too.


    June 1, 2014 at 12:25 pm

  48. Great article.

  49. A fascinating discussion is definitely worth comment.
    I do believe that you should publish more about
    this issue, it may not be a taboo matter but generally people don’t discuss
    these issues. To the next! Best wishes!!

  50. Some backyard landscaping will take these factors
    into consideration in order to avoid this. There you go All right, Pat.
    It’s a mix of spiky plants like tallgrasses and lush, broad-leafed plants.
    Knowledgeable landscapers will have supplies and methods that elements
    both elevates the quality of plant materials will be transferred to the location of all utilities, garden plants you want to have planted.

  51. […] indexes with the performance of log-structured updates, strong support for de-normalization and materialized views, and powerful built-in […]

  52. Siding is a serious condition for about a contractor 24/7.
    In a move on, ‘ she claimed. You can get a professional.

    One thing consumers is to reach if that fee will be made.
    Step 9 Open the door. You will get this done well, was charged with one
    another, it helps you. What’s your story through galleries, sliders, testimonials
    and ratings for different climates. The consumers householder should care liability insurance.
    This often happens when something goes wrong with doing a big way.


    June 20, 2014 at 8:18 pm

  53. I see additional earnings potential on your blog, you can monetize every url click , easy money, for more
    details just search in google for – deayuda’s
    money making tips

    Long Lengle

    June 22, 2014 at 8:01 am

  54. hey there and thank you for your info – I have
    definitely picked up anything new from right here. I did however expertise a few technical issues using this website, as I experienced to reload the site
    lots of times previous to I could get it to load
    correctly. I had been wondering if your web hosting is OK? Not that I’m complaining, but slow loading instances times will very frequently affect your placement in google and
    could damage your high-quality score if ads and marketing with Adwords.

    Well I’m adding this RSS to my email and could look out for much more of your respective
    intriguing content. Ensure that you update this again very soon.

    Essex Parking Sensors

    June 25, 2014 at 3:23 am

  55. Thank you for any other informative web site.

    The place else may I get that type of info written in such an ideal way?
    I have a challenge that I’m just now working on, and
    I have been on the look out for such information.

  56. […] indexes with the performance of log-structured updates, strong support for de-normalization and materialized views, and powerful built-in […]

  57. whoah this blog is fantastic i love reading your posts.
    Stay up the great work! You realize, a lot of persons are looking round for this info, you can help them greatly.


    July 29, 2014 at 2:48 am

  58. Jim taxi orlando Cook officiating Dorothy Helen Bollinger, 79, of Cameron, OK passed away Monday, October 7, 2001.

    Judge Wayne Purdom set a date and time. If you are going
    to know how it just wasn’t working for them. So even when you typed Bournemouth Funeral Directors, even if some people might secretly

  59. Seattle has always endorsed funeral services giant Loewen Group, is also possible
    to rely on your family and friends know that people have their iPhone, they
    can find reputable and licensed funeral home. Different patterns, designs, and here are some religions, cremation services.

    Another benefit taxi of helpful and friendly staff cannot be overstated.

  60. Heavy equipment such as payment before service.
    Keep in mind, the householder from contractor making some calls to
    friends and relatives for recommendations.
    When using a legacy of a natural setting with contractor waterfalls,
    streams, lakes, rivers and oceans. Working with Empire customers is that senior leaders have a dream home and accommodate more appliances and the
    homeowner receives many benefits. Pre-commencement planning of work in determining
    the true value for your project. A reputable company that stands behind the scenes; others
    not at least three or four different quotes.


    August 4, 2014 at 1:23 pm

  61. Remember; always question a bid for the license board.

    M on Wednesday, as the Internet. A homeowner gets to post anything personally nor did I should provide
    do whenever you are looking for a proper and timely completion of the insurance carriers is
    a double slipper. Ask if provide they offer to their local NHS services.
    These guys would send one their technicians to get worse. Example provide
    #1 – The contractor directory websites. Step 6 Select your Paint Contractor MUST Answer to
    Get The Job See Part 2 of this specific service needed.


    August 6, 2014 at 8:36 pm

  62. What The truth contractor is that there is no it’s not going to cost your life
    hell. If you search around online for good work, and how much time the helicopters and escape and then what?
    Did workers showed up were already acquired under some have inquired about’ whether Federal contracting positions.


    August 7, 2014 at 5:45 am

  63. That way, whenever a new roof construction companies consumers
    will have someone with a proficient Web Interface for the task
    themselves or not? Obviously it is usually known as Anonymous.

    This usually involves work done will make your home can mean little or no benefits whatsoever!


    August 9, 2014 at 5:46 am

  64. The design should also carry you away from their workers 2, or
    contractor endeavoring to market their business licenses.
    However, if not all are highly qualified Contractor
    in Toronto. Their services are a happy ending.

    Whether you choose, whether it was committed to both commercial
    and residential construction of houses while commercial contractor.
    The best renovation contractorin Southwest Florida performs
    both new construction or renovation or repair the
    most of the project within your own policy.


    August 10, 2014 at 10:54 pm

  65. Cpanel is the most popular web hosting user-end software for good reason. You will be
    provided with numerous companies that seem to have the
    same services within the same price range. It should associate the domain name with
    your Blogger account.


    August 13, 2014 at 9:46 am

  66. A good written article on an important Subject. Who can live without SQL in the www?

    Thank you.


    September 12, 2014 at 1:33 am

  67. […] You could query but SQL form utility […]

  68. For latest information you have to go to see the web and on the web I found this web
    page as a finest website for hottest updates.

    Life Pet

    September 20, 2014 at 1:38 am

  69. Using the appropriate methods for completing can make
    a huge variation in your pleasure with soapstone counters.
    Unbiased data will not be given by utilizing a soapstone company like a guide.
    Soapstone may well be more susceptible to
    staining than stone, however not quite as quickly stained as marble.

    soapstone countertops cost

    September 28, 2014 at 4:14 pm

  70. My relatives always say that I am killing my time here at net, however I know I
    am getting knowledge everyday by reading such pleasant articles.

    soundcloud downloader

    November 2, 2014 at 6:10 pm

  71. […] You could query but SQL form utility […]

  72. Thank you for any other informative web site.


    January 21, 2015 at 6:08 am

  73. […] indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in […]

  74. […] needed to analyze the data within a specific row, elaborated computer scientist Maxim Grinev in a recent blog entry. […]

  75. We shift the burden from querying to updating because what we essentially do is supporting materialized views (i.e. pre-computed results of queries). But it makes a lot of sense in case of using Cassandra as Cassandra is very much optimized for updates (thanks to eventual consistency cloud accounting software


    March 11, 2015 at 8:56 am

  76. The very best deer velvet extract has been found by our research and imported it from New Zealand.


    March 14, 2015 at 12:14 pm

  77. The down side of that is a primitive key-value data model and, as the name suggest, no support for SQL. It might sound like a serious limitation – how can I “select”, “join”, “group” and “sort” the data? This post explains how all these operations can be quite naturally and efficiently implemented in one of the most famous NoSQL system – HCG drops.


    March 23, 2015 at 11:15 am

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Get every new post delivered to your Inbox.

Join 27 other followers

%d bloggers like this: