Max's Output

Extending Cassandra with Asynchronous Triggers


[This post is written by Maxim Grinev and Martin Hentschel on the work done at Systems Group @ ETH Zurich]

Motivation

Latency matters! Amazon found every 100ms of latency cost them 1% in sales. Google found an extra .5 seconds in search page generation time dropped traffic by 20%. Even the smallest delay kills user satisfaction. You can find more on the importance of latency here and here.

The underlying database contributes significantly to overall application response time, especially on update operations, which cannot be sped up via caching. A common approach to reducing database-imposed latency is to break a request into sub-operations that are executed asynchronously, so that the application acknowledges the request without waiting for all the sub-operations to complete. This approach allows for a tremendous latency reduction and also keeps latency predictable even when sub-operation execution time varies significantly. The implication of asynchronous execution is eventual consistency of data. But eventual consistency is already a common practice that meets the requirements of many Web applications.

Asynchronous execution is typically implemented using queues in front of a database. Yet, integrating asynchronous execution into a database provides a number of benefits:

  1. Much easier programming model and simplified system management – the developer/administrator has to deal with one system (i.e. database) instead of two (i.e. queue and database)
  2. Asynchronous execution can be used to automate internal database operations with reduced latency – for example, updating indexes.

We have extended the Cassandra database system with support for Asynchronous Triggers (CASSANDRA-1311). Asynchronous triggers are a basic mechanism that can be used to implement various use cases of asynchronous execution of application code on the database side. Cassandra and Async triggers are a perfect match, as both take advantage of eventual data consistency.

Cassandra Async Triggers

Like a traditional database trigger, a Cassandra Async trigger is a procedure that is automatically executed by the database in response to certain events on a particular database object (e.g. table or view). The distinguishing feature of an Async trigger is that the database responds to the client on successful update execution without waiting for triggers to be executed, thus reducing response latency.

More precisely, Cassandra Async triggers can be described as follows:

  • A trigger is set on a column family and is executed in case of any update to the column family. Cassandra triggers are “after” triggers. A trigger is executed after the update operation that fires the trigger and can see the results of the update.
  • Trigger procedures are implemented in Java. The application developer implements the execute method of the ITrigger interface.
  • Cassandra Async triggers are mutation-level triggers. A trigger is executed for each mutation issued to the column family. For example, if a trigger is fired by batch_mutate it will be executed for each mutation separately. Each trigger is parametrized by the mutation that fires it.
  • In contrast to traditional triggers, which are synchronous, Cassandra Async triggers are asynchronous. The database acknowledges update execution to the client once the update is executed and the fired triggers are submitted for execution. Actual execution of the fired triggers happens after the acknowledgement to the client. This reduces latency but leads to eventual consistency of data.
  • Our implementation guarantees that triggers are executed at least once. The system is responsible for handling failures during trigger execution, so the client does not need to care about them. This makes client-side code easier to develop and maintain. Note that a trigger may be executed more than once, which requires triggers to be idempotent. This is a requirement for any Cassandra update, though. We chose not to implement exactly-once semantics because it introduces substantial overhead.
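
The properties listed above can be illustrated with a small in-memory sketch. It is not the code from CASSANDRA-1311: the Mutation, Trigger and ColumnFamily types below are hypothetical stand-ins (the real ITrigger signature may differ), and the retry loop only imitates at-least-once execution inside one process, with no durability across crashes. The sample trigger fails once after writing, so it runs twice, and its idempotent write leaves the same result.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model of "after", mutation-level, asynchronous triggers. Not Cassandra code. */
final class AsyncTriggerSketch {

    /** Hypothetical stand-in for a single-column mutation. */
    static final class Mutation {
        final String rowKey;
        final String column;
        final String value;

        Mutation(String rowKey, String column, String value) {
            this.rowKey = rowKey;
            this.column = column;
            this.value = value;
        }
    }

    /** Hypothetical trigger interface, parametrized by the mutation that fired it. */
    interface Trigger {
        void execute(Mutation mutation);
    }

    /** Toy column family: applies the update, acknowledges, runs triggers later. */
    static final class ColumnFamily {
        private static final int MAX_ATTEMPTS = 3; // assumption: bounded retries
        private final Map<String, String> cells = new ConcurrentHashMap<>();
        private final List<Trigger> triggers = new CopyOnWriteArrayList<>();
        private final ExecutorService triggerPool = Executors.newSingleThreadExecutor();

        void addTrigger(Trigger trigger) {
            triggers.add(trigger);
        }

        /** Returns once the update is applied and triggers are submitted, not executed. */
        void update(Mutation m) {
            cells.put(m.rowKey + "/" + m.column, m.value);
            for (Trigger trigger : triggers) {
                triggerPool.execute(() -> runWithRetry(trigger, m));
            }
        }

        String get(String rowKey, String column) {
            return cells.get(rowKey + "/" + column);
        }

        /** Re-runs a failed trigger, so a trigger may execute more than once. */
        private static void runWithRetry(Trigger trigger, Mutation m) {
            for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
                try {
                    trigger.execute(m);
                    return;
                } catch (RuntimeException e) {
                    // fall through and try again
                }
            }
        }

        /** Waits for all submitted triggers to finish (used by the demo only). */
        void drain() {
            triggerPool.shutdown();
            try {
                if (!triggerPool.awaitTermination(5, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("triggers still running");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    /** Fires one update whose trigger fails once after writing, then succeeds. */
    static Map<String, String> demo() {
        ColumnFamily users = new ColumnFamily();
        Map<String, String> userByEmail = new ConcurrentHashMap<>();
        AtomicInteger attempts = new AtomicInteger();
        users.addTrigger(m -> {
            userByEmail.put(m.value, m.rowKey); // idempotent: a replay writes the same cell
            if (attempts.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated failure after the write");
            }
        });
        users.update(new Mutation("alice", "email", "alice@example.org"));
        users.drain();
        return userByEmail;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trigger overwrites a cell rather than appending to it, which is why the second execution is harmless. A trigger that incremented a counter, for instance, would not be safe under at-least-once execution.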

Applications

Async triggers are a powerful mechanism that allows for many interesting applications:

1) Index and Materialized View Support
Cassandra does not support secondary indexes. To index data other than by record key or column name, the application developer needs to store data redundantly in a separate column family. In this case triggers can be used to update secondary indexes whenever records in the original column family are modified.

Similar to indexes, materialized views store data redundantly. So they also can be supported via triggers. Find general guidelines on managing redundant data (indexes and materialized views) in Cassandra in my post “Do You Really Need SQL to Do It All in Cassandra?”.

Implementing index and view maintenance synchronously means coupling update and maintenance operations together. This will increase response time of each update operation depending on the number of indexes. Using an asynchronous mechanism will keep the response time constant.
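
As a rough illustration of trigger-based index maintenance, the sketch below keeps a redundant "city to users" index next to a base "user to city" map. All names are hypothetical and plain in-memory maps stand in for column families. The trigger body is called directly here, with asynchronous execution left out. Filtering stale index entries at read time is one possible design choice for the window in which the index lags behind the base data, not something the original work prescribes.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

/** Toy secondary index maintained by a trigger. Not Cassandra code. */
final class IndexTriggerSketch {

    /** Base column family: user id -> city. */
    final Map<String, String> cityByUser = new ConcurrentHashMap<>();

    /** Redundant index column family: city -> user ids. */
    final Map<String, Set<String>> usersByCity = new ConcurrentHashMap<>();

    /** Client-visible update: touches only the base data, so its cost is constant. */
    void updateCity(String userId, String city) {
        cityByUser.put(userId, city);
    }

    /** Trigger body, run later per mutation; set insertion makes replays harmless. */
    void onCityUpdated(String userId, String city) {
        usersByCity.computeIfAbsent(city, c -> new ConcurrentSkipListSet<>()).add(userId);
    }

    /** Index lookup that drops stale entries by re-checking the base data. */
    Set<String> findUsersIn(String city) {
        Set<String> result = new TreeSet<>();
        for (String userId : usersByCity.getOrDefault(city, Collections.emptySet())) {
            if (city.equals(cityByUser.get(userId))) {
                result.add(userId);
            }
        }
        return result;
    }
}
```

With this layout, adding another index means adding another trigger, while updateCity itself stays unchanged.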

2) Online Analytics
Async triggers can be effectively used to propagate changes (in near real-time) from the “operational” part of the database to the “analytical” part without increasing response time for operational updates. This opens up a way for many new interesting applications! Note that the delay in data propagation caused by asynchronous triggers is negligible for the majority of analytical applications. This is online (near real-time) analytics, in contrast to, for example, offline MapReduce analytics.

3) Specific application: Data propagation in social networks
Data propagation in social networks (e.g. sending a user’s posts to all her friends) has proven to scale only with the push-on-change model (i.e. redundant data propagation via updates, with simple queries), as opposed to the pull-on-demand model (i.e. combining data from normalized storage on demand via queries). Twitter uses the push-on-change model: it employs queues to implement asynchronous data propagation. Using just a few triggers, one can easily implement a production version of a Twitter-like network.
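
A push-on-change trigger for such a network might look roughly like the following sketch. The class, fields and method names are hypothetical, in-memory maps stand in for column families, and the trigger body is invoked directly rather than asynchronously. Each post is copied into every follower's timeline under its post id, so that executing the trigger more than once does not duplicate entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Toy push-on-change fan-out for a Twitter-like timeline. Not Cassandra code. */
final class TimelineFanoutSketch {

    /** author -> followers (assumed to be maintained elsewhere). */
    final Map<String, List<String>> followersOf = new ConcurrentHashMap<>();

    /** user -> (post id -> rendered post); one redundant copy per follower. */
    final Map<String, Map<Long, String>> timelineOf = new ConcurrentHashMap<>();

    /** Trigger body fired by a new post: push a copy to every follower. */
    void onPost(String author, long postId, String text) {
        for (String follower : followersOf.getOrDefault(author, Collections.emptyList())) {
            timelineOf.computeIfAbsent(follower, f -> new ConcurrentSkipListMap<>())
                      .put(postId, author + ": " + text); // keyed by post id: replay-safe
        }
    }

    /** Reading a timeline is a simple lookup with no join over followed users. */
    List<String> readTimeline(String user) {
        return new ArrayList<>(timelineOf.getOrDefault(user, Collections.emptyMap()).values());
    }
}
```

The cost of a post grows with the number of followers, which is exactly the work that asynchronous execution moves off the client's request path.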

Asynchronous triggers have been submitted as a contribution to Cassandra. You can find it at CASSANDRA-1311.


Written by maxgrinev

July 23, 2010 at 3:22 pm

Posted in Cassandra

13 Responses


  1. [...] Cassandra has (Async) Triggers: Similar to Riak post-commit hooks:. . Like traditional database triggers, Cassandra Async trigger [...]

  2. great,this is very informative.

    sree

    March 2, 2011 at 12:53 pm

  3. Well Cassandra now does support secondary indexes, but your point is still valid, namely that triggers are a great way to update any denormalised data that you might have (for whatever reason).

  4. [...] to a database event sounds fantastic. I’m not the only one apparently as there are other groups who have worked on implementing this with a C* back [...]

  5. […] Like traditional DBMSs, the Cassandra community voted to have this feature (see JIRA #CASSANDRA-1311). An interesting article in English explains the rationale and the use cases for triggers in Cassandra: http://maxgrinev.com/2010/07/23/extending-cassandra-with-asynchronous-triggers/. […]

  6. This site was… how do you say it? Relevant!! Finally I’ve found something which
    helped me. Thank you!

  7. Maxim Grinev, i am moved with your describing power. You’re good man.

    Shabbir

    November 19, 2013 at 12:18 pm

  8. Very interesting things when it comes to web site analytics. I will have to checkout all of my sites with this information and make sure all is optimized.

    George Mason

    January 7, 2014 at 8:42 am

  9. This is so interesting, damn I wish I was better at computer stuff, I studied chemical engineering but perhaps it should have been software engineering., I guess there’s always time

    Become

    January 8, 2014 at 9:36 am

  10. I am not technical but this is really interesting.

    Coco

    January 26, 2014 at 1:39 pm

  11. I really don’t give much thought to latency — I mean a mini-sec here and there.. who cares. But when you’re serving millions it matters.

    As a web-design firm we haven’t built massive sites that need that perfomance (yet!) but I’ll keep this in mind if we ever do.

    Amul

    February 7, 2014 at 11:44 am

  12. It’s really nice. Very Informative

    Daniel Laptova

    March 9, 2014 at 2:33 am

