Extending Cassandra with Asynchronous Triggers
[This post is written by Maxim Grinev and Martin Hentschel on the work done at Systems Group @ ETH Zurich]
Motivation
Latency matters! Amazon found that every 100 ms of latency cost them 1% in sales. Google found that an extra 0.5 seconds in search page generation time dropped traffic by 20%. Even the smallest delay kills user satisfaction. You can find more on the importance of latency here and here.
The underlying database contributes significantly to the overall application response time, especially on update operations, which cannot be sped up via caching. A common approach to reducing database-imposed latency is to break a request into sub-operations that are executed asynchronously, so that the application acknowledges the request without waiting for all the sub-operations to complete. This approach allows for tremendous latency reduction and also keeps latency predictable even when sub-operation execution time varies significantly. The implication of asynchronous execution is eventual consistency of data. But eventual consistency is already common practice and meets the requirements of many Web applications.
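To make the idea concrete, here is a minimal sketch of acknowledging a request before its sub-operations have run. It is plain Java with no database involved; the class name AsyncAckSketch, the method names and the operation labels are all made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AsyncAckSketch {
    // Background workers that run the deferred sub-operations.
    private final ExecutorService background = Executors.newFixedThreadPool(2);
    // Records finished operations so the example can be inspected.
    final List<String> completed = new CopyOnWriteArrayList<>();

    // Does the primary write, queues the remaining work and returns immediately.
    String handleRequest(String requestId, List<String> subOperations) {
        completed.add(requestId + ":primary-write");
        for (String op : subOperations) {
            background.submit(() -> {
                completed.add(requestId + ":" + op);
            });
        }
        return "ACK " + requestId; // returned without waiting for the sub-operations
    }

    // Example-only helper: waits for the queued work to finish.
    boolean drain() {
        background.shutdown();
        try {
            return background.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncAckSketch app = new AsyncAckSketch();
        System.out.println(app.handleRequest("r1", List.of("update-index", "notify-followers")));
        app.drain();
        System.out.println(app.completed.size() + " operations completed");
    }
}
```

Between the acknowledgement and the end of drain() the derived data lags behind the primary write, which is exactly the window of eventual consistency described above.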
Asynchronous execution is typically implemented using queues in front of a database. Yet, integrating asynchronous execution into a database provides a number of benefits:
- Much easier programming model and simplified system management – the developer/administrator has to deal with one system (i.e. database) instead of two (i.e. queue and database)
- Asynchronous execution can be used to automate internal database operations with reduced latency – for example, updating indexes.
We have extended the Cassandra database system with support for Asynchronous Triggers (CASSANDRA-1311). Asynchronous triggers are a basic mechanism that can be used to implement various use cases of asynchronous execution of application code on the database side. Cassandra and Async triggers are a perfect match, as both take advantage of eventual data consistency.
Cassandra Async Triggers
Like a traditional database trigger, a Cassandra Async trigger is a procedure that is automatically executed by the database in response to certain events on a particular database object (e.g. table or view). The distinguishing feature of an Async trigger is that the database responds to the client on successful update execution without waiting for the triggers to be executed, thus reducing response latency.
More precisely, Cassandra Async triggers can be described as follows:
- A trigger is set on a column family and is executed in case of any update to the column family. Cassandra triggers are “after” triggers. A trigger is executed after the update operation that fires the trigger and can see the results of the update.
- Trigger procedures are implemented in Java. The application developer implements the execute method of the ITrigger interface.
- Cassandra Async triggers are mutation-level triggers. A trigger is executed for each mutation issued to the column family. For example, if a trigger is fired by batch_mutate, it will be executed for each mutation separately. Each trigger is parametrized by the mutation that fires it.
- In contrast to traditional triggers, which are synchronous, Cassandra Async triggers are asynchronous. The database acknowledges update execution to the client after the update is executed and the fired triggers are submitted for execution. Actual execution of fired triggers happens after the acknowledgement to the client. This reduces latency but leads to eventual consistency of data.
- Our implementation guarantees that triggers are executed at least once. The system is responsible for handling failures during trigger execution, so the client does not need to care about them. This makes client-side code easier to develop and maintain. Note that triggers may be executed more than once, which requires them to be idempotent. This is a requirement for any Cassandra update anyway. We chose not to implement exactly-once semantics because it introduces significant overhead.
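The properties listed above can be sketched in a few dozen lines of plain Java. This is an in-memory illustration only, not the code from the patch: the post names only ITrigger and its execute method, so the parameter list, the Mutation class and the AsyncTriggerSketch dispatcher below are assumptions. The retry loop stands in for at-least-once execution, and the demo trigger fails once after doing its write to show why idempotence matters.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class AsyncTriggerSketch {
    // Hypothetical single-column mutation; a stand-in, not Cassandra's own class.
    static final class Mutation {
        final String key;
        final String column;
        final String value;

        Mutation(String key, String column, String value) {
            this.key = key;
            this.column = column;
            this.value = value;
        }
    }

    // The post names ITrigger and execute; the parameter list here is assumed.
    interface ITrigger {
        void execute(Mutation mutation);
    }

    private final Map<String, String> columnFamily = new ConcurrentHashMap<>();
    private final List<ITrigger> triggers = new CopyOnWriteArrayList<>();
    private final ExecutorService triggerPool = Executors.newSingleThreadExecutor();

    void addTrigger(ITrigger trigger) {
        triggers.add(trigger);
    }

    String read(String key, String column) {
        return columnFamily.get(key + ":" + column);
    }

    // Applies every mutation and submits its triggers; returning is the acknowledgement.
    void batchMutate(List<Mutation> mutations) {
        for (Mutation m : mutations) {
            columnFamily.put(m.key + ":" + m.column, m.value);
            for (ITrigger t : triggers) {
                // "After" trigger: submitted once per mutation, after the write above.
                triggerPool.submit(() -> runAtLeastOnce(t, m));
            }
        }
    }

    // Retries until execute returns normally, so a trigger may run more than once.
    private void runAtLeastOnce(ITrigger trigger, Mutation m) {
        while (true) {
            try {
                trigger.execute(m);
                return;
            } catch (RuntimeException e) {
                // In-memory retry only; surviving a node crash would need durable state.
            }
        }
    }

    // Example-only helper: waits until all submitted triggers have finished.
    boolean awaitTriggers() {
        triggerPool.shutdown();
        try {
            return triggerPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncTriggerSketch db = new AsyncTriggerSketch();
        Map<String, String> copy = new ConcurrentHashMap<>();
        AtomicInteger runs = new AtomicInteger();
        db.addTrigger(m -> {
            copy.put(m.key, m.value); // idempotent: repeating the put changes nothing
            if (runs.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated failure after the write");
            }
        });
        db.batchMutate(List.of(new Mutation("u1", "name", "Ann"), new Mutation("u2", "name", "Bob")));
        db.awaitTriggers();
        System.out.println(copy.size() + " rows copied in " + runs.get() + " trigger runs");
    }
}
```

In the demo the trigger body runs three times for two mutations, yet the copied data is the same as if it had run twice. A trigger that appended to a list or incremented a counter would not have this property.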
Applications
Async triggers are a powerful mechanism that allows for many interesting applications:
1) Index and Materialized View Support
Cassandra does not support secondary indexes. To index data other than by record key or column name, the application developer needs to store data redundantly in a separate column family. In this case triggers can be used to update secondary indexes whenever records in the original column family are modified.
Similar to indexes, materialized views store data redundantly. So they also can be supported via triggers. Find general guidelines on managing redundant data (indexes and materialized views) in Cassandra in my post “Do You Really Need SQL to Do It All in Cassandra?”.
Implementing index and view maintenance synchronously means coupling update and maintenance operations together. This will increase response time of each update operation depending on the number of indexes. Using an asynchronous mechanism will keep the response time constant.
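As a rough illustration of index maintenance by trigger, the sketch below models a base column family and its index as in-memory maps. Everything in it (the IndexTriggerSketch class, the users-by-city example, the method names) is hypothetical. Because the mutation only carries the new value, the trigger simply adds an index entry, and the index read filters out stale entries by checking the base data. That is one possible design, not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class IndexTriggerSketch {
    // Base column family: user key -> city.
    final Map<String, String> usersByKey = new ConcurrentHashMap<>();
    // Index column family: city -> user keys.
    final Map<String, Set<String>> usersByCity = new ConcurrentHashMap<>();

    // Trigger body: adding a key to a set is idempotent, so a repeated run is harmless.
    void onCityUpdated(String userKey, String city) {
        usersByCity.computeIfAbsent(city, c -> ConcurrentHashMap.newKeySet()).add(userKey);
    }

    // Update path: write the base row, then fire the trigger without waiting for it.
    // The returned future exists only so the example can wait; a client would ignore it.
    CompletableFuture<Void> updateCity(String userKey, String city) {
        usersByKey.put(userKey, city);
        return CompletableFuture.runAsync(() -> onCityUpdated(userKey, city));
    }

    // Index read that drops stale entries by checking the base column family.
    List<String> findByCity(String city) {
        List<String> result = new ArrayList<>();
        for (String key : usersByCity.getOrDefault(city, Set.of())) {
            if (city.equals(usersByKey.get(key))) {
                result.add(key);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        IndexTriggerSketch db = new IndexTriggerSketch();
        CompletableFuture.allOf(
            db.updateCity("u1", "Zurich"),
            db.updateCity("u2", "Zurich"),
            db.updateCity("u1", "Basel")).join();
        System.out.println("Zurich: " + db.findByCity("Zurich"));
        System.out.println("Basel: " + db.findByCity("Basel"));
    }
}
```

The cost of updateCity is the same no matter how many indexes hang off the column family, since index writes happen after the method has returned.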
2) Online Analytics
Async triggers can be effectively used to propagate changes (in near real-time) from the “operational” part of the database to the “analytical” part without increasing response time for operational updates. This opens up a way for many new interesting applications! Note that the delay in data propagation caused by asynchronous triggers is negligible for the majority of analytical applications. This is online (near real-time) analytics, in contrast to, for example, offline MapReduce analytics.
3) Specific application: Data propagation in social networks
Data propagation in social networks (e.g. sending a user’s posts to all her friends) has proven to scale only with the push-on-change model (i.e. redundant data propagation via updates, with simple queries) as opposed to the pull-on-demand model (i.e. combining data from normalized storage on demand via queries). Twitter uses the push-on-change model: it employs queues to implement asynchronous data propagation. Using just a few triggers, one can easily implement a production version of a Twitter-like network.
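A push-on-change trigger might look roughly like the sketch below. It is an in-memory toy with hypothetical names (FanOutTriggerSketch, onPost, readTimeline) and is not meant to describe how Twitter or the patch does it. Each post is copied into every follower's timeline, keyed by post id, so that running the trigger twice leaves the timeline unchanged and reading a timeline is a single simple lookup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

class FanOutTriggerSketch {
    // author -> followers
    final Map<String, List<String>> followers = new ConcurrentHashMap<>();
    // Normalized storage: post id -> text.
    final Map<Long, String> posts = new ConcurrentHashMap<>();
    // Redundant storage: user -> (post id -> rendered post), newest id first.
    final Map<String, Map<Long, String>> timelines = new ConcurrentHashMap<>();

    private static Map<Long, String> newTimeline() {
        return new ConcurrentSkipListMap<Long, String>(Comparator.<Long>reverseOrder());
    }

    // Trigger body: push the post to every follower. Keyed by post id, so idempotent.
    void onPost(String author, long postId, String text) {
        for (String follower : followers.getOrDefault(author, List.of())) {
            timelines.computeIfAbsent(follower, f -> newTimeline())
                     .put(postId, author + ": " + text);
        }
    }

    // Update path: store the post, then fire the trigger without waiting for it.
    // The returned future exists only so the example can wait.
    CompletableFuture<Void> post(String author, long postId, String text) {
        posts.put(postId, text);
        return CompletableFuture.runAsync(() -> onPost(author, postId, text));
    }

    // Reading a timeline is a single lookup; no joins over followers and posts.
    List<String> readTimeline(String user) {
        return new ArrayList<>(timelines.getOrDefault(user, Map.of()).values());
    }

    public static void main(String[] args) {
        FanOutTriggerSketch net = new FanOutTriggerSketch();
        net.followers.put("alice", List.of("bob", "carol"));
        net.post("alice", 1L, "hello").join();
        net.post("alice", 2L, "world").join();
        net.onPost("alice", 2L, "world"); // a repeated trigger run changes nothing
        System.out.println(net.readTimeline("bob"));
    }
}
```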
Asynchronous triggers have been submitted as a contribution to Cassandra. You can find it at CASSANDRA-1311.
great,this is very informative.
sree
March 2, 2011 at 12:53 pm
Well Cassandra now does support secondary indexes, but your point is still valid, namely that triggers are a great way to update any denormalised data that you might have (for whatever reason).
Andrew Swan (@AndrewSwan_au)
July 6, 2012 at 12:00 am
Maxim Grinev, i am moved with your describing power. You’re good man.
Shabbir
November 19, 2013 at 12:18 pm
Very interesting things when it comes to web site analytics. I will have to checkout all of my sites with this information and make sure all is optimized.
George Mason
January 7, 2014 at 8:42 am
This is so interesting, damn I wish I was better at computer stuff, I studied chemical engineering but perhaps it should have been software engineering., I guess there’s always time
Become
January 8, 2014 at 9:36 am
I am not technical but this is really interesting.
Coco
January 26, 2014 at 1:39 pm
I really don’t give much thought to latency — I mean a mini-sec here and there.. who cares. But when you’re serving millions it matters.
As a web-design firm we haven’t built massive sites that need that performance (yet!) but I’ll keep this in mind if we ever do.
Amul
February 7, 2014 at 11:44 am
It’s really nice. Very Informative
Daniel Laptova
March 9, 2014 at 2:33 am
Latency is one of my pet peeves (this may show my impatient nature!), but I completely agree that sites with poor performance will see less sales. As humans, we are all in a rush and want everything done yesterday, so if we have to wait on 1 site and not another, its not rocket science on which one we are going to pick!
Alice
August 6, 2014 at 2:31 am
I’ve been battling latency on my site as well, I’ve switched themes to try some testing. Not sure if it will work but I’m going to give it a shot.
Matt
August 21, 2014 at 11:10 am