Managing Indexes in Cassandra using Async Triggers
[This post is written by Maxim Grinev and Martin Hentschel]
Suppose you are building a Cassandra application and you want to speed up your queries via indexing. Cassandra does not support secondary indexes out of the box, but storing redundant data (in a different layout) gives you the same effect. The main drawback is that your application (the code that writes to the DB) needs to take care of managing the index: every time you write to the DB, you also need to update the index. This noticeably increases the response time seen by users of your web application.
In Figure 1, whenever a user request forces a write to the database (Step 1), the application also updates the index. Because at least two operations on the database layer are needed, the response time of the request increases. The advantage of this approach is that your index is always in sync with the data. Any query to the index will see the latest result (Step 2).
Figure 1: Application inserts data and builds index in one step.
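The synchronous approach can be sketched with plain Java collections. This is only an illustration under stated assumptions: the class SyncIndexedStore and its methods are hypothetical names, and the two maps stand in for the data and index column families rather than using the Cassandra API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for two column families; this is not the Cassandra API.
class SyncIndexedStore {
    // Data: user id -> user name
    private final Map<String, String> users = new ConcurrentHashMap<>();
    // Index: user name -> ids of all users with that name
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();

    // Both writes complete before the call returns, so the caller waits for both.
    void insertUser(String userId, String name) {
        users.put(userId, name); // write 1: the data itself
        index.computeIfAbsent(name, n -> ConcurrentHashMap.newKeySet())
             .add(userId);       // write 2: the index entry
    }

    // Returns a sorted copy of the ids stored under the given name.
    Set<String> findIdsByName(String name) {
        return new TreeSet<>(index.getOrDefault(name, Collections.emptySet()));
    }

    public static void main(String[] args) {
        SyncIndexedStore store = new SyncIndexedStore();
        store.insertUser("2", "Sue");
        store.insertUser("4", "Sue");
        // The index is already up to date when insertUser returns.
        System.out.println(store.findIdsByName("Sue")); // prints [2, 4]
    }
}
```

The point of the sketch is the shape of insertUser: the request cannot return until the index write has also finished.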
We recently proposed an extension to Cassandra, which we call Async triggers. An Async trigger will listen on a column family inside Cassandra. Whenever a modification to this column family is made, the trigger will be scheduled for an asynchronous execution. In our case, the logic to build and maintain the index shifts from the application to the trigger. This means the application has less work to do and can return faster. The response time of a user request will be reduced.
In Figure 2, a write to the database (Step 1) returns as soon as the write is finished. An Async trigger is scheduled to update the index. This trigger will run some time after the response has been sent to the client. This also means that for a short period of time, the data and the index will be out of sync (i.e. inconsistent). A query to the index (Step 2) might not always see the latest results. We believe that this is acceptable for many web applications. For example, in Twitter it is totally fine if a search ignores tweets that were posted less than a second ago.
Figure 2: An Async trigger maintains the index.
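The asynchronous behaviour can be imitated outside Cassandra with a background executor. The sketch below is not how the proposed patch is implemented; AsyncIndexedStore, its methods, and the single-threaded executor standing in for the trigger scheduler are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// In-memory imitation of a write followed by an asynchronous index update.
class AsyncIndexedStore {
    private final Map<String, String> users = new ConcurrentHashMap<>();
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();
    // Hypothetical stand-in for the component that schedules triggers.
    private final ExecutorService triggerExecutor = Executors.newSingleThreadExecutor();

    // Returns after the data write; the index update is only scheduled.
    void insertUser(String userId, String name) {
        users.put(userId, name);
        triggerExecutor.execute(() -> updateIndex(userId, name));
    }

    private void updateIndex(String userId, String name) {
        index.computeIfAbsent(name, n -> ConcurrentHashMap.newKeySet()).add(userId);
    }

    Set<String> findIdsByName(String name) {
        return new TreeSet<>(index.getOrDefault(name, Collections.emptySet()));
    }

    // Stops accepting new updates and waits for the scheduled ones to finish.
    boolean drain(long timeoutMillis) {
        triggerExecutor.shutdown();
        try {
            return triggerExecutor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncIndexedStore store = new AsyncIndexedStore();
        store.insertUser("2", "Sue");
        store.insertUser("4", "Sue");
        // A lookup at this point may or may not see the new ids yet.
        store.drain(1000);
        // Once the scheduled updates have run, the index has caught up.
        System.out.println(store.findIdsByName("Sue")); // prints [2, 4]
    }
}
```

Between insertUser returning and the scheduled update running, a lookup can miss the new id. That window is the inconsistency described above.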
Of course, you may get similarly good response times using other architectures, for example by explicitly using queues to separate writes to the database from the maintenance of indexes. We are currently researching the advantages and disadvantages of Async triggers with respect to such architectures.
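For comparison, a queue-based variant might look roughly like the following sketch. It is an assumption about what such an architecture could look like, not a description of any particular system: QueueIndexedStore, the in-process BlockingQueue, and the worker thread are hypothetical, and a real deployment would more likely use an external message queue.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// The application enqueues index updates itself; a worker applies them later.
class QueueIndexedStore {
    // Sentinel message that tells the worker to stop.
    private static final String[] STOP = new String[0];

    private final Map<String, String> users = new ConcurrentHashMap<>();
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();
    private final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::applyUpdates);

    QueueIndexedStore() {
        worker.start();
    }

    // Writes the data, then hands the index update to the queue and returns.
    void insertUser(String userId, String name) {
        users.put(userId, name);
        pending.add(new String[] {userId, name});
    }

    // Worker loop: takes {userId, name} messages and updates the index.
    private void applyUpdates() {
        try {
            while (true) {
                String[] update = pending.take();
                if (update == STOP) {
                    return;
                }
                index.computeIfAbsent(update[1], n -> ConcurrentHashMap.newKeySet())
                     .add(update[0]);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    Set<String> findIdsByName(String name) {
        return new TreeSet<>(index.getOrDefault(name, Collections.emptySet()));
    }

    // Lets the worker finish everything queued so far, then stops it.
    void close() {
        pending.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        QueueIndexedStore store = new QueueIndexedStore();
        store.insertUser("2", "Sue");
        store.insertUser("4", "Sue");
        store.close();
        System.out.println(store.findIdsByName("Sue")); // prints [2, 4]
    }
}
```

The difference from a trigger is where the scheduling logic lives: here the application code enqueues the index update explicitly on every write.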
Example: Index users by name
Here is a concrete example of how to index a user database not only by user id, but also by name (a secondary index). Figure 3 shows a possible layout of column families in Cassandra. The first column family “Users” stores data about users. Each row is naturally indexed by its row key (in our case it is the user id). The second column family “Index” stores redundant data to quickly retrieve users by their name. For example if you want to look up the name “Sue”, you will find two users with ids 2 and 4.
Figure 3: Database layout storing users by id and name.
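The layout can be modelled as nested sorted maps: row key, then column name, then value. The sketch below is an in-memory illustration only. The users with ids 2 and 4 named "Sue" come from the text above; the rows for ids 1 and 3 and the class name UserIndexLayout are made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory model of the two column families; this is not the Cassandra API.
class UserIndexLayout {

    // "Users": row key (user id) -> columns (column name -> value).
    static Map<String, Map<String, String>> users() {
        Map<String, Map<String, String>> users = new TreeMap<>();
        addUser(users, "1", "Ann"); // hypothetical row
        addUser(users, "2", "Sue"); // from the text
        addUser(users, "3", "Tom"); // hypothetical row
        addUser(users, "4", "Sue"); // from the text
        return users;
    }

    private static void addUser(Map<String, Map<String, String>> users,
                                String id, String name) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("name", name);
        users.put(id, columns);
    }

    // "Index": row key (user name) -> columns (user id -> placeholder value).
    static Map<String, Map<String, String>> buildIndex(
            Map<String, Map<String, String>> users) {
        Map<String, Map<String, String>> index = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : users.entrySet()) {
            String name = row.getValue().get("name");
            // The column name carries the user id; the value is unused.
            index.computeIfAbsent(name, n -> new TreeMap<>()).put(row.getKey(), "1");
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> index = buildIndex(users());
        // Looking up a name is a single row read in the index.
        System.out.println(index.get("Sue").keySet()); // prints [2, 4]
    }
}
```

Storing the user id as the column name rather than the value lets one index row hold any number of users with the same name.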
Using an Async trigger to update the index whenever there is a write to the users column family involves two steps. First, we need to implement the trigger, and second, we need to specify the column family that the trigger will listen on.
To implement a trigger, we implement the execute method of the ITrigger interface. First, we connect to the local Cassandra instance. The trigger executes within Cassandra, so no network overhead is involved. Then we get the user name of the user that has just been inserted (or modified). The user id is provided by the key parameter. We can then insert this user id into the respective user name row. (Note that we have removed all exception handling and null checks for ease of reading.)
    public class UpdateIndex implements ITrigger {

        public void execute(byte[] key, ColumnFamily cf) {
            // connect to local Cassandra instance
            CassandraServer client = new CassandraServer();
            client.set_keyspace("TriggerExample");

            // get user name
            byte[] userName = cf.getColumn("name".getBytes()).value();

            // insert the user id into the index
            ColumnParent parent = new ColumnParent("Index");
            byte[] userId = key;
            long timestamp = System.currentTimeMillis();
            Column indexedValueColumn = new Column(userId, "1".getBytes(), new Clock(timestamp));
            client.insert(userName, parent, indexedValueColumn, ConsistencyLevel.ONE);
        }
    }
It remains to specify that this trigger should listen on the “Users” column family. The following entry needs to be added to the cassandra.yaml file.
    triggers:
        - name: UpdateIndex
          keyspace: TriggerExample
          column_family: Users
          implementation: UpdateIndex
That’s it. The complete example source code, along with appropriate scripts to run our example, can be found in the directory contrib/trigger_example.
Currently our extension to Cassandra is under submission. In order to try Async triggers and this example, find the patch here: https://issues.apache.org/jira/browse/CASSANDRA-1311
Absolutely great!
Augi
August 5, 2010 at 5:29 pm
Very helpful.
Frank LoVecchio
September 13, 2010 at 4:35 pm
Glad I’ve finally found something I agree with!
Lovie
December 9, 2011 at 7:48 pm
Very interesting – I wonder about the locality of the execution of the trigger – will it run on a host that has the modified data? Does it run in the same JVM?
Henrik Lindberg
July 12, 2011 at 3:57 pm
I’ve been surfing online more than three hours today, yet I never found any interesting article like yours. It’s pretty worth enough for me. In my opinion, if all webmasters and bloggers made good content as you did, the web will be a lot more useful than ever before.
Недвижимость в Монако
Sohail Shaikh
May 16, 2015 at 9:04 am
I am trying to implement a trigger that populates a view based on another input, but
CassandraServer cassandraServer = new CassandraServer();
cassandraServer.set_keyspace("recommendation_engine");
throws an exception:
java.lang.AssertionError: null
at org.apache.cassandra.thrift.ThriftSessionManager.currentSession(ThriftSessionManager.java:55) ~[apache-cassandra-2.1.7.jar:2.1.7]
at org.apache.cassandra.thrift.CassandraServer.state(CassandraServer.java:103) ~[apache-cassandra-2.1.7.jar:2.1.7]
at org.apache.cassandra.thrift.CassandraServer.set_keyspace(CassandraServer.java:1750) ~[apache-cassandra-2.1.7.jar:2.1.7]
at TestTrigger.augment(TestTrigger.java:32) ~[na:na]
The connection is null. Can someone provide some information?
adelinghanayem
July 22, 2015 at 4:27 am