Update Idempotency: Why It Is Important in Cassandra Applications
When you develop an application for Cassandra, you should be aware of the following fact: even when the client observes that an update failed, the update may nevertheless have been executed successfully. The cause of this anomalous behavior is that Cassandra does not support transactional rollback. But do not rush to judgment; there is a reason for this. Cassandra is a distributed database, and rollback would require distributed transactions, which carry a performance cost and do not scale well. So if you want your database fast and scalable, you have to handle this anomaly yourself.
Before we discuss a solution, let's look at examples of when the anomaly can happen:
- In Cassandra, data is (usually) replicated. You can specify how many replicas must be successfully updated for the whole update to be considered successful (this allows writes to succeed even when some replica nodes are down or unreachable due to a network partition). Still, if fewer nodes confirm the update than required, Cassandra returns an error to the application saying the write failed, but it does not clean up or roll back the update on the nodes where it was executed successfully. For example, if the replication factor is 3, the required number of updated nodes is 2, and only 1 node was actually updated, the application receives an error. Despite the error, subsequent reads will see the update. Moreover, the mechanism of eventual update propagation (read repair, triggered on the first subsequent read) will update the other replicas with a value that was reported to the client as a failed write. Note that this should not happen often, as Cassandra goes out of its way (via gossip-based node failure detection) to make sure the cluster is healthy enough to execute the update. But it is possible.
- The whole update can be executed successfully, but the response message is lost.
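The first scenario can be sketched in a few lines of plain Python. This is a toy model, not real Cassandra code: the replica list, `REQUIRED_ACKS` constant, and function names are made up for illustration. It shows a write that lands on only one of three replicas, is reported to the client as a failure, and is nonetheless returned by a later read and propagated by read repair.

```python
# Toy model of the RF=3 scenario above (all names are illustrative).
replicas = [{}, {}, {}]   # three replicas of one row
REQUIRED_ACKS = 2         # quorum for replication factor 3

def write(key, value, reachable):
    acks = 0
    for i in reachable:   # only reachable replicas apply the write
        replicas[i][key] = value
        acks += 1
    if acks < REQUIRED_ACKS:
        # The client is told the write failed -- but there is no rollback!
        raise IOError("not enough replicas acknowledged")

def read_with_repair(key):
    values = [r.get(key) for r in replicas]
    found = next(v for v in values if v is not None)
    for r in replicas:    # read repair: copy the value to every replica
        r[key] = found
    return found

try:
    write("url", "posted", reachable=[0])  # only replica 0 is reachable
except IOError:
    pass                                   # client observes a failure...
print(read_with_repair("url"))             # ...yet prints "posted"
```

After the read, all three replicas hold the value that the client was told had failed to write.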
The standard way to handle a failed update is to retry it until it succeeds. As a result, the same update can be executed several times! If the update increments a counter, the counter value becomes incorrect. Here we come to the main point of this post: all your updates should be idempotent (i.e. applying an update repeatedly has the same effect as applying it once). Designing updates to be idempotent is the standard discipline for coping with repeated updates. Read the great articles Life beyond Distributed Transactions: an Apostate's Opinion and Building on Quicksand by Pat Helland, which stress the importance of idempotent updates in highly scalable systems.
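The counter problem is easy to demonstrate with a simulation. The store and function names below are invented for the sketch; the key behavior, copied from the failure scenarios above, is that a write can be applied on the server yet still be reported to the client as a failure, so a blind retry re-executes it.

```python
import random

class FlakyStore:
    """Toy stand-in for a replicated store: the write may be applied
    yet still report a failure back to the client."""
    def __init__(self):
        self.counter = 0

    def increment(self):
        self.counter += 1          # the update is applied...
        if random.random() < 0.5:  # ...but the acknowledgement may be lost
            raise IOError("timed out waiting for replicas")

def increment_with_retries(store, retries=10):
    for _ in range(retries):
        try:
            store.increment()
            return
        except IOError:
            continue  # blind retry: re-executes the whole update

random.seed(7)  # fixed seed so the run is reproducible
store = FlakyStore()
for _ in range(100):
    increment_with_retries(store)
# We asked for 100 increments, but retries of "failed" (yet applied)
# updates inflate the counter well past 100.
print(store.counter)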
To ensure idempotence (i.e. to guarantee that processing retried updates is harmless), the database should be designed to remember that an update has already been processed. Typically this is achieved by storing identifiers that uniquely identify each update. For example, suppose we want to count how many times each URL has been posted on Twitter. Instead of just storing a mapping of URLs to counters (i.e. a column family URL_statistics where each record has a URL as its key and a single column holding the counter as its value), a solution is to store a mapping of each URL to the IDs of the tweets that contain it (i.e. a column family URL_Tweets where each record has a URL as its key and one column per tweet: the column names are the tweet IDs, and the column values are unused). URL counters are then computed on retrieval by counting tweet IDs.
It is a good idea to store tweet IDs as column names because Cassandra then automatically eliminates duplicates: a repeated update is simply ignored (use this great Cassandra feature to make your updates idempotent!). You can easily come up with other database designs, but one way or another you have to store some information to ensure update idempotence.
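The URL_Tweets design above can be sketched with a dict of sets standing in for the column family (the helper names are made up for the sketch). Because the tweet ID is the "column name", re-applying a retried update changes nothing, and the counter is computed on read.

```python
# Sketch of the idempotent URL_Tweets design (names are illustrative).
url_tweets = {}  # URL -> set of tweet IDs, i.e. the "column names"

def record_mention(url, tweet_id):
    # Adding an existing tweet ID is a no-op, just as writing the same
    # column name twice in Cassandra leaves a single column.
    url_tweets.setdefault(url, set()).add(tweet_id)

def url_count(url):
    # The counter is derived on retrieval, not stored.
    return len(url_tweets.get(url, set()))

record_mention("http://example.com", "tweet-42")
record_mention("http://example.com", "tweet-42")  # retried update: harmless
record_mention("http://example.com", "tweet-99")
print(url_count("http://example.com"))  # → 2
```

The retried write of tweet-42 is absorbed by the set, so the count stays correct no matter how many times the update is repeated.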
It is common that Cassandra applications are not initially designed for idempotence. At first, small-scale deployments do not exhibit these subtle problems and work fine. Only as time passes and the deployments grow do the problems manifest, forcing the applications to be retrofitted to handle them. Do it right from the beginning.