Max's Output


Want scalable and highly available storage? Be ready to apologize to your customers


After reading the paper on Amazon Dynamo, I was confused. It is clear that when you move from strictly consistent to eventually consistent storage, all sorts of bad anomalies become possible: stale reads, deleted items reappearing, etc. Shifting the task of maintaining consistency from the storage to the applications sometimes helps, but it does not fully eliminate the possibility of such anomalies. Business decisions made in the presence of such anomalies can lead to serious flaws such as overbooking airplane seats, clearing bounced checks, etc. It might sound completely unacceptable, but don’t jump to conclusions.

In the paper “Building on Quicksand”, Pat Helland and Dave Campbell treat such anomalies and the business decisions that follow from them as part of common business practice: every business should be ready to apologize. To justify the statement, the authors provide a number of striking parallels from the pre-computer world, where the same anomalies can be found. The key point here is not the possibility of the anomalies but their probability. The authors believe that such anomalies cannot be avoided. But if you manage to build a system that guarantees their probability is low, they become an acceptable business expense that is generously repaid by the following benefits:

  1. The system provides scalability and high availability. High availability might be critical for many online businesses. By apologizing in a few very rare cases, the business avoids losing many potential customers to system outages.
  2. It might also reduce the cost of infrastructure, as it allows for “building systems on quicksand”, that is, on unreliable and inexpensive components such as flaky commodity computers, slow links, low-quality data centers, etc.

Besides relying on the low probability of the anomalies, what else can be done to mitigate their effect on users? The main approach to keeping the user experience coherent in the presence of the anomalies is to expose more information to the user about what is going on in the system. For example, the ordering process can be decomposed into multiple steps, including Order Entry and Order Fulfillment. At Order Entry the system responds “Your order for the book has been accepted and will be processed”. The system thereby announces a tentative operation, which might not be fulfilled as a result of data inconsistency (more on this can be found in “Principles for Inconsistency” by Dean Jacobs and others). Moreover, as any computer system is disconnected from the real world, there might be external causes that prevent order fulfillment: for example, a forklift runs over the only item you have in stock. So you cannot make any serious promises to your customers based solely on decisions made by computers anyway.
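To make the Order Entry / Order Fulfillment split more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not code from either paper: the names `Shop`, `Order`, `OrderStatus`, `replica_stock` and `actual_stock` are all hypothetical, and the "stale replica" is simulated with a plain dictionary that disagrees with the real stock.

```python
from dataclasses import dataclass, field
from enum import Enum


class OrderStatus(Enum):
    ACCEPTED = "accepted"      # tentative: entered, but not yet fulfilled
    FULFILLED = "fulfilled"
    APOLOGIZED = "apologized"  # could not be fulfilled; the customer is told so


@dataclass
class Order:
    order_id: int
    item: str
    status: OrderStatus = OrderStatus.ACCEPTED


@dataclass
class Shop:
    # Hypothetical, possibly stale view of the stock consulted at Order Entry.
    replica_stock: dict
    # What is really on the shelf when Order Fulfillment runs.
    actual_stock: dict
    orders: list = field(default_factory=list)

    def enter_order(self, item):
        """Order Entry: accept tentatively, based on the possibly stale replica."""
        if self.replica_stock.get(item, 0) <= 0:
            return None  # rejected up front, nothing to apologize for later
        self.replica_stock[item] -= 1
        order = Order(order_id=len(self.orders) + 1, item=item)
        self.orders.append(order)
        return order

    def fulfill(self, order):
        """Order Fulfillment: confirm against real stock, or apologize."""
        if self.actual_stock.get(order.item, 0) > 0:
            self.actual_stock[order.item] -= 1
            order.status = OrderStatus.FULFILLED
        else:
            order.status = OrderStatus.APOLOGIZED
        return order.status


if __name__ == "__main__":
    # The replica believes there are two books; in reality only one is left.
    shop = Shop(replica_stock={"book": 2}, actual_stock={"book": 1})
    first = shop.enter_order("book")
    second = shop.enter_order("book")
    for order in (first, second):
        print(order.order_id, order.status.value, "->", shop.fulfill(order).value)
```

Both orders are accepted at entry, because the message to the customer is only "accepted and will be processed". The inconsistency surfaces at fulfillment, where the second order ends in an apology instead of a silently broken promise.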

The ideas presented in “Building on Quicksand” are controversial, as there is no comprehensive study of whether the required low probability can be achieved. Nevertheless, it is an inspiring manifesto for researchers and fascinating reading for a broader audience, as it might change your understanding of how IT solutions should be aligned with the reality of business operations.

Written by maxgrinev

January 4, 2010 at 12:16 pm
