Data Consistency

Jul 8, 2009 at 4:24 PM

Is there anything being done in the Savant code to address the eventual consistency issues in SimpleDB?  If not, I can see ways to work around it if SimpleDB has "atomic" consistency.  In other words, if attributes of an item are modified using BatchPutAttributes(), subsequent calls to get that item will either find all attributes updated or no attributes updated.  If this isn't guaranteed, then building any kind of consistency model into an application seems like it would be VERY difficult.  I haven't been able to determine if SimpleDB operates this way.

Coordinator
Jul 8, 2009 at 5:39 PM

Savant includes integrated caching that makes eventual consistency easier to deal with. When you invoke SimpleSavant.Get(), items are retrieved from the cache when available. When used properly, this prevents situations where a user adds or edits a specific data entity and then doesn't see the effects of their changes immediately.

The cache DOES NOT help you when performing select operations because the cache is only checked when you get an item by id. It would be possible, but much, much more complex to implement support for caching when doing selects--especially when you think about dealing with ordered, multi-page result sets.
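To illustrate the difference between get-by-id and select, here is a minimal Python sketch. It is not Savant's code: `LaggingStore` and `CachingClient` are hypothetical names, and the store is a toy model in which writes only become visible after an explicit `replicate()` call.

```python
class LaggingStore:
    """Toy stand-in for an eventually consistent store (not SimpleDB itself)."""

    def __init__(self):
        self._visible = {}  # what reads currently see
        self._pending = {}  # writes that have not been replicated yet

    def put(self, item_id, attrs):
        self._pending[item_id] = dict(attrs)

    def replicate(self):
        # Make all pending writes visible to readers.
        self._visible.update(self._pending)
        self._pending.clear()

    def get(self, item_id):
        item = self._visible.get(item_id)
        return dict(item) if item is not None else None

    def select(self, predicate):
        return {i: dict(a) for i, a in self._visible.items() if predicate(a)}


class CachingClient:
    """Hypothetical client: the cache is consulted only for get-by-id."""

    def __init__(self, store):
        self._store = store
        self._cache = {}

    def put(self, item_id, attrs):
        self._store.put(item_id, attrs)
        self._cache[item_id] = dict(attrs)  # write-through to the cache

    def get(self, item_id):
        if item_id in self._cache:
            return dict(self._cache[item_id])
        item = self._store.get(item_id)
        if item is not None:
            self._cache[item_id] = dict(item)
        return item

    def select(self, predicate):
        # The cache is not consulted here, so results can lag behind writes.
        return self._store.select(predicate)


store = LaggingStore()
client = CachingClient(store)
client.put("user-1", {"name": "Ann"})

fresh = client.get("user-1")  # served from the cache
stale_select = client.select(lambda a: a["name"] == "Ann")  # store not caught up
store.replicate()
later_select = client.select(lambda a: a["name"] == "Ann")  # now visible
```

In this toy model the get returns the new item right away, while the select only finds it once the store has caught up.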

BatchPutAttributes has nothing to do with atomic consistency for the attributes of a single item. BatchPutAttributes allows you to transactionally store attributes for MULTIPLE items in a single call; plain old PutAttributes will transactionally update the attributes for a single item.

Eventual consistency doesn't mean you will see different versions of the attributes for a single item; it means you may get an older VERSION of ALL the attributes for an item. So let's say I PUT an item with attributes A=1, B=1, and C=1. Then I immediately PUT the same item again with attributes A=2, B=2, and C=2. Then I immediately GET that same item. One of three things is going to happen (assuming I am hitting SimpleDB directly with NO Savant cache):

  1. I'll get null if I hit a SimpleDB node where the item hasn't yet been replicated.
  2. I'll get the attributes A=1, B=1, C=1 if I hit a node with the first version of the item--meaning my last update hasn't been replicated.
  3. I'll get the attributes A=2, B=2, C=2 if I hit a node with the latest version of the item.

But in this scenario I would NEVER get a set of attributes back with mixed attribute versions--e.g. A=1,B=2,C=1.
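The scenario above can be modeled with a small Python simulation. This is only a sketch of the behavior described in this post, under the assumption that each replica stores and receives whole-item versions; it says nothing about how SimpleDB is actually implemented, and `ToyCluster` is a made-up name.

```python
from collections import deque
import random


class ToyCluster:
    """Toy model: every replica holds whole-item snapshots, never partial ones."""

    def __init__(self, replica_count, seed=0):
        self._replicas = [dict() for _ in range(replica_count)]
        self._queue = deque()  # pending (replica_index, item_id, snapshot)
        self._rng = random.Random(seed)

    def put(self, item_id, attrs):
        snapshot = dict(attrs)
        # The write lands on one replica at once and is queued for the rest.
        self._replicas[0][item_id] = snapshot
        for index in range(1, len(self._replicas)):
            self._queue.append((index, item_id, snapshot))

    def replicate_one(self):
        # Deliver the oldest pending write, if any, as a whole snapshot.
        if self._queue:
            index, item_id, snapshot = self._queue.popleft()
            self._replicas[index][item_id] = snapshot

    def get(self, item_id):
        # A read may hit any replica.
        replica = self._rng.choice(self._replicas)
        item = replica.get(item_id)
        return dict(item) if item is not None else None


V1 = {"A": 1, "B": 1, "C": 1}
V2 = {"A": 2, "B": 2, "C": 2}

cluster = ToyCluster(replica_count=3)
cluster.put("item", V1)
cluster.put("item", V2)

observed = []
for _ in range(6):
    for _ in range(20):
        observed.append(cluster.get("item"))
    cluster.replicate_one()

# Every read is one of the three outcomes listed above; none is mixed.
mixed = [result for result in observed if result not in (None, V1, V2)]
final = cluster.get("item")
```

Because replication moves complete snapshots, a read can be missing or out of date, but `mixed` stays empty.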

Jul 8, 2009 at 7:23 PM

That's the answer I was hoping for.  As a follow-up, is it safe to access Savant from multiple threads?  My use case is something like the following running on a web server:

- In response to user click, object is created and stored using Savant, then initiates a call to Web Service X and exits.

- Web Service X POSTS back to my Web Server with the object ID.  The handler on my web server updates the object using Savant, and does processing based on object properties.

In the case of multiple users accessing the site at the same time, this would result in essentially concurrent calls to Savant.  Is this safe?

Coordinator
Jul 8, 2009 at 9:57 PM
Edited Jul 8, 2009 at 9:58 PM

Yes, the SimpleSavant class is thread-safe with two caveats:

  1. The installed IItemCache implementation must be thread-safe. The cache is really the only shared resource where thread-contention is an issue in the current release. The default cache is thread-safe (this is specifically unit tested).
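As an illustration of the first caveat, here is a minimal Python sketch of a cache whose operations are guarded by a lock. It does not mirror the actual members of IItemCache, which are not shown in this thread; `ThreadSafeItemCache` and its methods are hypothetical.

```python
import threading


class ThreadSafeItemCache:
    """Hypothetical cache: one lock guards every access to the shared dict."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}

    def put(self, item_id, item):
        with self._lock:
            self._items[item_id] = dict(item)

    def get(self, item_id):
        with self._lock:
            item = self._items.get(item_id)
            return dict(item) if item is not None else None

    def remove(self, item_id):
        with self._lock:
            self._items.pop(item_id, None)

    def __len__(self):
        with self._lock:
            return len(self._items)


def worker(cache, worker_id, count):
    # Each worker writes its own keys and reads them straight back.
    for n in range(count):
        item_id = f"w{worker_id}-{n}"
        cache.put(item_id, {"n": n})
        assert cache.get(item_id) == {"n": n}


cache = ThreadSafeItemCache()
threads = [threading.Thread(target=worker, args=(cache, i, 200)) for i in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

total = len(cache)  # 8 workers x 200 items each
```

Copying items on the way in and out keeps callers from mutating cached state outside the lock.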
  2. This is still alpha software so there are no guarantees (like there ever are anyway!).

What do I mean by alpha software? Well, for a critical data-level API like Savant I will keep calling it alpha until I've confirmed that it's in production use by at least one significant application (which could be my own if no one else reports in first). It is heavily tested, has been downloaded about 350 times with very few issues reported, and so far has been pretty stable for my personal use. But it also has not been beaten up with a serious load test, soak test, or tuned for performance.

Sep 9, 2009 at 9:47 AM

It seems to me that having a cache in SS would actually make data consistency worse in a web farm scenario.

Each web server would have its own local SS cache. Changes made to an item on one server would not be seen on another server until the cached item had expired.

I suppose you would simply turn off the cache in this scenario.

Any thoughts?

Coordinator
Sep 9, 2009 at 12:46 PM

It would depend on whether client sessions were pinned to individual servers in the farm or not. But generally I would expect folks to use a distributed cache such as memcached or Microsoft Velocity in a web farm scenario.

The cache implementation included with SS is just there for demonstration purposes and simple scenarios involving one server.
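A small Python sketch of the web farm concern, under simplifying assumptions: plain dicts stand in for both the data store and the caches, the shared dict is only a placeholder for a distributed cache such as memcached, and `Server` is a hypothetical name.

```python
class Server:
    """Hypothetical web server with a read-through, write-through cache."""

    def __init__(self, store, cache):
        self._store = store
        self._cache = cache

    def put(self, item_id, attrs):
        self._store[item_id] = dict(attrs)
        self._cache[item_id] = dict(attrs)

    def get(self, item_id):
        if item_id not in self._cache:
            if item_id not in self._store:
                return None
            self._cache[item_id] = dict(self._store[item_id])
        return dict(self._cache[item_id])


# Case 1: each server has its own local cache.
store = {}
server_a = Server(store, cache={})
server_b = Server(store, cache={})
server_a.put("order-1", {"status": "new"})
server_b.get("order-1")  # server B now caches the "new" version
server_a.put("order-1", {"status": "paid"})
local_view = server_b.get("order-1")  # still the stale cached copy

# Case 2: both servers share one cache (a dict standing in for memcached).
shared_store, shared_cache = {}, {}
server_c = Server(shared_store, shared_cache)
server_d = Server(shared_store, shared_cache)
server_c.put("order-1", {"status": "new"})
server_d.get("order-1")
server_c.put("order-1", {"status": "paid"})
shared_view = server_d.get("order-1")  # sees the update through the shared cache
```

With local caches, server B keeps serving the old status until its entry expires; with a shared cache, the update is visible to both servers at once.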

Sep 10, 2009 at 8:43 AM
Thanks. That makes sense.

Feb 15, 2010 at 11:18 AM

If your app is load-balanced in a way that supports user affinity, or if it only has one server, then the existing simple cache implementation actually works well. But neither the simple implementation nor an expanded version built on memcached handles selects unless you handle them explicitly in the cache - that is, handle Selects by parsing them, checking the return value, and modifying it to reflect what's in the cache before passing it along.

I'm working on an app that could use that expansion of the simple cache, so I may have a go at doing just that in CachingSavant. I can send back a patch of what I come up with if you're interested; it would most likely only target basic use cases (select, select where x is not null, select where x = y, select order by x).
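For the basic cases listed above, the overlay step might look roughly like the following Python sketch. It is not the CachingSavant patch; `select_with_cache` is a hypothetical helper, the parsed query is assumed to be available as a predicate function, and deletes and multi-page results are not handled.

```python
def select_with_cache(store_rows, cached_items, predicate, order_by=None):
    """Overlay cached items on possibly stale select results.

    store_rows and cached_items map item id -> attribute dict.
    """
    merged = {item_id: dict(attrs) for item_id, attrs in store_rows.items()}
    for item_id, attrs in cached_items.items():
        if predicate(attrs):
            # The cached version is newer: add it or replace the stale row.
            merged[item_id] = dict(attrs)
        else:
            # The cached version no longer matches, so drop any stale row.
            merged.pop(item_id, None)
    rows = list(merged.items())
    if order_by is not None:
        # Re-sort after merging; assumes every matching row has this attribute.
        rows.sort(key=lambda row: row[1][order_by])
    return rows


# Stale store results: "b" has since turned blue and "c" has been added.
store_rows = {
    "a": {"color": "red", "rank": 3},
    "b": {"color": "red", "rank": 1},
}
cached_items = {
    "b": {"color": "blue", "rank": 1},
    "c": {"color": "red", "rank": 2},
}

rows = select_with_cache(
    store_rows,
    cached_items,
    predicate=lambda attrs: attrs.get("color") == "red",
    order_by="rank",
)
ordered_ids = [item_id for item_id, _ in rows]
```

Here the stale row for "b" is dropped, the cached "c" is added, and the merged set is re-sorted by rank.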

Coordinator
Feb 15, 2010 at 1:13 PM
Edited Feb 17, 2010 at 6:55 PM

I'd love to see what you come up with. The MS Velocity distributed cache has its own language for querying the cache. I'm not sure how easily that would map to SimpleDB's syntax but it might be worth a look.

I'm curious to see how you handle results mixed between the cache and SimpleDB and ordering of results. In my own apps I normally address this type of caching at the application level on a case by case basis.