General Discussion

Mar 17, 2009 at 2:10 AM
Open thread for general questions about Simple Savant.
Apr 13, 2009 at 6:33 AM
Edited Apr 13, 2009 at 6:33 AM
Let's assume I'm using the Person object from the sample app. I would like to store half the users in one domain and the other half in another, then run parallel get requests to make the search run "faster". Since the domain name is read from the attribute on the class, how would I explicitly specify the domain for each put?
Apr 13, 2009 at 9:58 PM

I had a new idea about the above post. What if you implemented another attribute for partitioning? It could either append [property value].ToString() to the domain name of the class, or there could be an overridable method or event that lets you control how the domain name is created and formatted. I'm not sure about this, but it is something I am going to play around with. I need a way to partition my data across domains. If I could decorate a property in a class with a SavantPartitionAttribute, it would obviously be the easiest way to put items into SimpleDB. The issue is getting the items back out. In my case I'll know which partitions to search, so it could be as simple as passing an array of IDs (I'm partitioning the database based on about 12 GUIDs) and starting a new thread for each partition.
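
A rough sketch of that idea, written in Python purely as a language-neutral illustration. Nothing here is Simple Savant API: `BASE_DOMAIN`, `partition_domain`, `query_all`, and the `query_domain` callback are hypothetical names, and the callback stands in for whatever call actually fetches items from one domain.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical: the domain name that would normally come from the class attribute.
BASE_DOMAIN = "Person"


def partition_domain(base_domain, partition_value):
    """Append the partition property's string value to the base domain name."""
    # Hyphens are stripped only to keep GUID-based names compact.
    return f"{base_domain}_{str(partition_value).replace('-', '')}"


def query_all(partition_values, query_domain, max_workers=12):
    """Run one query per partition in parallel and concatenate the results.

    query_domain is a caller-supplied function (an assumption of this sketch)
    that takes a domain name and returns a list of items.
    """
    domains = [partition_domain(BASE_DOMAIN, v) for v in partition_values]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `domains`.
        per_domain = pool.map(query_domain, domains)
        return [item for items in per_domain for item in items]
```

The put side only needs `partition_domain`; the get side needs the list of partition values up front, which matches the case where the caller already knows which partitions to search.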

Any thoughts?

Apr 13, 2009 at 10:07 PM
Simple Savant does not currently support changing the domain name at runtime.

However, I'm about to put out another release that adds (among other things) support for partial attribute-set operations. As part of this work I've refactored much of the internal implementation to support typeless operations (i.e. the get/put/select operations are no longer tied to a concrete Type). My plan is to expose typeless operations through the API in the near future, and this would be one way you could accomplish data set partitioning and still benefit from many Simple Savant features.

I can also envision supporting domain selection specifically via a configurable strategy object that would implement dataset partitioning behavior (and perhaps providing a default hash-based implementation as described here).
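
To make the strategy-object idea concrete, here is a minimal sketch in Python (used only as a neutral illustration). The class name `HashPartitionStrategy`, its methods, and the `_NN` domain suffix are all assumptions for this example and do not exist in Simple Savant.

```python
import hashlib


class HashPartitionStrategy:
    """Hypothetical strategy object: picks a domain by hashing a partition key."""

    def __init__(self, base_domain, partition_count):
        self.base_domain = base_domain
        self.partition_count = partition_count

    def domain_for(self, key):
        """Return the domain that an item with this key should live in."""
        # A stable digest is used so the same key maps to the same domain
        # across processes and machines.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % self.partition_count
        return f"{self.base_domain}_{index:02d}"

    def all_domains(self):
        """Return every domain, for selects that must fan out to all partitions."""
        return [f"{self.base_domain}_{i:02d}" for i in range(self.partition_count)]
```

Gets and puts would call `domain_for` with the item's key, while a select that cannot be narrowed to one partition would iterate over `all_domains`. Changing `partition_count` later would move most keys to different domains, so the count would need to be fixed up front or the data re-partitioned.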

Do you have any concrete plans for how you would implement partitioning? How big a dataset will you be working with?

Apr 13, 2009 at 10:16 PM
Edited Apr 13, 2009 at 10:17 PM
Regarding the SavantPartitionAttribute: This would definitely not be a trivial feature to implement. Get and Put operations would be straightforward. But aggregating the results of select operations in a meaningful way and figuring out what to do with batch puts (using the new BatchPutAttributes operation) when the batch of items is partitioned across multiple domains would be difficult problems to solve.
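
One possible way to handle the batch-put half of the problem is to regroup a mixed batch by target domain before sending it. The sketch below is in Python as a neutral illustration, not Simple Savant code; `plan_batches` and the `domain_for` callback are hypothetical, and the limit of 25 items per BatchPutAttributes call is taken from the SimpleDB documentation.

```python
from collections import defaultdict

# SimpleDB's documented per-call item limit for BatchPutAttributes.
MAX_BATCH_ITEMS = 25


def plan_batches(items, domain_for, batch_size=MAX_BATCH_ITEMS):
    """Group items by target domain, then cut each group into batch-sized chunks.

    domain_for is a caller-supplied function (an assumption of this sketch)
    that maps an item to the name of the domain it belongs in.
    Returns a list of (domain, chunk) pairs, one per batch call to make.
    """
    by_domain = defaultdict(list)
    for item in items:
        by_domain[domain_for(item)].append(item)
    return [
        (domain, group[i:i + batch_size])
        for domain, group in by_domain.items()
        for i in range(0, len(group), batch_size)
    ]
```

This only plans the calls. A batch that spans several domains stops being a single operation, so a failure part of the way through would leave some domains updated and others not, and the sketch does nothing about aggregating or ordering select results.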
Apr 13, 2009 at 10:55 PM

I've got about 70,000 items per day that are fairly evenly split across 10 unique IDs. I'm storing about 3 weeks' worth of data, so that is about 1.5 million records (about 150,000 items per domain if split across the 10 unique IDs). The data is mostly read-only and is updated about 6-10 times a day. Maybe a simple flat-file record per day would be the easiest way to implement this; I'm not sure at this point. A typical query will fetch around 10,000 of the 70,000 items and would need to return the full items (no partial gets).
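
For reference, the sizing above works out as follows. This is a quick Python check, assuming "3 weeks" means 21 days and an even split across the 10 IDs; the variable names are only for illustration.

```python
items_per_day = 70_000
days_retained = 21        # assumption: "about 3 weeks" taken as 21 days
partitions = 10           # one domain per unique ID

total_items = items_per_day * days_retained       # 1,470,000 (about 1.5 million)
items_per_domain = total_items // partitions      # 147,000 (about 150,000)
query_fraction = 10_000 / items_per_day           # roughly 14% of one day's items

print(total_items, items_per_domain, round(query_fraction, 3))
```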

Another strategy on my side could be to simply decorate the property with SavantExclude and split it up into multiple lists. Hashing the date for the item could be used as a partition as well. I have many options in that regard, but I can't decide on the best method. The data is pulled for a single day, so the best method would be partitioning on another property.

Apr 14, 2009 at 10:39 PM
Edited Apr 14, 2009 at 10:40 PM
Thanks for the details. I've added an issue sketching out an approach for this, but I doubt I will have time to implement it any time soon.

Do you have any specific numbers on the relative performance of dataset partitioning and at what point it begins to pay off?