One domain per class?

Jul 16, 2009 at 7:23 PM

It looks like Simple Savant is designed to use a different domain for every model class. On the one hand this is good: it bridges the gap nicely for people who are used to describing their data with normalized, SQL-style entity relationships. But it occurs to me that it may be better to store all entity types in a single domain, so that you don't run into the 100-domain limit and can instead use your extra domains for partitioning.

Since SimpleDB is schemaless, there should be nothing wrong with storing all kinds of things in a single domain: Person, Order, OrderItem. These are all just "items", really, with probably a "Type" column identifying their classification (for select queries). I would like to explore adding the following functionality to Simple Savant:

  1. Define a base class to host the [DomainName("Item")] attribute.
  2. Create subclasses that inherit from our base Item class, but would share the same domain as their superclass.
  3. Add a feature to our base class that allows it to reflect on itself and automatically populate a Type property/attribute based on the subclassed type.

Thoughts?

Coordinator
Jul 16, 2009 at 8:21 PM

Sure, that's a reasonable way to arrange things and it should work without any changes to SS.

You aren't required to apply the DomainName attribute to a base class to use the same domain for different types. You could also apply DomainName to multiple classes and provide the same domain name for each of them.
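As a rough illustration of that second option, the sketch below puts the same domain name on two unrelated classes. The DomainNameAttribute defined here is only a stand-in so the snippet compiles by itself, and Person and Order are made-up entity classes; Simple Savant's real attribute and its other mapping requirements may differ.

```csharp
using System;

// Stand-in for Simple Savant's DomainName attribute (assumption: the real
// attribute's members may be named differently).
[AttributeUsage(AttributeTargets.Class)]
public sealed class DomainNameAttribute : Attribute
{
    public DomainNameAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

// Two unrelated entity classes that share one domain by naming it twice.
[DomainName("AllItems")]
public class Person
{
    public string FullName { get; set; }
}

[DomainName("AllItems")]
public class Order
{
    public decimal Total { get; set; }
}
```

The trade-off is that the domain name is repeated on every class, so a typo in one of them would silently send that type to a different domain.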

For the Type name your base class could do something like this:

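[DomainName("AllItems")]
public class ItemBase
{
    // Stored with each item so select queries can filter by entity type.
    public string TypeName
    {
        get
        {
            // Returns the runtime (most derived) type name, i.e. the subclass name.
            return GetType().Name;
        }
        set { /* discard value */ }
    }
}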

Coordinator
Jul 16, 2009 at 8:28 PM
Edited Jul 16, 2009 at 8:29 PM

I should add that one thing planned for the next release is the ability to perform SS operations without declaring your entity class up front. This would be particularly useful with the arrangement you've described above because you could then pull back attribute sets that span multiple entity types. This would let you do a variety of things, such as creating pseudo-relationships between entities and getting back attributes for all items related to another item in a single query, regardless of the related item types.

Jul 16, 2009 at 8:54 PM

I am trying this out now, and it is working quite nicely. Thanks for your help!

It would be rather nice to not have to specify [DomainName("Item")] on all of my subclasses - this only introduces the opportunity for error. Plus, if I do use domains I will surely use them for partitioning items, not for separating entity types.

It would be trivial for me to extend the code to support this convention: in the absence of a DomainName attribute on a class, look up the inheritance chain for one. If none is found, use the class name.
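A minimal sketch of that lookup, using plain reflection. DomainResolver, ItemBase, Person, and Standalone are hypothetical names, and DomainNameAttribute is a stand-in defined here so the snippet is self-contained; none of this is Simple Savant's actual implementation.

```csharp
using System;

// Stand-in for Simple Savant's DomainName attribute (assumption: the real
// attribute's members may be named differently).
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class DomainNameAttribute : Attribute
{
    public DomainNameAttribute(string name) { Name = name; }
    public string Name { get; private set; }
}

public static class DomainResolver
{
    // Hypothetical helper: walk from the type up through its base types and
    // return the first DomainName found; fall back to the class name.
    public static string Resolve(Type type)
    {
        for (Type t = type; t != null && t != typeof(object); t = t.BaseType)
        {
            // inherit: false, so each level of the chain is inspected explicitly.
            object[] attrs = t.GetCustomAttributes(typeof(DomainNameAttribute), false);
            if (attrs.Length > 0)
            {
                return ((DomainNameAttribute)attrs[0]).Name;
            }
        }
        return type.Name;
    }
}

[DomainName("Item")]
public class ItemBase { }

// No attribute of its own: resolves to the base class's domain.
public class Person : ItemBase { }

// No attribute anywhere in the chain: resolves to its own class name.
public class Standalone { }
```

Under these assumptions, DomainResolver.Resolve(typeof(Person)) yields "Item" and DomainResolver.Resolve(typeof(Standalone)) yields "Standalone". An attribute placed on a subclass wins over the base class because the walk starts at the most derived type.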

This brings up a good question, which may be better answered on another thread, but I am curious to know: what are the plans for data partitioning support? In our application, it would be trivial to merge select operations, avoid batch operations, and configure the Domain on a per-item basis. We are using a common ItemName in our base class that is used for every Item:

[ItemName]
public Guid Id { get; set; }

that can be used to create a hash-based partitioning strategy. Does this match up with the way other people are hoping to use these tools?
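One way such a strategy could look, sketched under assumptions: DomainPartitioner, the "<baseName>_<index>" naming scheme, and the choice of FNV-1a as the hash are all illustrative and not part of Simple Savant.

```csharp
using System;

public static class DomainPartitioner
{
    // Hypothetical helper: maps an item Id to one of `partitionCount` domains
    // named "<baseName>_<index>". The naming scheme is an assumption.
    public static string DomainFor(Guid id, string baseName, int partitionCount)
    {
        if (partitionCount <= 0)
        {
            throw new ArgumentOutOfRangeException("partitionCount");
        }

        // FNV-1a over the Guid's 16 bytes. Hashing the raw bytes avoids relying
        // on GetHashCode(), whose value is not guaranteed to stay stable.
        uint hash = 2166136261;
        foreach (byte b in id.ToByteArray())
        {
            unchecked
            {
                hash ^= b;
                hash *= 16777619;
            }
        }

        return baseName + "_" + (hash % (uint)partitionCount);
    }
}
```

For example, DomainPartitioner.DomainFor(item.Id, "Item", 4) always returns the same one of "Item_0" through "Item_3" for a given Id, so a get by Id touches a single domain. A select would still have to be run against every partition and the results merged, and changing partitionCount later would move existing items to different domains.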

Coordinator
Jul 16, 2009 at 10:18 PM

Glad it's working!

[It would be trivial for me to extend the code to support this convention: in the absence of a DomainName attribute on a class, look up the inheritance chain for one. If none is found, use the class name.]

Putting DomainName on the base class should work. I was just pointing out that you could use either approach. Are you saying you tried that and it didn't work?

There is actually an open issue describing the likely partitioning approach:

http://simplesavant.codeplex.com/WorkItem/View.aspx?WorkItemId=2724

Let me know if that doesn't answer your questions.

Prioritization for partitioning and other features is driven by a combination of my own needs, providing a "complete" API, and what other folks are trying to do. Partitioning gets discussed quite a bit, but I haven't seen many hard numbers for how much it actually improves performance. I'd like to see some analysis of the benefits (I'll run some tests myself eventually if necessary) before making it a high-priority item.