Advanced Select Queries

Scalar and Count Queries

SimpleDB supports "select count(*)" queries that return the item counts without returning the actual item attributes. The easiest way to use count queries with Simol is to execute your query using SimolClient.SelectScalar. Here's an example:

    int count = (int)simol.SelectScalar<Employee>("select count(*) from Employee");

The generic Employee argument is not used here since the returned value is not mapped to any actual property of Employee. In fact, you could execute the same count query against the employee domain using any persistent type.

However, the correct generic parameter is required for most other scalar queries. For example, if we wanted to get the hire date of the last employee hired before today we could run the query below. In this case the mapping derived from the Employee generic parameter is used to correctly format both the returned date value and the HireDate command parameter:

    string selectText = "select HireDate from Employee where HireDate < @HireDate order by HireDate desc";
    DateTime hireDate = (DateTime)simol.SelectScalar<Employee>(selectText, new CommandParameter("HireDate", DateTime.Today));

It's possible for SimpleDB count queries to time out and return a "NextToken" before completion if your domain contains large numbers of records and the query is very complex. Simol handles this case for you and simply reinvokes the count query until it completes.

Limiting Query Results

SimpleDB limits both the number of items and the data size returned in a single select operation. By default Simol reissues your select queries until all available results have been returned. If you run the query below and your employee domain contains 10,000 items, you'll get them all back when the method returns.

    List<Employee> employees = simol.Select<Employee>("select * from Employee");

It's possible to limit your results by using the limit keyword in your query. However, if you rewrite the previous example to use the limit keyword as shown below, Simol will still return all 10,000 items in your domain. The only difference is that each batch of results retrieved from SimpleDB will be limited to 50 items rather than the default 100 item limit imposed by SimpleDB.

    List<Employee> employees = simol.Select<Employee>("select * from Employee limit 50");

To truly limit the number of results returned by Simol in a single call, you need to limit the number of result pages or batches requested from SimpleDB. The example below shows how to ensure that you'll get back no more than 50 items. (You may get fewer than 50 items for a variety of reasons--if the query times out before completing, if the employee items contain very large amounts of data, etc.)

    SelectCommand<Employee> command = new SelectCommand<Employee>("select * from Employee limit 50")
    {
        MaxResultPages = 1
    };
    SelectResults<Employee> results = simol.Select(command);

To loop through our entire list of 10,000 employees, 50 at a time, we can get the value of SelectResults.PaginationToken object and set it on our SelectCommand each time we invoke the Select method. Here's an example:

    SelectCommand<Employee> command = new SelectCommand<Employee>("select * from Employee limit 50")
    {
        MaxResultPages = 1
    };
    SelectResults<Employee> results;
    do
    {
        results = simol.Select(command);
        command.PaginationToken = results.PaginationToken;

        // do something important with the results
    } while (results.PaginationToken != null);

Finding Items with "Null" Attribute Values

By default Simol stores a single null character ('\0') in SimpleDB to indicate null property values. To test null or not null attribute values when selecting items you have two choices. The first option is to embed the null character in your query string like this: "select * from Person where Name = '\0'". The second option is to use a command parameter with a null value as shown below. The null parameter value will be automatically formatted as a single null character by Simol.

    var person = new Person
    {
        Name = null
    };
    simol.Put(person);
    List<Person> peopleWithNullNames = simol.Select<Person>("select * from Person where Name = @Name", 
        new CommandParameter("Name", (object)null));

Note: Many SimpleDB administration tools display the null character stored by Simol as "AA==". This is because the null character must be escaped when embedded into the XML returned by SimpleDB. The administration tools are simply displaying the encoded/escaped value.

Selecting Large Lists with an IN Clause

One common select operation that's challenging with SimpleDB is selecting a large list of items using a list of attribute values. As explained in Getting Started you can use an "IN" clause to retrieve lists of items. But SimpleDB limits the number of comparisons in a select expression to 20, which means your IN clause can contain at most 20 values (fewer if you have other comparisons in your query). Achieving reasonable performance while handling SimpleDB query timeouts and NextTokens makes this a distinctly non-trivial operation.

Fortunately, Simol provides a utility method for doing this in just a few lines of code. SelectUtils.SelectWithList splits your value list into batches of 20 and invokes each batch in parallel while also handling NextTokens and query timeouts. The code below demonstrates putting and then selecting 1000 employees by email.

    // build list of 1000 employees
    var employees1 = new List<Employee>();
    for (int k = 0; k < 1000; k++)
    {
        var employee = new Employee
        {
            Email = k + "@example.com",
            Id = Guid.NewGuid()
        };
        employees1.Add(employee);
    }

    // put the 1000 employees into SimpleDB and copy their emails into a list
    simol.Put<Employee>(employees1);
    var emails = employees1.Select(e => e.Email).ToList();

    // select all 1000 employees in a single call using the list of emails
    var selectUtils = new SelectUtils(simol);
    var command = new SelectCommand<Employee>("select * from Employee where Email in (@Email)");
    command.AddParameter("Email", emails);
    var employees2 = selectUtils.SelectWithList<Employee, string>(command, emails, "Email");

Efficient Results Paging

One handy feature of SimpleDB count queries is the ability to use the NextToken returned from a count query in another select query with the same predicates. If you explicitly limit your count query you can force SimpleDB to return a NextToken for any arbitrary point in the results.

For example, you could invoke the query "select count(*) from Employees limit 10" to get a NextToken starting 10 records deep into the employee results. You could then issue the query "select * from Employees limit 10" (passing the NextToken) to retrieve all attributes for employees 11-20. Or you could set the limit on your initial count query to 1000 to start paging 1000 records into the results.

To implement this sort of paging with Simol you need to be able to access the NextToken returned from your count query. But the NextToken is not exposed by the SimolClient.SelectScalar method commonly used for count queries. There are two options for accessing the NextToken:
  • Drop down to the AWS SDK library to issue your initial count query and get the NextToken
  • Use the handy SelectUtils.SelectCountNextToken method to get the NextToken using Simol. See the example below.

    var selectUtils = new SelectUtils(simol);
    var nextToken = selectUtils.SelectCountNextToken<Employee>("select count(*) from Employee limit 10");

    // IMPORTANT: If there are fewer than 10 employees the nextToken will be null and the following code should be skipped
    var command = new SelectCommand<Employee>("select * from Employee limit 10")
    {
        PaginationToken = nextToken
    };
    var employees = simol.Select<Employee>(command);

Last edited May 27, 2011 at 3:01 PM by ashleytate, version 21

Comments

Kluyg Jun 15, 2011 at 7:07 AM 
new edition of "Efficient Results Paging" section doesn't contains MaxResultPages = 1 (like original version). This will cause selecting ALL Employee starting from number 11, instead of selecting only 11 - 20.