Tuesday, February 17, 2009

Joining entities to avoid database roundtrips

In my last post, I hinted that I had an alternative in mind for the Eager Loading problem that has vexed me over the past few days.

First of all, it's important to understand what we intend to gain by eager loading--fewer database round-trips. You see, every time your program asks the database for information, there is considerable overhead with establishing the connection, waiting for your request to go over the wire, and waiting for the response to come back. If your queries return large data sets, this overhead will have a minimal impact. But if you're doing a whole bunch of tiny queries, this overhead can make a huge difference. In my case, where I often work from home and connect to the database at work, the overhead from these so-called "round-trips" can actually take more time than everything else in a typical page load.

So rather than saying:

1
2
3
4
5
6
7
8
9
10
11
TrackedEntityBO tebo = new TrackedEntityBO(context);

foreach (RepeaterItem item in Repeater1.Items)
{
CheckBoxList list = (CheckBoxList)item.FindControl("CheckBoxList1");
foreach (ListItem li in list.Items)
{
TrackedEntity te = tebo.getById(Decimal.Parse(li.Value)); // Run 1 query per TrackedEntity
...
}
}
... , I could save a lot of time by saying:

1
2
3
4
5
6
7
8
9
10
11
12
13
TrackedEntityBO tebo = new TrackedEntityBO(context);
// Runs a single linq query for all TrackedEntities
Dictionary<decimal, TrackedEntity> ueMap = tebo.getAll().ToDictionary(c => c.TrackedEntityId);

foreach (RepeaterItem item in Repeater1.Items)
{
CheckBoxList list = (CheckBoxList)item.FindControl("CheckBoxList1");
foreach (ListItem li in list.Items)
{
TrackedEntity te = ueMap[Decimal.Parse(li.Value)];
...
}
}
The folks who designed the Entity Framework were aware of this, and they took action to prevent round-trips as much as possible. For example:
  1. Each context instance can cache a lot of the data that it gathers. If a later LINQ query on the same context can be determined to contain only data that has previously been cached, the Framework will use the cached data rather than executing another query.
  2. Unlike Linq to SQL, the Entity Framework will not make a database query without being asked. For example, the following code will produce an exception because the EntityCategory object never got loaded into the context:


    1
    2
    3
    TrackedEntity e = (from te in context.TrackedEntity
    select te).First();
    Console.WriteLine (e.EntityCategory.Name);
As I pointed out in my last post, you simply can't make assumptions regarding which items have been loaded into the context. That means if you want to get certain objects associated with other objects, you'll either have to do a whole bunch of queries, or put both groups of objects into a data structure that maps them to one another. Thanks to the power of the .NET framework, however, this isn't all that difficult. For example, with relatively few lines of actual code, I was able to make the following statements possible:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
IQueryable<TrackedEntity> teQuery = from te in context.TrackedEntity
orderby te.Name
select te;
IQueryable<EntityCategory> ecQuery = from te in teQuery
let ec = te.EntityCategory
orderby ec.Name
select ec;
JoinedEntityMap<EntityCategory, TrackedEntity> j =
ecQuery.ToMap(c => c.EntityCategoryId, teQuery, te => te.EntityCategory);

foreach (EntityCategory c in j.UniqueKeys)
{
foreach (TrackedEntity e in j[c])
{
Console.WriteLine(c.Name + ": " + e.Name);
}
}
With negligible memory overhead, the above code causes a grand total of two SQL queries and takes 110 milliseconds to run with 4 EntityCategories and 47 TrackedEntities, as opposed to the code below, which takes 251 milliseconds:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
IQueryable<EntityCategory> ecQuery = from ec in context.EntityCategory
orderby ec.Name
select ec;

foreach (EntityCategory c in ecQuery)
{
IQueryable<TrackedEntity> teQuery = from te in context.TrackedEntity
where te.EntityCategory.EntityCategoryId == c.EntityCategoryId
orderby te.Name
select te;
foreach (TrackedEntity e in teQuery)
{
Console.WriteLine(c.Name + ": " + e.Name);
}
}
With that kind of difference on such a small scale, you can imagine how much of a difference it could make if you had hundreds, or even thousands of records in each table!

Another potential benefit to this approach is that the main page's code can create the data it knows will be needed, and then hand strongly-typed data to each user control in turn. It not only makes it possible for the user controls to be more specific in what sort of data they need as parameters, but it also increases the transparency of your data accesses. With most of your calls to the data layer occurring in one code file, it becomes obvious what's taking so much time to load. And if you're not using an HttpContext-based Entity context (which we are), it can also simplify the problems you'll run into with conflicting contexts.

No comments:

Post a Comment