Thursday, February 12, 2009

Lazy and Eager Loading Strategies in the Entity Framework

We're working on a Timesheets project to help us learn .NET. Our architecture uses the fairly new .NET Entity Framework for its data model. We use three layers:
  1. The data access layer consists of an EDMX file, which defines CLR objects based on our database schema (e.g. TrackedTime).
  2. The business layer consists of "business objects" with methods that do things with the objects in the data access layer (e.g. public List<TrackedTime> getByUserAndDate(User user, DateTime dateTime)).
  3. The front-end is an ASP.NET Website.
So, to get a list of all of a user's tracked times for a particular day, our front-end can call the following method:
public List<TrackedTime> getByUserAndDate(User user, DateTime dateTime)
{
    ObjectQuery<TrackedTime> queryObject = context.TrackedTime;
    return queryByUserAndDate(queryObject, user, dateTime).ToList();
}
The method "queryByUserAndDate" generates a LINQ query that asks for all TrackedTime objects in the given ObjectQuery that have the given User and Date. The list returned by "getByUserAndDate" can then be bound to a control in the front-end, such as a DataGrid, using an ObjectDataSource.
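For the curious, here is roughly what a method like "queryByUserAndDate" might look like. This is only a sketch that runs against an in-memory list instead of the database: the TrackedTime class below is a simplified stand-in, and its UserId, Date and Hours properties are assumptions, not our real generated entity. Since ObjectQuery<T> implements IQueryable<T>, the same kind of filter could be composed onto a real query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified entity; the real one is generated from the EDMX file.
public class TrackedTime
{
    public int UserId { get; set; }
    public DateTime Date { get; set; }
    public double Hours { get; set; }
}

public static class TrackedTimeQueries
{
    // Composes a filter onto whatever query it is given. Nothing is executed
    // until the caller enumerates the result, for example with ToList().
    public static IQueryable<TrackedTime> QueryByUserAndDate(
        IQueryable<TrackedTime> source, int userId, DateTime dateTime)
    {
        DateTime day = dateTime.Date; // ignore the time of day
        return from t in source
               where t.UserId == userId && t.Date == day
               select t;
    }

    public static void Main()
    {
        // Made-up sample rows standing in for the TrackedTime table.
        IQueryable<TrackedTime> rows = new List<TrackedTime>
        {
            new TrackedTime { UserId = 1, Date = new DateTime(2009, 2, 12), Hours = 3.5 },
            new TrackedTime { UserId = 1, Date = new DateTime(2009, 2, 12), Hours = 4.0 },
            new TrackedTime { UserId = 1, Date = new DateTime(2009, 2, 11), Hours = 8.0 },
            new TrackedTime { UserId = 2, Date = new DateTime(2009, 2, 12), Hours = 7.0 },
        }.AsQueryable();

        List<TrackedTime> result =
            QueryByUserAndDate(rows, 1, new DateTime(2009, 2, 12, 15, 30, 0)).ToList();
        Console.WriteLine(result.Count + " rows for user 1");
    }
}
```

Because the method only builds the query, the caller decides when it actually runs, which is why the business method can tack ToList() on the end.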

This works fine, as long as our DataGrid only displays the data tied directly to the TrackedTime object. If we want to have a field in the DataGrid that shows all the TrackedEntities that are related to our TrackedTime object, we will need to make sure they get loaded from the database. There are two ways to do this.

One way is Lazy Loading: for each row in our DataGrid, we have to insert the following code before the TrackedEntities get accessed:
if (!trackedTime.TrackedEntities.IsLoaded)
{
    trackedTime.TrackedEntities.Load();
}
There are a few problems with this approach:
  1. It messes up our three-tiered architecture, since data is being loaded almost directly from the front-end layer.
  2. If the context object used in getByUserAndDate has been disposed before this code gets reached, it will throw an exception. This isn't such a problem in our architecture, for reasons I will probably discuss in another blog posting, but for many applications it simply wouldn't work.
  3. getByUserAndDate only made one database query to load in all "N" TrackedTime objects. Now, we're manually forcing "N" database queries to load in the TrackedEntity objects. With all the overhead involved in a database query, this can slow the overall process significantly.
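To make problem #3 concrete, here is a small, self-contained sketch that doesn't touch the Entity Framework at all. FakeDatabase and its QueryCount counter are invented for illustration; each method call simply stands in for one round trip to the database server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database: every method call counts as one round trip.
public class FakeDatabase
{
    public int QueryCount { get; private set; }

    // Parent id -> ids of its related children (made-up sample data).
    private readonly Dictionary<int, List<int>> childrenByParent =
        new Dictionary<int, List<int>>
        {
            { 1, new List<int> { 10, 11 } },
            { 2, new List<int> { 20 } },
            { 3, new List<int>() },
        };

    // One query that returns all parent rows.
    public List<int> LoadParents()
    {
        QueryCount++;
        return childrenByParent.Keys.ToList();
    }

    // One query per parent, which is what calling Load() row by row amounts to.
    public List<int> LoadChildren(int parentId)
    {
        QueryCount++;
        return childrenByParent[parentId].ToList();
    }

    // One query that returns parents and children together.
    public Dictionary<int, List<int>> LoadParentsWithChildren()
    {
        QueryCount++;
        return childrenByParent.ToDictionary(p => p.Key, p => p.Value.ToList());
    }
}

public static class LoadingDemo
{
    // Round trips needed when children are loaded on demand, row by row.
    public static int CountOnDemandQueries()
    {
        var db = new FakeDatabase();
        foreach (int parentId in db.LoadParents())
        {
            db.LoadChildren(parentId);
        }
        return db.QueryCount;
    }

    // Round trips needed when everything is loaded up front.
    public static int CountUpFrontQueries()
    {
        var db = new FakeDatabase();
        db.LoadParentsWithChildren();
        return db.QueryCount;
    }

    public static void Main()
    {
        Console.WriteLine("On demand: " + CountOnDemandQueries() + " round trips");
        Console.WriteLine("Up front: " + CountUpFrontQueries() + " round trips");
    }
}
```

With three parent rows, loading on demand costs four round trips (one for the parents plus one per row), while loading everything up front costs one. The gap grows with every extra row in the grid.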
The other approach is to use so-called "Eager" loading. In this approach, we make a method getByUserAndDateWithTrackedEntities to load the TrackedEntities in the initial LINQ statement:
public List<TrackedTime> getByUserAndDateWithTrackedEntities(User user, DateTime dateTime)
{
    ObjectQuery<TrackedTime> queryObject = context.TrackedTime.Include("TrackedEntities");
    return queryByUserAndDate(queryObject, user, dateTime).ToList();
}
This has the advantage of using a single database call to load all TrackedEntities for all the TrackedTime objects that are going to be returned. However, it creates a couple of new problems:
  1. Experienced programmers are loath to use strings to define data that's tied to the object model. If, for example, you mistakenly say "TrackedEntity" instead, you'll have no idea that there's something wrong with your code until it runs. If this method only gets run in a corner case, you might not catch the bug for a long, long time, and bugs generally get more expensive to fix the later in the production cycle they are caught. We want the compiler to tell us if we're making a dumb mistake as soon as we make it. It's even better if we can use Intellisense to help us avoid making the error in the first place!
  2. If we have to create a new method for every possible combination of includes, this could get really cumbersome. Are we going to make a getByDateWithUserAndTrackedEntitiesAndTheirCategories method, too? Ideally, we would like to have one method that could take the includes as arguments.
To solve problem #1, a gentleman named Matthieu MEZIL (is he French?) created extension methods that basically snap into the ObjectQuery class to give you new ways of calling the Include method. After creating a class based on the one he created, we can change getByUserAndDateWithTrackedEntities thusly:
ObjectQuery<TrackedTime> queryObject = context.TrackedTime.Include(c => c.TrackedEntities);
This adds a little overhead, because behind the scenes it inspects the lambda's expression tree at run-time to build the "TrackedEntities" string. But it's a trifle compared to all the other overhead inherent in ASP.NET, and the include is now strongly typed.
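As far as I can tell, the trick rests on expression trees: when a parameter is declared as Expression<Func<T, object>>, the compiler hands the method a description of the lambda rather than compiled code, and the property names can be read out of it. The following is a sketch of that idea only, not M. MEZIL's actual code. The IncludePath helper, the stripped-down entity classes, and the User.Manager property are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical, stripped-down stand-ins for the generated entity classes.
public class TrackedEntity { }
public class User
{
    public User Manager { get; set; } // made up, to show a dotted path
}
public class TrackedTime
{
    public List<TrackedEntity> TrackedEntities { get; set; }
    public User User { get; set; }
}

public static class IncludePath
{
    // Turns t => t.A.B into the string "A.B" by walking the expression tree.
    // Only plain chains of reference-typed properties are handled here.
    public static string From<T>(Expression<Func<T, object>> selector)
    {
        var names = new List<string>();
        var member = selector.Body as MemberExpression;
        while (member != null)
        {
            // Walk from the outermost property back towards the parameter.
            names.Insert(0, member.Member.Name);
            member = member.Expression as MemberExpression;
        }

        if (names.Count == 0)
        {
            throw new ArgumentException(
                "Expected a property access such as t => t.Property", "selector");
        }
        return string.Join(".", names.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(From<TrackedTime>(t => t.TrackedEntities)); // TrackedEntities
        Console.WriteLine(From<TrackedTime>(t => t.User.Manager));    // User.Manager
    }
}
```

A typo such as t => t.TrackedEntity now fails at compile time, because the property doesn't exist. The resulting string could then be handed to the ordinary Include(string) method.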

Is there a way to solve problem #2 in this context as well? To be honest, M. MEZIL's extension methods use deep magic that is beyond my current understanding of the .NET framework, so I haven't figured out how one might go about altering the getByUserAndDate method to allow the same kind of "lambda expression" to be passed into it, although there is certainly a way to do so. But the limitations of the front-end leave us with certain practical concerns as well, which might make it impossible (or at least really annoying) to call the method this way. You see, ASP.NET relies heavily on reflection to make it easier to get the "it just works" behavior that Microsoft is striving for. For example, there appears to be no way in my .aspx code to tell the ObjectDataSource to use actual objects as arguments to my business layer. If I had a method that looked like this:
public List<TrackedTime> getByUserAndDateWithTrackedEntities(String userId, String dateTimeStr)
... I could just use the wizard in Visual Web Developer to create simple XML-ish code that tells the framework where to get the userId and dateTime to use when calling the getByUserAndDateWithTrackedEntities method. But if I want to pass actual objects around (which not only helps us prevent the situation mentioned in #1, but also saves us from re-loading the same User object from the entity framework for every control on the page that wants information from it), I have to leave the parameter values blank in the .aspx file and put something like this in my code-behind:
protected void TodayTimes_Selecting(object sender, ObjectDataSourceSelectingEventArgs e)
{
    e.InputParameters["user"] = currentUser;
    e.InputParameters["dateTime"] = DateTime.Parse(DateList.SelectedValue);
}
It's ugly, but that's the price we pay for those nifty wizards and that nice design view. If I want to bind the GridView programmatically I can, but I would have to sacrifice some of what makes ASP.NET attractive in the first place. Let's assume for a moment that I won't always want to make that sacrifice. How would I add a lambda expression to the InputParameters in the above code? If it's possible, it's undoubtedly messy, and since the function is only getting invoked through reflection anyway, we've already lost the benefits of Intellisense and compiler checks, so why bother?
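To see why the compiler can't help once reflection is involved, consider this simplified picture of calling a method by name with a dictionary of parameters. It is not how ObjectDataSource is actually implemented; ReflectiveCaller and the Describe method are invented for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Reflection;

// Hypothetical business object with a made-up method, for illustration only.
public class TrackedTimeBo
{
    public string Describe(string user, DateTime dateTime)
    {
        return user + " on " + dateTime.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
    }
}

public static class ReflectiveCaller
{
    // Looks the method up by name and fills its arguments from a dictionary
    // keyed by parameter name. None of this is checked by the compiler.
    public static object Call(object target, string methodName,
                              IDictionary<string, object> inputParameters)
    {
        MethodInfo method = target.GetType().GetMethod(methodName);
        if (method == null)
        {
            throw new MissingMethodException(target.GetType().Name, methodName);
        }

        ParameterInfo[] parameters = method.GetParameters();
        object[] args = new object[parameters.Length];
        for (int i = 0; i < parameters.Length; i++)
        {
            // A misspelled key surfaces here, at run-time, as KeyNotFoundException.
            args[i] = inputParameters[parameters[i].Name];
        }
        return method.Invoke(target, args);
    }

    public static void Main()
    {
        var inputParameters = new Dictionary<string, object>
        {
            { "user", "alice" },
            { "dateTime", new DateTime(2009, 2, 12) },
        };
        // Prints "alice on 2009-02-12".
        Console.WriteLine(Call(new TrackedTimeBo(), "Describe", inputParameters));
    }
}
```

If the key were spelled "datetime" instead of "dateTime", the code would still compile and only fail when the page runs.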

With that in mind, let's examine a way to solve problem #2 without solving problem #1. I begin by creating the following extension method:
// Extension methods must be declared in a static class.
public static ObjectQuery<T> Include<T>(this ObjectQuery<T> mainQuery, params string[] paths)
{
    ObjectQuery<T> q = mainQuery;
    foreach (string path in paths)
    {
        q = q.Include(path);
    }
    return q;
}
As you can see, all this does is call Include iteratively on every string that gets passed in to it. Now we can change our getByUserAndDate function like so:
public List<TrackedTime> getByUserAndDate(User user, DateTime dateTime, params string[] includes)
{
    ObjectQuery<TrackedTime> queryObject = context.TrackedTime.Include(includes);
    return queryByUserAndDate(queryObject, user, dateTime).ToList();
}
Then, depending on how much data we want to load eagerly, we can call this programmatically like so:
  • myGridView.DataSource = trackedTimeBo.getByUserAndDate(user, dateTime);
  • myGridView.DataSource = trackedTimeBo.getByUserAndDate(user, dateTime, "TrackedEntities");
  • myGridView.DataSource = trackedTimeBo.getByUserAndDate(user, dateTime, "TrackedEntities", "TrackedEntities.EntityCategory");
  • ...
Or, in the case mentioned earlier, we can add the input parameter like so:
  • e.InputParameters["includes"] = new string[] {};
  • e.InputParameters["includes"] = new string[] {"TrackedEntities"};
  • e.InputParameters["includes"] = new string[] {"TrackedEntities", "TrackedEntities.EntityCategory"};
  • ...
So we end up with a couple of good ways to do Eager Loading in the Entity Framework, each with its own advantages and disadvantages. We'll probably end up using a combination of the two. Or maybe we'll find another way that makes even more sense. Who knows?
