Thursday, April 16, 2009

Enumerate across a date range in Linq using "yield"

I'm making a generic data table displayer which should be able to display any set of data that is formatted properly.  It can either deduce the table headers based on the data it receives, or it can use ones that I specify via a list of key/value pairs.  For example, by setting the value of this property:
1
IEnumerable<DataPair<object, String>> RowKeysWithHeaders
... I can make the row headers display all of the Strings on the right side of the given DataPairs in the IEnumerable's order, and the data for each row of the table will be aligned to match the keys on the left side of the DataPairs.

Now, let's say I'm using the data returned by the query I mentioned yesterday, which will only include dates for which there are TrackedTime entries, but I want to display all dates within a given date range, and simply leave cells empty if there are no entries for those dates.  Furthermore, I want to specify how the dates are formatted.  I could create a list of date/String pairs to use as row keys with headers, like this:
1
2
3
4
5
6
7
8
9
10
11
List<DataPair<object, String>> dateHeaders = new List<DataPair<object, String>>();
for(DateTime date = startDate; date < endDate; date.AddDays(1))
{
DataPair<object, String> header = new DataPair<object, String>
{
LeftObject = date,
RightObject = date.ToShortDateString()
};
dateHeaders.Add(header);
}
this.ReportTableDisplay1.RowKeysWithHeaders = dateHeaders;
But that would be kind of wasteful, wouldn't it?  It would mean creating an entire List of dates, when all I need is to print a bunch of consecutive dates--something I should be able to do mathematically on the fly.

A more efficient way would be to create an IEnumerable class (and an accompanying IEnumerator class) that know how to iterate across dates.  But that's a lot more work and a lot more code for something that should be relatively simple.

Thanks to the yield operator, there is a better way.  This simple method:
1
2
3
4
5
6
7
8
9
public static IEnumerable<DateTime> DaysInRange(DateTime startDate, DateTime endDate)
{
DateTime current = startDate;
while (current <= endDate)
{
yield return current;
current = current.AddDays(1);
}
}
... will create an enumerable object that does exactly what I need it to, without the need for any extra classes or anything.  Here's how I use it:
1
2
3
4
5
6
this.ReportTableDisplay1.RowKeysWithHeaders = from d in DaysInRange(startDate, endDate)
select new DataPair<object, String>
{
LeftObject = d,
RightObject = d.ToShortDateString()
};
This simple LINQ statement then gives me an IEnumerable that iteratively creates new DatePairs as the program traverses it.  You have to admit, that's pretty smooth.  That means that if I decide to paginate the table results, I can use the Take() method, and the system won't even produce DataPairs for dates that I don't iterate over.  I can also move my DaysInRange method into a common utility class where it can be accessed any time I need to traverse a range of dates.  It's a great example of how we can use LINQ in conjunction with the yield operator to create simple, efficient code.

It can also be used to highlight one of the dangers of accepting IEnumerable arguments.  Since I literally have no idea how expensive it might be to iterate over a given IEnumerable, I need to make sure my ReportTableDisplay class only iterates over the RowKeysWithHeaders once.  Otherwise I could end up creating who-knows-how-many copies of exactly the same DataPair without even realizing it.  Just think how that would turn out if I was using a truly expensive IEnumerable--one whose IEnumerator begins by accessing data from over the Internet, for example!

Wednesday, April 15, 2009

Improved Dynamic Query

Yesterday I figured out how we could use C# Expressions to filter query results by any number of criteria. Today I refined my approach somewhat. Instead of the long, complex LINQ query I used there, I now have:


1
2
3
4
5
6
7
8
9
var userTimes = (from t in times
group t by new {t.User.UserName, t.TargetDate} into ut
select new
{
UserName = ut.Key.UserName,
TargetDate = ut.Key.TargetDate,
Minutes = ut.Sum(t => t.Minutes)
}
);
This produces a faster, more reasonable set of SQL code which returns almost identical data:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Region Parameters
DECLARE @p__linq__2 DateTime SET @p__linq__2 = '2009-02-14 16:01:50.069'
-- EndRegion
SELECT
1 AS [C1],
[GroupBy1].[K2] AS [UserName],
[GroupBy1].[K1] AS [TargetDate],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Extent1].[TargetDate] AS [K1],
[Extent2].[UserName] AS [K2],
SUM( CAST( [Extent1].[Minutes] AS int)) AS [A1]
FROM [dbo].[TrackedTime] AS [Extent1]
LEFT OUTER JOIN [dbo].[aspnet_Users] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[UserId]
WHERE ([Extent1].[TargetDate] > @p__linq__2) AND ([Extent1].[TargetDate] < (GetDate()))
GROUP BY [Extent1].[TargetDate], [Extent2].[UserName]
) AS [GroupBy1]
Note that I avoided the problem I mentioned here by returning a simple set of native types rather than Entity Framework objects. This approach wouldn't be best under normal circumstances. If I'm querying the database for usernames, for example, I will often want to have more user data as well (like their real names) available to my front-end code. In that case, I could benefit from using the Entity Framework to get all the object data in a single roundtrip. However, if I'm pulling data for reporting purposes, I generally know exactly what data I want to be displaying. In that case it's more important to keep the data set small and fast. I've seen poorly-designed reporting engines pull so much data from the database that the VM runs out of memory, which causes all sorts of problems.

Finally, rather than relying on the LINQ framework to package my objects into custom classes for me, I created a generic TableData class, along with an extender that allows me to do this:

1
2
3
4
5
public TableData<String, DateTime, int> generateData(List<Filter<TrackedTime>> dateFilters)
{
...
return userTimes.ToTable(d => d.UserName, d => d.TargetDate, d => d.Minutes);
}
The next step would be to create a generic web control that creates a front-end table when given the returned report data. That way we can use a single control to display all kinds of report data, rather than creating a new method to display each new set of report data.

Tuesday, April 14, 2009

Dynamic Queries in Entity Framework

I've been working on a proof-of-concept for building dynamic Linq queries in the Entity Framework. The idea is to have any number of criteria that a user can set to limit or expand the scope of (for example) a report.

I still have a long way to go before it would be ready to use for real, but here's what I've been able to do:


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
    public class ReportPerUserPerDay2
{
public List<UserNamesWithDatesWithMinuteTotals> generateData(List<Filter<TrackedTime>> dateFilters)
{
TimesheetEntities context = new TimesheetEntities();
IQueryable<TrackedTime> times = context.TrackedTime;
foreach (Filter<TrackedTime> dateFilter in dateFilters)
{
times = times.Where(dateFilter.getPredicate());
}
var userTimes = (from t in times
group t by t.User.UserName into ut
from ut2 in
(from t in ut
group t by t.TargetDate into ut3
select new DatesWithMinuteTotals()
{
TargetDate = ut3.Key,
Minutes = ut3.Sum(t => t.Minutes)
})
group ut2 by ut.Key into ut4
select new UserNamesWithDatesWithMinuteTotals()
{
UserName = ut4.Key,
Dates = from t in ut4 select t
}
);

return userTimes.ToList();
}
}

public class DatesWithMinuteTotals
{
public DateTime TargetDate;
public long Minutes;
}

public class UserNamesWithDatesWithMinuteTotals
{
public String UserName;
public IEnumerable<DatesWithMinuteTotals> Dates;
}
So I can pass the generateData method filters like this one:



1
2
3
4
5
6
7
8
9
10
11
12
13
public partial class Controls_ReportFilters_FilterDateRange : System.Web.UI.UserControl, TimesheetReporting.Filters.Filter<TrackedTime>
{
...

public Expression<Func<TrackedTime, bool>> getPredicate()
{
if (StartDate.Ticks == 0 && EndDate.Ticks == 0) return t => true;
if (StartDate.Ticks == 0) return t => t.TargetDate <= EndDate;
if (EndDate.Ticks == 0) return t => t.TargetDate >= StartDate;
return t => t.TargetDate <= EndDate && t.TargetDate >= StartDate;
}

}
and the Linq statement will auto-generate this crazy SQL code:



1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
-- Region Parameters
DECLARE p__linq__2 DateTime SET p__linq__2 = '2009-02-13 18:13:53.470'
-- EndRegion
SELECT
[Project7].[UserName] AS [UserName],
[Project7].[C1] AS [C1],
[Project7].[C3] AS [C2],
[Project7].[TargetDate] AS [TargetDate],
[Project7].[C2] AS [C3]
FROM ( SELECT
[Distinct2].[UserName] AS [UserName],
1 AS [C1],
[Project6].[TargetDate] AS [TargetDate],
CASE WHEN ([Project6].[C2] IS NULL) THEN CAST(NULL AS bigint) ELSE CAST( [Project6].[C1] AS bigint) END AS [C2],
[Project6].[C2] AS [C3]
FROM (SELECT DISTINCT
[Project2].[UserName] AS [UserName]
FROM (SELECT
@p__linq__2 AS [p__linq__2],
[Distinct1].[UserName] AS [UserName]
FROM ( SELECT DISTINCT
[Extent2].[UserName] AS [UserName]
FROM [dbo].[TrackedTime] AS [Extent1]
LEFT OUTER JOIN [dbo].[aspnet_Users] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[UserId]
WHERE [Extent1].[TargetDate] > @p__linq__2
) AS [Distinct1] ) AS [Project2]
CROSS APPLY (SELECT
[Extent3].[TargetDate] AS [K1]
FROM [dbo].[TrackedTime] AS [Extent3]
LEFT OUTER JOIN [dbo].[aspnet_Users] AS [Extent4] ON [Extent3].[UserId] = [Extent4].[UserId]
WHERE ([Extent3].[TargetDate] > @p__linq__2) AND (([Extent4].[UserName] = [Project2].[UserName]) OR (([Extent4].[UserName] IS NULL) AND ([Project2].[UserName] IS NULL)))
GROUP BY [Extent3].[TargetDate] ) AS [GroupBy1] ) AS [Distinct2]
LEFT OUTER JOIN (SELECT
[Project5].[UserName] AS [UserName],
[GroupBy2].[K1] AS [TargetDate],
[GroupBy2].[A1] AS [C1],
1 AS [C2]
FROM (SELECT
@p__linq__2 AS [p__linq__2],
[Distinct3].[UserName] AS [UserName]
FROM ( SELECT DISTINCT
[Extent6].[UserName] AS [UserName]
FROM [dbo].[TrackedTime] AS [Extent5]
LEFT OUTER JOIN [dbo].[aspnet_Users] AS [Extent6] ON [Extent5].[UserId] = [Extent6].[UserId]
WHERE [Extent5].[TargetDate] > @p__linq__2
) AS [Distinct3] ) AS [Project5]
CROSS APPLY (SELECT
[Extent7].[TargetDate] AS [K1],
SUM( CAST( [Extent7].[Minutes] AS int)) AS [A1]
FROM [dbo].[TrackedTime] AS [Extent7]
LEFT OUTER JOIN [dbo].[aspnet_Users] AS [Extent8] ON [Extent7].[UserId] = [Extent8].[UserId]
WHERE ([Extent7].[TargetDate] > @p__linq__2) AND (([Extent8].[UserName] = [Project5].[UserName]) OR (([Extent8].[UserName] IS NULL) AND ([Project5].[UserName] IS NULL)))
GROUP BY [Extent7].[TargetDate] ) AS [GroupBy2] ) AS [Project6] ON ([Project6].[UserName] = [Distinct2].[UserName]) OR (([Project6].[UserName] IS NULL) AND ([Distinct2].[UserName] IS NULL))
) AS [Project7]
ORDER BY [Project7].[UserName] ASC, [Project7].[C3] ASC
which, amazingly enough, rather efficiently produces exactly the information I wanted, and no more. Pretty slick, eh?