Friday, September 28, 2012

NonNull extension method

One of the things that drives me nuts lately is null checking. My code is plagued with:
if (foo != null)
{
    // Business as usual
}

Say you have the following classes (exaggerated, the property "First" in class "Name" would usually just be a string):
public class Person
{
    public Name Name { get; set; }
}

public class Name
{
    public First First { get; set; }
}

public class First
{
    public string Value { get; set; }
}

And the following collection:
var people = new[]
{
    new Person { Name = new Name { First = new First { Value = "John" } } },
    new Person { Name = new Name { First = new First { Value = "Eric" } } },
    new Person { Name = new Name { First = new First { Value = "Joel" } } },
    new Person { Name = new Name { First = new First { Value = null } } },
    new Person { Name = new Name { First = new First() } },
    new Person { Name = new Name() },
    new Person { Name = null },
    new Person(),
    null
};

On a given query, you might see something like:
private void PrintFirstNames(IEnumerable<Person> people)
{
    if (people != null)
    {
        foreach (var person in people)
        {
            if (person != null && person.Name != null
                    && person.Name.First != null && person.Name.First.Value != null)
            {
                Console.WriteLine(person.Name.First.Value);
            }
        }
    }
}

Which can, of course, be written more concisely in LINQ:
private void PrintFirstNamesLinq(IEnumerable<Person> people)
{
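    // Treat a null collection as empty and drop null values at every level,
    // so the output matches PrintFirstNames above.
    var names = (people ?? Enumerable.Empty<Person>())
                .Where(p => p != null).Select(p => p.Name)
                .Where(n => n != null).Select(n => n.First)
                .Where(f => f != null).Select(f => f.Value)
                .Where(v => v != null);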
    foreach (var name in names)
    {
        Console.WriteLine(name);
    }
}

However, after writing a handy little extension method called NonNull:
public static IEnumerable<R> NonNull<T, R>(this IEnumerable<T> source, Func<T, R> action)
{
    if (source == null)
    {
        yield break;
    }

    foreach (var val in source.Where(s => s != null).Select(action).Where(r => r != null))
    {
        yield return val;
    }
}

It can be written even more concisely:
private void PrintFirstNamesNonNull(IEnumerable<Person> people)
{
    var names = people.NonNull(p => p).NonNull(p => p.Name).NonNull(p => p.First).NonNull(p => p.Value);
    foreach (var name in names)
    {
        Console.WriteLine(name);
    }
}

And even more concisely with a ForEach extension method (see here for code and the debate surrounding it):
private void PrintFirstNamesNonNullForEach(IEnumerable<Person> people)
{
    people.NonNull(p => p).NonNull(p => p.Name).NonNull(p => p.First).NonNull(p => p.Value).ForEach(Console.WriteLine);
}
Another win for C# extension methods (and Monads, but I suppose that's another discussion).
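
The ForEach extension used above is not reproduced in this post. A minimal sketch of what such a helper might look like follows; the class name EnumerableExtensions and the choice to treat a null source as empty (to match NonNull) are assumptions, not the linked implementation.

```csharp
using System;
using System.Collections.Generic;

public static class EnumerableExtensions
{
    // Hypothetical helper: runs `action` once per element, in order.
    // A null source is treated as an empty sequence (an assumption made
    // here so that it composes with NonNull without extra checks).
    public static void ForEach<T>(this IEnumerable<T> source, Action<T> action)
    {
        if (action == null)
        {
            throw new ArgumentNullException("action");
        }

        if (source == null)
        {
            return;
        }

        foreach (var item in source)
        {
            action(item);
        }
    }
}
```

Unlike the LINQ operators, this runs immediately and returns nothing, which is part of why the method is debated: it exists purely for side effects.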

Friday, February 18, 2011

Load Testing with Google's WebDriver

I have been working with WebDriver/Selenium 2 for some time now; I implemented it for my team at work and have since discovered ways to do load testing.

I code primarily in C#, but in Java you could take advantage of the headless browser driver (HtmlUnitDriver), which could potentially let you run hundreds of tests at once. IKVM lets you convert that driver to a .NET assembly usable from C#, but when I tried it, a 5-second test took 2 minutes or more to run, and I wasn't willing to deal with that.

A viable headless browser I found for C# is an open source project called SimpleBrowser. It doesn't support JavaScript, but it's the foundation for a bigger project called XBrowser that will support not only JavaScript but also HTML5, SVG, and Canvas.
It isn't a WebDriver implementation, but I was easily able to run 100 concurrent instances of SimpleBrowser performing a simple test.

Anyway, on to the code:



Some notes:
  1. You will need to use a delegate to pass parameters to the test; in this case I've used a lambda expression.
  2. I've run into problems trying to run this as a unit test:
  • For some reason, a unit test doesn't run the threads all at once; instead it runs them one at a time.
  • You will have to keep the main thread busy (spinning, for example) while it waits for the other threads to finish.
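
As a rough illustration of the approach these notes describe (one thread per simulated user, with a lambda passing parameters to the test), a minimal sketch might look like the following. LoadTestSketch, RunConcurrently, and the test callback are hypothetical names rather than the original code, and the browser-driving logic is left to the caller. It also blocks on Thread.Join instead of spinning the main thread, which is a swapped-in alternative to the second note.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class LoadTestSketch
{
    // Starts `userCount` threads that each run `test(userId)`, then blocks
    // until all of them have finished. Returns how many tests threw.
    public static int RunConcurrently(int userCount, Action<int> test)
    {
        int failures = 0;
        var threads = new List<Thread>();

        for (int i = 0; i < userCount; i++)
        {
            int userId = i; // copy the loop variable so each lambda captures its own value
            var thread = new Thread(() =>
            {
                try
                {
                    test(userId);
                }
                catch (Exception)
                {
                    Interlocked.Increment(ref failures);
                }
            });
            threads.Add(thread);
            thread.Start();
        }

        // Wait for every worker before returning.
        foreach (var thread in threads)
        {
            thread.Join();
        }

        return failures;
    }
}
```

A call such as LoadTestSketch.RunConcurrently(100, id => RunSimpleTest(id)) would then start 100 workers, where RunSimpleTest stands in for whatever browser test you want to repeat.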

Friday, July 10, 2009

Combining Lambda Expressions

I'm working on custom fields for a GridView which can include custom filters. The idea is to persist these filters to the data layer, so that when a filter is applied, the query sent to the database includes all the appropriate "where" clauses.

I wanted to have an abstract parent class that deals with most of the tedium involved in extending the DataControlField class, and then only require the child classes to handle the cases specific to their custom filters. The parent class would know how to create the lambda expression required to get a given property off of the object that is being represented in the GridView, while the child class would know how to filter by that property. I was willing to make the parent class a little more messy, as long as it could easily be extended to make whatever kind of filterable columns we wanted. For example, if the child class was filtering based on string data types, I wanted to make the following syntax possible:
protected override IEnumerable<Expression<Func<string, bool>>> GetFilters()
{
    if (filterTextBox.Text != null && filterTextBox.Text.Length > 0)
    {
        yield return s => s.StartsWith(filterTextBox.Text);
    }
}
But I ran into a wall trying to get the parent class to build the full filter expression based on this partial filter expression. I needed a method that looked like this:
public Expression<Func<TrackedTime, bool>> mergeFilterExpressions(
    Expression<Func<TrackedTime, T>> propGetter,
    Expression<Func<T, bool>> filter)

So given one expression that could get the property from the TrackedTime, and another expression that could get a boolean value from the property, I needed to be able to create an Expression that gets a boolean value from the TrackedTime. In LINQ itself, there is of course the ability to Invoke the functions represented by my expressions, like this:
return tt => filter.Invoke(propGetter.Invoke(tt));
However, LINQ to Entities doesn't support the Invoke command. So although this would compile just fine, it would throw a runtime exception when the time came to build the SQL query to send to the database. It was driving me crazy.

After much searching, it was Joe Albahari, creator of the amazing LINQPad (which I use almost daily, by the way), who had the solution. On his website, he provides a collection of extension methods called LinqKit, which allowed me to do the following:
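public Expression<Func<TrackedTime, bool>> mergeFilterExpressions(
    Expression<Func<TrackedTime, T>> propGetter,
    Expression<Func<T, bool>> filter)
{
    // Reuse the getter's parameter as the parameter of the combined lambda.
    ParameterExpression param = propGetter.Parameters[0];
    // Invoke the getter, then feed its expanded result to the filter.
    InvocationExpression getProp = Expression.Invoke(propGetter, param);
    InvocationExpression invoke = Expression.Invoke(filter, getProp.Expand());
    // Expand() (from LinqKit) inlines the invocations so LINQ to Entities can translate the result.
    return Expression.Lambda<Func<TrackedTime, bool>>(invoke.Expand(), param);
}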
How 'bout that! By Invoking and then Expanding the Expressions, I can now combine Lambda Expressions any way I want!
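
For comparison, the same composition can be done without LinqKit by substituting one lambda's body into the other. This is a different technique from the one above, sketched under the assumption that .NET 4 or later is available (where System.Linq.Expressions.ExpressionVisitor is public). ExpressionComposer, Compose, and ReplaceParameterVisitor are illustrative names.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionComposer
{
    // Replaces every occurrence of one parameter with another expression.
    private sealed class ReplaceParameterVisitor : ExpressionVisitor
    {
        private readonly ParameterExpression _from;
        private readonly Expression _to;

        public ReplaceParameterVisitor(ParameterExpression from, Expression to)
        {
            _from = from;
            _to = to;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == _from ? _to : base.VisitParameter(node);
        }
    }

    // Builds x => filter(propGetter(x)) by inlining propGetter's body
    // wherever filter's parameter appears, so no Invoke node remains.
    public static Expression<Func<TSource, bool>> Compose<TSource, TProp>(
        Expression<Func<TSource, TProp>> propGetter,
        Expression<Func<TProp, bool>> filter)
    {
        Expression body = new ReplaceParameterVisitor(filter.Parameters[0], propGetter.Body)
            .Visit(filter.Body);
        return Expression.Lambda<Func<TSource, bool>>(body, propGetter.Parameters[0]);
    }
}
```

Because the resulting tree contains only the member access and the comparison, a query provider never sees an invocation node at all.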

Thursday, June 4, 2009

Would a Wave-based IDE be feasible?

Google recently announced "Google Wave," a project that they've been working on for the past two years, and which is set to revolutionize online communication as we know it.  When it was announced, they mentioned that the whole thing was built using the Google Web Toolkit.  I started looking into the toolkit and noticed that they also have something called the Google App Engine, which is a framework and service that lets you host your applications and data "in the cloud" on Google's servers.

Pondering on these various Internet technologies, I got to wondering:  Would it be possible to host the entire development process online?  If someone could create a Wave-based code infrastructure, we could have an online IDE:
  • Each Java code file could be a Wave, which could be edited collaboratively--by multiple users at the same time, if necessary (hello Extreme Programming!).  
  • A spell-check-like plugin could be created to provide real-time compiler feedback and IntelliSense.  
  • A bot could be granted access to the code tree in order to compile and deploy changes to a cloud-based service in real-time, provide debugging services, and even run unit tests.
  • Developers could "check out" the waves into their own framework instance, and once a set of changes is ready, they could be merged into the "stable" set of Waves.
  • Waves already have an extremely powerful version control system built in; you can visually and immediately step back to each point in a file's revision history to watch its evolution.
  • The Google folks already showed how useful Waves can be in bug management; bugs and tasks could be handled and passed around within the same Wave framework.  They could probably even be linked to the code changes that were made to fix them (and vice versa), for future reference.
  • A plugin similar to the bug management one could be used to tag a spot in code for colleague review.  A Wave thus tagged would appear in the colleague's inbox, where they could see the changes made in the context of the entire Java file.  They could start a thread inline in the code to ask questions and make suggestions, which would all be immediately visible to the original programmer.  They could even have an entire chat session right there, inline with the code!  Both the original programmer and the colleague could make and see the changes in real-time.  Once the colleague is satisfied, they could use the plugin to sign off on the changes.
  • Documentation (both internal and external) could also be managed by the same system. Code for a particular feature could be linked to that feature's documentation.
And perhaps the best part about the whole thing is that developers don't even need to install anything on their computers.  They can log in from any web-enabled computer, anywhere, and all the same capabilities are at their fingertips.  With Wave, Google has laid the foundation for a new generation of Internet technologies.  I'm excited to see the many ways that this sort of technology will be leveraged in the years to come.

Monday, May 4, 2009

Fastest way to check for existence in a table

In order to improve performance on one of the pages in our Java code, I was making a SQL query which, along with the typical section grade information, also pulls in a field to tell whether that particular grade type is in use.  Between my own brains and some quick Google searching, this was the best query I could come up with:
select sg.*,
    (select top 1 1 from section_roster sr where sr.section_grade_id = sg.section_grade_id) as isUsed
from section_grade sg
I assumed that in the absence of any ordering, "select top 1 1" would be converted by SQL Server into a sort of "exists" statement, and would therefore be the fastest query for the job.  But just out of curiosity, I ran a similar LINQ query in LINQPad to see what SQL would be generated.  Based on those results, I created the following query:
select sg.*,
    (case when exists(select null from section_roster sr where sr.section_grade_id = sg.section_grade_id) then 1
        else null
    end) as isUsed
from section_grade sg
Although it's not as simple a query, I was able to drop the execution time from about 27 milliseconds to about 9 milliseconds.

Thursday, April 16, 2009

Enumerate across a date range in Linq using "yield"

I'm making a generic data table displayer which should be able to display any set of data that is formatted properly.  It can either deduce the table headers based on the data it receives, or it can use ones that I specify via a list of key/value pairs.  For example, by setting the value of this property:
IEnumerable<DataPair<object, String>> RowKeysWithHeaders
... I can make the row headers display all of the Strings on the right side of the given DataPairs in the IEnumerable's order, and the data for each row of the table will be aligned to match the keys on the left side of the DataPairs.

Now, let's say I'm using the data returned by the query I mentioned yesterday, which will only include dates for which there are TrackedTime entries, but I want to display all dates within a given date range, and simply leave cells empty if there are no entries for those dates.  Furthermore, I want to specify how the dates are formatted.  I could create a list of date/String pairs to use as row keys with headers, like this:
List<DataPair<object, String>> dateHeaders = new List<DataPair<object, String>>();
// DateTime is immutable: AddDays returns a new value, so it must be assigned back.
for (DateTime date = startDate; date < endDate; date = date.AddDays(1))
{
    DataPair<object, String> header = new DataPair<object, String>
    {
        LeftObject = date,
        RightObject = date.ToShortDateString()
    };
    dateHeaders.Add(header);
}
this.ReportTableDisplay1.RowKeysWithHeaders = dateHeaders;
But that would be kind of wasteful, wouldn't it?  It would mean creating an entire List of dates, when all I need is to print a bunch of consecutive dates--something I should be able to do mathematically on the fly.

A more efficient way would be to create an IEnumerable class (and an accompanying IEnumerator class) that know how to iterate across dates.  But that's a lot more work and a lot more code for something that should be relatively simple.

Thanks to the yield keyword, there is a better way.  This simple method:
public static IEnumerable<DateTime> DaysInRange(DateTime startDate, DateTime endDate)
{
    DateTime current = startDate;
    while (current <= endDate)
    {
        yield return current;
        current = current.AddDays(1);
    }
}
... will create an enumerable object that does exactly what I need it to, without the need for any extra classes or anything.  Here's how I use it:
this.ReportTableDisplay1.RowKeysWithHeaders = from d in DaysInRange(startDate, endDate)
    select new DataPair<object, String>
    {
        LeftObject = d,
        RightObject = d.ToShortDateString()
    };
This simple LINQ statement then gives me an IEnumerable that creates new DataPairs lazily as the program traverses it.  You have to admit, that's pretty smooth.  It means that if I decide to paginate the table results, I can use the Take() method, and the system won't even produce DataPairs for dates that I don't iterate over.  I can also move my DaysInRange method into a common utility class where it can be accessed any time I need to traverse a range of dates.  It's a great example of how we can use LINQ in conjunction with the yield keyword to create simple, efficient code.
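
To make the Take() claim concrete, here is a small sketch that counts how many dates the iterator actually produces when only one page is requested. LazyDateDemo, Produced, and FirstPage are names invented for this illustration, and plain strings stand in for the DataPair objects.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class LazyDateDemo
{
    // Number of dates the iterator has actually produced so far.
    public static int Produced;

    public static IEnumerable<DateTime> DaysInRange(DateTime startDate, DateTime endDate)
    {
        for (var current = startDate; current <= endDate; current = current.AddDays(1))
        {
            Produced++;
            yield return current;
        }
    }

    // Formats only the first `pageSize` dates of the range.
    public static List<string> FirstPage(DateTime start, DateTime end, int pageSize)
    {
        return DaysInRange(start, end)
            .Select(d => d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
            .Take(pageSize)
            .ToList();
    }
}
```

Asking for a page of 5 out of a 30-day range should leave Produced at 5, since neither Select nor Take pulls more elements from the iterator than the caller consumes.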

It can also be used to highlight one of the dangers of accepting IEnumerable arguments.  Since I literally have no idea how expensive it might be to iterate over a given IEnumerable, I need to make sure my ReportTableDisplay class only iterates over the RowKeysWithHeaders once.  Otherwise I could end up creating who-knows-how-many copies of exactly the same DataPair without even realizing it.  Just think how that would turn out if I was using a truly expensive IEnumerable--one whose IEnumerator begins by accessing data from over the Internet, for example!
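
The cost of iterating twice can be shown with a counter. In this sketch, ExpensiveSource is a stand-in for a costly sequence, and all names are hypothetical; it is not the ReportTableDisplay code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCostDemo
{
    // Counts how many times the simulated "expensive" source is started.
    public static int TimesStarted;

    // Stand-in for an expensive sequence (for example, one backed by a remote call).
    public static IEnumerable<int> ExpensiveSource()
    {
        TimesStarted++; // runs again at the start of every new enumeration
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Enumerates the argument twice: once for Count(), once for Sum().
    public static int CountPlusSumNaive(IEnumerable<int> items)
    {
        return items.Count() + items.Sum();
    }

    // Materializes the sequence once, then reuses the in-memory copy.
    public static int CountPlusSumOnce(IEnumerable<int> items)
    {
        List<int> cached = items.ToList();
        return cached.Count + cached.Sum();
    }
}
```

Both methods return the same value, but the naive one starts the source twice. Copying into a list trades memory for a single pass, which is usually the safer default when the caller's IEnumerable is of unknown cost.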

Wednesday, April 15, 2009

Improved Dynamic Query

Yesterday I figured out how we could use C# Expressions to filter query results by any number of criteria. Today I refined my approach somewhat. Instead of the long, complex LINQ query I used there, I now have:


var userTimes = (from t in times
                 group t by new {t.User.UserName, t.TargetDate} into ut
                 select new
                 {
                     UserName = ut.Key.UserName,
                     TargetDate = ut.Key.TargetDate,
                     Minutes = ut.Sum(t => t.Minutes)
                 }
                );
This produces a faster, more reasonable set of SQL code which returns almost identical data:


-- Region Parameters
DECLARE @p__linq__2 DateTime SET @p__linq__2 = '2009-02-14 16:01:50.069'
-- EndRegion
SELECT
    1 AS [C1],
    [GroupBy1].[K2] AS [UserName],
    [GroupBy1].[K1] AS [TargetDate],
    [GroupBy1].[A1] AS [C2]
FROM ( SELECT
        [Extent1].[TargetDate] AS [K1],
        [Extent2].[UserName] AS [K2],
        SUM( CAST( [Extent1].[Minutes] AS int)) AS [A1]
    FROM [dbo].[TrackedTime] AS [Extent1]
    LEFT OUTER JOIN [dbo].[aspnet_Users] AS [Extent2] ON [Extent1].[UserId] = [Extent2].[UserId]
    WHERE ([Extent1].[TargetDate] > @p__linq__2) AND ([Extent1].[TargetDate] < (GetDate()))
    GROUP BY [Extent1].[TargetDate], [Extent2].[UserName]
) AS [GroupBy1]
Note that I avoided the problem I mentioned here by returning a simple set of native types rather than Entity Framework objects. This approach wouldn't be best under normal circumstances. If I'm querying the database for usernames, for example, I will often want to have more user data as well (like their real names) available to my front-end code. In that case, I could benefit from using the Entity Framework to get all the object data in a single roundtrip. However, if I'm pulling data for reporting purposes, I generally know exactly what data I want to be displaying. In that case it's more important to keep the data set small and fast. I've seen poorly-designed reporting engines pull so much data from the database that the VM runs out of memory, which causes all sorts of problems.

Finally, rather than relying on the LINQ framework to package my objects into custom classes for me, I created a generic TableData class, along with an extender that allows me to do this:

public TableData<String, DateTime, int> generateData(List<Filter<TrackedTime>> dateFilters)
{
    ...
    return userTimes.ToTable(d => d.UserName, d => d.TargetDate, d => d.Minutes);
}
The next step would be to create a generic web control that creates a front-end table when given the returned report data. That way we can use a single control to display all kinds of report data, rather than creating a new method to display each new set of report data.