Benchmarking .NET Code
Have you ever been nervous about making a change to an application, or interested in benchmarking your code in order to demonstrate the efficacy of a proposed change? I recently wanted to change the behavior of a core piece of application code, but wanted to be sure that it was going to be an improvement to the system, beyond simplifying the mental model for the code. To justify my changes, I spent time measuring them, using a tool called BenchmarkDotNet.
So there I was, diving deep into a codebase that I'd never seen before. The projects reference other proprietary company projects through the C# project file's Reference element (used for linking external assemblies). This is essentially how NuGet packages were managed in C# projects; the modern method of managing external NuGet dependencies is the PackageReference element. To assist with browsing the linked libraries, I installed what is by far my favorite Visual Studio extension, ILSpy. ILSpy allows you to select a symbol in the Visual Studio editor and open it in the ILSpy program. This is great when working with references where the source is not linked (such as NuGet packages).
I was working on a part of the source code that's still using LINQ-to-SQL. By browsing the (proprietary) reference assemblies in ILSpy, I was able to surmise that a particular code path was exhibiting the following behavior for something equivalent to the AddRange<T>(IEnumerable<T>) method (available on many collection types in the .NET Framework):
- Call AddList (or something similarly named) on an interface
- Iterate each member, calling Add<T>(T), followed by InsertOnSubmit
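In rough terms, the decompiled path behaved like the sketch below. The names here (AddList and the _table field) are illustrative reconstructions of what I saw, not the actual proprietary code:
// Illustrative reconstruction of the decompiled behavior; not the real code.
// Assumes a LINQ-to-SQL Table<T> field (_table) for the mapped entity type.
public void AddList(IEnumerable<T> values)
{
    foreach (var value in values)
    {
        Add(value);                   // per-item bookkeeping in the wrapper
        _table.InsertOnSubmit(value); // queue a pending insert with LINQ-to-SQL
    }
}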
Having used the Entity Framework (EF) fairly extensively in the past, and having read a fair amount about optimizing EF, I knew this probably wasn't the most efficient way to manage data access. I knew that in EF, calling the Add<T>(T value) method repeatedly would cause the internal object tracker to do more work than it needed to, due to some internal synchronization processes. I assumed that LINQ-to-SQL (L2S) had similar optimization opportunities, and proceeded to browse the referenced libraries in ILSpy.
The great thing about ILSpy is that it doesn't matter if you're looking at a .NET Framework library or some proprietary internal implementation: the decompiled output is always more useful than some reference you found in a source code archive (with the .NET Framework, we have the Reference Source, and for internal projects, you have your source control system). I don't want to have to set up a symbol server for debugger support, either: I just want to view everything that's linked so that I can understand what's happening when one piece of code calls into someone else's library code. And that's what ILSpy helps us do: view decompiled C# code for Microsoft Intermediate Language (MSIL).
Note: to be absolutely clear, I prefer decompiled code to raw source because it can be difficult to find the exact version of a library you're trying to review. Additionally, ILSpy doesn't give you the raw C# that was originally written: it's just translating MSIL back to C#, so the output can differ from the original source. This is especially apparent with code originally written in F#, where the decompiled C# looks nothing like what the author wrote.
The optimization I had in mind was pretty straightforward. I was aware that in EF, if I called the DbSet<T>.AddRange method, object tracking and synchronization would only be performed one time. This wasn't the actual optimization, though, because that savings should only be on the order of a few bytes. Instead, the optimization was decreasing the time spent configuring SQL Server connections, preparing and executing SQL, and then resetting the connection. This is mostly related to the underlying LINQ-to-SQL provider implementation and the .NET Framework SqlClient connection pool. For more details, I highly suggest reading the documentation, available under SQL Server Connection Pooling (ADO.NET). The changes I made were straightforward, changing the execution model (sketched in code after the lists below) from:
- Add entity
- Submit changes
To:
- Add entities
- Submit all changes
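As a minimal sketch, assuming a LINQ-to-SQL DataContext with a Table<Foo> named Foos (the names here are hypothetical; the real code sat behind a proprietary interface), the change looked roughly like this:
// Before: per-entity submit; connection and SQL work on every iteration.
foreach (var foo in foos)
{
    context.Foos.InsertOnSubmit(foo);
    context.SubmitChanges();
}

// After: queue all pending inserts, then submit once.
context.Foos.InsertAllOnSubmit(foos);
context.SubmitChanges();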
The Submit Changes code is similar to this:
- Acquire a connection instance
- Iterate all inserts (“adds”) across all “tables” (entity mappings)
- For each object in each table, prepare an SQL statement
- Execute the SQL statement(s)
- Reset the connection
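In pseudocode, that flow looks something like the sketch below. Every helper name here is hypothetical; this is a simplified rendering of the steps above, not the actual LINQ-to-SQL implementation:
// Simplified sketch of the submit flow described above; not real L2S internals.
void SubmitChanges()
{
    var connection = AcquireConnection(); // typically served by the connection pool
    foreach (var table in TablesWithPendingInserts())
    {
        foreach (var entity in table.PendingInserts)
        {
            var command = PrepareInsertCommand(connection, table, entity);
            command.ExecuteNonQuery(); // execute the prepared SQL statement
        }
    }
    ResetAndReleaseConnection(connection); // reset and return to the pool
}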
It’s all a little more complex than that under the covers, and the steps may not be performed precisely in this order, but that is the gist of what’s going on. The reason that my changes improved aggregate throughput is that the calling code spent a lot less time iterating through the Submit Changes process by submitting all of the changes at once. But I didn’t want to build the measurement tooling myself. I’m not an expert at benchmarking, and I had seen several gists from the PFX Team demonstrating benchmarks they’d done on .NET (across several runtimes, including the desktop CLR, Core CLR, and Mono). They were using BenchmarkDotNet in everything I’d seen, so it seemed like a natural progression for me to grab the same tool and use it myself.
The changes I was making were conveniently hidden behind an interface that looked similar to this:
public interface IResourceWriter<T> {
    void Add(T value);
    void Add(IEnumerable<T> values);
}
A benchmark program is pretty straightforward. It abstracts away all the details of timing code, figuring out how many iterations to perform, and other concerns that can be distracting from your actual intent. Instead of writing imperative code to perform these things, a benchmark program leverages a more declarative programming model, powered by .NET attributes. The attributes allow you to specify things like how many iterations you want to be performed (if you want to override the defaults), which CLR you want to target, and even whether you want to parameterize the data in your program.
Let’s assume that we have two implementations of IResourceWriter<T>. One of them is simply named ResourceWriter<T>, and the one that I wanted to compare against was called FastResourceWriter<T>. To start, let’s pretend we have an entity that is mapped to a table similar to this:
public class Foo {
    // SQL Server identity column
    public int FooId { get; set; }
    public string FooName { get; set; }
    public DateTime CreatedTimestamp { get; set; }
}
The benchmark program I wrote looked similar to this:
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

[ClrJob, DryClrJob]
public class ResourceWriterBenchmark {
    private readonly IResourceWriter<Foo> _baselineResourceWriter;
    private readonly IResourceWriter<Foo> _fastResourceWriter;
    private readonly IEnumerable<Foo> _entities;

    public ResourceWriterBenchmark()
    {
        _baselineResourceWriter = new ResourceWriter<Foo>();
        _fastResourceWriter = new FastResourceWriter<Foo>();
        // Materialize once so entity generation isn't measured inside the benchmark.
        _entities = GenerateEntities().ToList();
    }

    [Params(100, 1000, 2000, 4000, 8000, 16000)]
    public int NumberOfEntitiesToInsert { get; set; }

    [Benchmark(Baseline = true)]
    public void BaselineAddMultipleEntities() =>
        GenericAddCore(_baselineResourceWriter);

    [Benchmark]
    public void FastAddMultipleEntities() =>
        GenericAddCore(_fastResourceWriter);

    private void GenericAddCore(IResourceWriter<Foo> resourceWriter)
    {
        var entities = _entities.Take(NumberOfEntitiesToInsert);
        resourceWriter.Add(entities);
    }

    private static IEnumerable<Foo> GenerateEntities()
    {
        for (var i = 0; i < 16000; i++)
        {
            yield return new Foo()
            {
                FooId = i,
                FooName = string.Format("Foo#{0}", i),
                CreatedTimestamp = DateTime.Now
            };
        }
    }
}
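Running it is just a matter of handing the class to the BenchmarkDotNet runner from a console application's entry point; BenchmarkDotNet takes care of warmup, iteration counts, and the summary table:
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Discovers the [Benchmark] methods and runs each (job, parameter)
        // combination, printing a summary table when finished.
        BenchmarkRunner.Run<ResourceWriterBenchmark>();
    }
}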
This example shows how easy it is to write a benchmark. I was able to cook this up in a fraction of the time it would have taken to do the same thing myself. I ran it just by following the documentation, and had results in a few minutes. The benchmark results proved, unequivocally, that FastResourceWriter<T> was on average ~70% faster than the baseline ResourceWriter<T>. Better still, the advantage seemed to scale with the size of the input, which made me very confident that it would be successful in production.
Wrapping Up
As a developer, I try very hard to use an evidence-based approach for all changes I make. Sometimes, this means diving deep into the internals of how a particular dependency works. Other times, it means exploring something new, such as BenchmarkDotNet. In the best circumstances, I also employ unit tests. One of the things I try to ask myself, before I write any line of code, is, “how repeatable will this be for other developers in the future?” Sure, I can crank out a program that does some simple timer-based analysis and writes the results to a CSV file (or something), but that does nothing to:
- Support other developers that are trying to duplicate my results
- Increase my productivity for the task I’m working on
- Increase the productivity of any developer touching the codebase in the future
That last point is the most important, in my opinion. The code we write to test and exercise our actual business code is at least as important as the business code itself. The business code is the thing that creates value for our customers. Developers and maintainers are the customers of things like tests and benchmarks, and it’s worth considering what the experience of a future developer is going to look like when they’re approaching this suite of code.
As always, I appreciate you taking the time to read my ramblings, and hope they provide value to you in whatever it is you’re doing. Next time, we’ll look at a nifty little parsing tool that will improve your experience with building command-line tools.
Get out and do some benchmarks!
- Brian