Benchmarking .NET Code
Have you ever been nervous about making a change to an application, or interested in benchmarking your code in order to demonstrate the efficacy of a proposed change? I recently wanted to change the behavior of a core piece of application code, but wanted to be sure that it was going to be an improvement to the system, beyond simplifying the mental model for the code. To justify my changes, I spent time measuring them, using a tool called BenchmarkDotNet.
So there I was, diving deep into a codebase that I'd never seen before. The projects reference other proprietary company projects through the C# project file's Reference element (used for linking external assemblies). This is essentially how NuGet packages were managed in C# projects; the modern method of managing external NuGet dependencies is the PackageReference element. To assist with browsing the linked libraries, I installed what is by far my favorite Visual Studio extension, ILSpy. ILSpy allows you to select a symbol in the Visual Studio editor and open it in the ILSpy program. This is great when working with references where the source is not linked (such as NuGet packages).
I was working on a part of the source code that's still using LINQ-to-SQL. By browsing the (proprietary) reference assemblies in ILSpy, I was able to surmise that a particular code path was exhibiting the following behavior for something equivalent to the AddRange<T>(IEnumerable<T>) method (available on many collection types in the .NET Framework):
- Call AddList (or something similarly named) on an interface
- Iterate each member, calling Add<T>(T), followed by InsertOnSubmit
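In rough terms, the decompiled path behaved like the sketch below. The names here (AddList and the _table field) are illustrative reconstructions of what I saw, not the actual proprietary code:
// Illustrative reconstruction of the decompiled behavior; not the real code.
// Assumes a LINQ-to-SQL Table<T> field (_table) for the mapped entity type.
public void AddList(IEnumerable<T> values)
{
    foreach (var value in values)
    {
        Add(value);                   // per-item bookkeeping in the wrapper
        _table.InsertOnSubmit(value); // queue a pending insert with LINQ-to-SQL
    }
}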
Having used the Entity Framework (EF) fairly extensively in the past, and having read a fair amount about optimizing EF, I knew this probably wasn't the most efficient way to manage data access. I knew that in EF, calling the Add<T>(T value) method repeatedly would cause the internal object tracker to do more work than it needed to, due to some internal synchronization processes. I assumed that LINQ-to-SQL (L2S) had similar optimization opportunities, and proceeded to browse the referenced libraries in ILSpy.
The great thing about ILSpy is that it doesn't matter if you're looking at a .NET Framework library or some proprietary internal implementation: the decompiled output is always more useful than some reference you found in a source code archive (with the .NET Framework, we have the Reference Source, and for internal projects, you have your source control system). I don't want to have to set up a symbol server for debugger support, either: I just want to view everything that's linked so that I can understand what's happening when one piece of code calls into someone else's library code. And that's what ILSpy helps us do: view decompiled C# code for Microsoft Intermediate Language (MSIL).
Note: to be absolutely clear, I prefer decompiled code to raw source because it can be difficult to find the exact version of a library you're trying to review. Additionally, ILSpy doesn't give you the raw C# that was originally written: it's just translating MSIL back to C#, so the output can differ from the original source. This is especially apparent with code originally written in F#, where the decompiled C# looks nothing like what the author wrote.
The optimization I had in mind was pretty straightforward. I was aware that in EF, if I called the DbSet<T>.AddRange method, object tracking and synchronization would only be performed one time. This wasn't the actual optimization, though, because that savings should only be on the order of a few bytes. Instead, the optimization was decreasing the time spent configuring SQL Server connections, preparing and executing SQL, and then resetting the connection. This is mostly related to the underlying LINQ-to-SQL provider implementation and the .NET Framework SqlClient connection pool. For more details, I highly suggest reading the documentation, available under SQL Server Connection Pooling (ADO.NET). The changes I made were straightforward, changing the execution model (sketched in code after the lists below) from:
- Add entity
- Submit changes
To:
- Add entities
- Submit all changes
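As a minimal sketch, assuming a LINQ-to-SQL DataContext with a Table<Foo> named Foos (the names here are hypothetical; the real code sat behind a proprietary interface), the change looked roughly like this:
// Before: per-entity submit; connection and SQL work on every iteration.
foreach (var foo in foos)
{
    context.Foos.InsertOnSubmit(foo);
    context.SubmitChanges();
}

// After: queue all pending inserts, then submit once.
context.Foos.InsertAllOnSubmit(foos);
context.SubmitChanges();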
The Submit Changes code is similar to this:
- Acquire a connection instance
- Iterate all inserts (“adds”) across all “tables” (entity mappings)
- For each object in each table, prepare an SQL statement
- Execute the SQL statement(s)
- Reset the connection
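In pseudocode, that flow looks something like the sketch below. Every helper name here is hypothetical; this is a simplified rendering of the steps above, not the actual LINQ-to-SQL implementation:
// Simplified sketch of the submit flow described above; not real L2S internals.
void SubmitChanges()
{
    var connection = AcquireConnection(); // typically served by the connection pool
    foreach (var table in TablesWithPendingInserts())
    {
        foreach (var entity in table.PendingInserts)
        {
            var command = PrepareInsertCommand(connection, table, entity);
            command.ExecuteNonQuery(); // execute the prepared SQL statement
        }
    }
    ResetAndReleaseConnection(connection); // reset and return to the pool
}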
It’s all a little more complex than that under the covers, and the steps may not be performed precisely in this order, but that is the gist of what’s going on. The reason that my changes improved aggregate throughput is that the calling code spent a lot less time iterating through the Submit Changes process by submitting all of the changes at once. But I didn’t want to build the measurement tooling myself. I’m not an expert at benchmarking, and I had seen several gists from the PFX Team demonstrating benchmarks they’d done on .NET (across several runtimes, including the desktop CLR, Core CLR, and Mono). They were using BenchmarkDotNet in everything I’d seen, so it seemed like a natural progression for me to grab the same tool and use it myself.
The changes I was making were conveniently hidden behind an interface that looked similar to this:
public interface IResourceWriter<T> {
    void Add(T value);
    void Add(IEnumerable<T> values);
}
A benchmark program is pretty straightforward. It abstracts away all the details of timing code, figuring out how many iterations to perform, and other concerns that can be distracting from your actual intent. Instead of writing imperative code to perform these things, a benchmark program leverages a more declarative programming model, powered by .NET attributes. The attributes allow you to specify things like how many iterations you want to be performed (if you want to override the defaults), which CLR you want to target, and even whether you want to parameterize the data in your program.
Let’s assume that we have two implementations of IResourceWriter<T>. One of them is simply named ResourceWriter<T>, and the one that I wanted to compare against was called FastResourceWriter<T>. To start, let’s pretend we have an entity that is mapped to a table similar to this:
public class Foo {
    // SQL Server identity column
    public int FooId { get; set; }
    public string FooName { get; set; }
    public DateTime CreatedTimestamp { get; set; }
}
The benchmark program I wrote looked similar to this:
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

[ClrJob, DryClrJob]
public class ResourceWriterBenchmark {
    private readonly IResourceWriter<Foo> _baselineResourceWriter;
    private readonly IResourceWriter<Foo> _fastResourceWriter;
    private readonly IEnumerable<Foo> _entities;

    public ResourceWriterBenchmark()
    {
        _baselineResourceWriter = new ResourceWriter<Foo>();
        _fastResourceWriter = new FastResourceWriter<Foo>();
        // Materialize once so entity generation isn't measured inside the benchmark.
        _entities = GenerateEntities().ToList();
    }

    [Params(100, 1000, 2000, 4000, 8000, 16000)]
    public int NumberOfEntitiesToInsert { get; set; }

    [Benchmark(Baseline = true)]
    public void BaselineAddMultipleEntities() =>
        GenericAddCore(_baselineResourceWriter);

    [Benchmark]
    public void FastAddMultipleEntities() =>
        GenericAddCore(_fastResourceWriter);

    private void GenericAddCore(IResourceWriter<Foo> resourceWriter)
    {
        var entities = _entities.Take(NumberOfEntitiesToInsert);
        resourceWriter.Add(entities);
    }

    private static IEnumerable<Foo> GenerateEntities()
    {
        for (var i = 0; i < 16000; i++)
        {
            yield return new Foo()
            {
                FooId = i,
                FooName = string.Format("Foo#{0}", i),
                CreatedTimestamp = DateTime.Now
            };
        }
    }
}
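Running it is just a matter of handing the class to the BenchmarkDotNet runner from a console application's entry point; BenchmarkDotNet takes care of warmup, iteration counts, and the summary table:
using BenchmarkDotNet.Running;

public class Program
{
    public static void Main(string[] args)
    {
        // Discovers the [Benchmark] methods and runs each (job, parameter)
        // combination, printing a summary table when finished.
        BenchmarkRunner.Run<ResourceWriterBenchmark>();
    }
}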
This example shows how easy it is to write a benchmark. I was able to cook this up in a fraction of the time it would have taken to do the same thing myself. I ran it just by following the documentation, and had results in a few minutes. The benchmark results proved, unequivocally, that FastResourceWriter<T> was on average ~70% faster than the baseline ResourceWriter<T>. Better still, the advantage seemed to scale with the size of the input, which made me very confident that it would be successful in production.
Wrapping Up
As a developer, I try very hard to use an evidence-based approach for all changes I make. Sometimes, this means diving deep into the internals of how a particular dependency works. Other times, it means exploring something new, such as BenchmarkDotNet. In the best circumstances, I also employ unit tests. One of the things I try to ask myself, before I write any line of code, is, “how repeatable will this be for other developers in the future?” Sure, I can crank out a program that does some simple timer-based analysis and writes the results to a CSV file (or something), but that does nothing to:
- Support other developers that are trying to duplicate my results
- Increase my productivity for the task I’m working on
- Increase the productivity of any developer touching the codebase in the future
That last point is the most important, in my opinion. The code we write to test and exercise our actual business code is at least as important as the business code itself. The business code is the thing that creates value for our customers. Developers and maintainers are the customers of things like tests and benchmarks, and it’s worth considering what the experience of a future developer is going to look like when they’re approaching this suite of code.
As always, I appreciate you taking the time to read my ramblings, and hope they provide value to you in whatever it is you’re doing. Next time, we’ll look at a nifty little parsing tool that will improve your experience with building command-line tools.
Get out and do some benchmarks!
- Brian