A tip on strings

Today, let's look at a simple trick with strings in C#. It may be too obvious and already repeated all over the internet, but I still want to give my two cents on the subject. So, let's start!

Imagine you have a list of values of any type, and you now need those values as a single comma-separated string so that a stored procedure can process them. For the list {1,2,3,4}, the desired output is the string "1,2,3,4".
How would you solve this problem? I've seen (and written) several approaches. Let's order the possible solutions, in my opinion, from the most complex to the simplest. First, we can do:
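A minimal sketch of the foreach approach (Snippet 1 in the results below), assuming the values arrive as a `List<int>` like the `DocumentIDs` list used in the benchmark. The class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text;

public static class ForeachJoin
{
    // Snippet 1 (sketch): append each value followed by a comma,
    // then trim the trailing comma at the end.
    public static string JoinWithForeach(List<int> documentIds)
    {
        var builder = new StringBuilder();
        foreach (int id in documentIds)
        {
            builder.Append(id);
            builder.Append(',');
        }

        // Drop the trailing comma, but only if something was appended.
        if (builder.Length > 0)
        {
            builder.Length -= 1;
        }

        return builder.ToString();
    }
}
```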

Notice how complex this is, and how quickly it can become messy. Replicated across an entire project (imagine the nightmare of repeating it in your Data Access Layer every time a stored procedure needs this input), it becomes a pain. Of course, you could extract the behavior into an extension method, but as we will see, that is not necessary.
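For reference, extracting the foreach/StringBuilder logic into an extension method could look roughly like this. `ToCommaSeparated` is a hypothetical name, and the method is made generic so it works for a list of any type:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CommaSeparatedExtensions
{
    // Hypothetical extension method: joins any sequence with commas
    // using the same foreach + StringBuilder technique.
    public static string ToCommaSeparated<T>(this IEnumerable<T> values)
    {
        var builder = new StringBuilder();
        bool isFirst = true;
        foreach (T value in values)
        {
            // Write the separator before every item except the first,
            // so there is no trailing comma to trim afterwards.
            if (!isFirst)
            {
                builder.Append(',');
            }
            builder.Append(value);
            isFirst = false;
        }
        return builder.ToString();
    }
}
```

Usage is then a one-liner at each call site, for example `documentIds.ToCommaSeparated()`.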

Our second approach, which is much simpler, uses all the goodness that LINQ provides. LINQ is one of the things I really love in .NET; it can make your life easy when properly applied. Let's see how we could do it:
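A sketch of the Aggregate-then-Remove approach (Snippet 2 in the results below), again assuming a `List<int>`; the names are hypothetical. Aggregate performs a left fold over the list, appending each value and a comma to an accumulated string, and Remove then cuts off the trailing comma:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AggregateJoin
{
    // Snippet 2 (sketch): fold the list into one string with Aggregate,
    // then remove the trailing comma.
    public static string JoinWithAggregate(List<int> documentIds)
    {
        // Each step builds a brand-new string with the + operator.
        string result = documentIds.Aggregate(
            string.Empty,
            (accumulated, id) => accumulated + id + ",");

        // Guard against an empty list: Remove(-1) would throw.
        return result.Length > 0 ? result.Remove(result.Length - 1) : result;
    }
}
```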

Still, this is not the simplest solution we could use. I mean, what's the deal with Aggregate and then Remove? It is not very readable: you have to be familiar with how the Aggregate method works and, as we will see, it is horribly slow.

Let's see the following snippet:
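A sketch of the String.Join version (Snippet 3 in the results below). Since .NET Framework 4, `string.Join` has an overload that accepts any `IEnumerable<T>`, so the list can be passed in directly; the wrapper class name is hypothetical:

```csharp
using System.Collections.Generic;

public static class StringJoinSnippet
{
    // Snippet 3 (sketch): let the framework place the separators.
    // Join<T> calls ToString() on each element and writes "," only
    // between elements, so there is no trailing comma to remove.
    public static string JoinWithStringJoin(List<int> documentIds)
    {
        return string.Join(",", documentIds);
    }
}
```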

This solution is simple, maintainable, understandable at first sight and, hopefully, performant. Or is it? Let's run some tests! I filled the DocumentIDs list with 10,000,000 integers, extracted each approach into a method, and called each method 5 times wrapped in a Stopwatch. Then I divided the elapsed milliseconds by 5 to get the average. Yes, it's not the best benchmark, but it will serve for illustration.
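A rough sketch of that benchmark setup: build a list of 10,000,000 integers, run one approach 5 times inside a Stopwatch, and divide the elapsed milliseconds by 5. The `Measure` and `CreateDocumentIds` helpers are hypothetical; the approach is passed as a `Func<List<int>, string>`, so any of the three methods can be plugged in:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class JoinBenchmark
{
    // Hypothetical helper: runs `approach` `runs` times on the same list
    // and returns the average elapsed time in milliseconds.
    public static long Measure(Func<List<int>, string> approach, List<int> documentIds, int runs = 5)
    {
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            approach(documentIds);
        }
        stopwatch.Stop();

        // Total milliseconds divided by the number of runs.
        return stopwatch.ElapsedMilliseconds / runs;
    }

    // Builds the input list: 10,000,000 integers by default.
    public static List<int> CreateDocumentIds(int count = 10000000)
    {
        return Enumerable.Range(1, count).ToList();
    }
}
```

For example, `JoinBenchmark.Measure(ids => string.Join(",", ids), JoinBenchmark.CreateDocumentIds())` returns the average for the String.Join approach. There is no warm-up run, so the first iteration also pays for JIT compilation, which is one reason to treat the numbers as rough.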

My machine is a Toshiba Satellite L40 with an Intel Pentium T2330 @ 1.6 GHz and 2 GB of RAM. Let's see how this works out!

Foreach with StringBuilder (Snippet 1): 4773 ms on average (5 runs)

This is not bad at all! You will hardly ever need an output this big, but I like to see how the options perform under heavy load.

Aggregate with Remove (Snippet 2): over 1 min 30 s (one run only; it never finished, so I had to stop the program)

This is not a good approach. For a start, using LINQ this way is like trying to kill a fly with a cannon. Notice also that I'm concatenating the strings with the + operator. This is extremely slow: strings are immutable, so every iteration allocates a new string and copies everything accumulated so far into it. So, not very efficient.

String.Join (Snippet 3): 4670 ms on average (5 runs)

So, as we can see, the third method is also the fastest. It's worth noting that the first method is almost as fast, but let's keep it simple, right? Behind the scenes, string.Join does a lot of work, but it was designed to handle this kind of load.

If there is one lesson I take from situations like this, it is: don't reinvent the wheel. We all do it at one time or another, it's our nature as programmers, but it only pays off if you learn something from it. I believe each approach has its strengths and weaknesses. I can see myself using the foreach when I need to run other logic for each value as the method is called, or to format the output in more customizable ways.
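As an illustration of that flexibility, here is a sketch of a foreach-based join that lets the caller format each value before it is appended, for example wrapping strings in quotes. `CustomJoin.JoinFormatted` and its parameters are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CustomJoin
{
    // Hypothetical helper: the foreach is a natural hook to format each
    // value (or run other per-item logic) before it is appended.
    public static string JoinFormatted<T>(IEnumerable<T> values, string separator, Func<T, string> format)
    {
        var builder = new StringBuilder();
        bool isFirst = true;
        foreach (T value in values)
        {
            // Separator goes between items only, never at the end.
            if (!isFirst)
            {
                builder.Append(separator);
            }
            builder.Append(format(value));
            isFirst = false;
        }
        return builder.ToString();
    }
}
```

For example, `CustomJoin.JoinFormatted(new[] { "a", "b" }, ",", s => "'" + s + "'")` returns `'a','b'`. When only the formatting is needed, `string.Join(",", values.Select(format))` achieves the same result; the explicit loop pays off when there is extra per-item behavior to run.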

I'd be happy to hear about other or better approaches, so share them if you will!

Thank you for reading!

Links and references:
String.Join on MSDN
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx

StringBuilder on MSDN
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

Aggregate on MSDN
http://msdn.microsoft.com/en-us/library/bb548651.aspx

Left fold
