
A tip on strings

Today, let's look at a simple trick with strings in C#. It may be obvious and already covered all over the internet, but I still want to give my two cents on the subject. So, let's start!

Imagine you have a list of any type, and you need its values as a single comma-separated string so they can be processed by a stored procedure. For the list {1,2,3,4}, the desired output is the string "1,2,3,4".
How would you solve this problem? I've seen (and written) several approaches. Let's order the possible solutions, in my opinion, from the most complex to the simplest. First, we can use a foreach loop with a StringBuilder:
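
A minimal sketch of the foreach-with-StringBuilder approach (Snippet 1 in the timings below) could look like this: append each value followed by a comma, then trim the trailing comma. The method name JoinWithStringBuilder and the four-element sample list are illustrative assumptions, not the original snippet; the list is named DocumentIDs to match the benchmark described later.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sample data standing in for the post's DocumentIDs list.
var DocumentIDs = new List<int> { 1, 2, 3, 4 };
Console.WriteLine(JoinWithStringBuilder(DocumentIDs)); // prints 1,2,3,4

// Sketch of Snippet 1: append every value plus a comma,
// then drop the comma left over after the last value.
static string JoinWithStringBuilder(IEnumerable<int> ids)
{
    var sb = new StringBuilder();
    foreach (var id in ids)
    {
        sb.Append(id);
        sb.Append(',');
    }

    // Shortening Length removes the trailing comma; skip it for an empty list.
    if (sb.Length > 0)
    {
        sb.Length--;
    }

    return sb.ToString();
}
```

Even in this compact form you need the loop, the separator and the trailing-comma cleanup, which is exactly the boilerplate that gets copied around a Data Access Layer.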

Notice how complex this can be, and how quickly it can become messy. Replicated across an entire project (imagine the nightmare of having this copied into your Data Access Layer whenever a stored procedure needs this input), it becomes a real pain. Of course, you could extract the behavior into an extension method, but as we will see, that is not necessary.

Our second approach, a much simpler one, uses all the goodness that LINQ provides. LINQ is one of the things I really love in .NET; it can make your life easier when properly applied. Let's see how we could do it with Aggregate and Remove:
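
A minimal sketch of the Aggregate-then-Remove approach (Snippet 2 in the timings below): fold every value into one string with the + operator, appending a comma each time, then Remove the trailing comma. JoinWithAggregate and the sample list are illustrative assumptions, and the empty-list guard is an addition, since Remove(-1) would throw.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data standing in for the post's DocumentIDs list.
var DocumentIDs = new List<int> { 1, 2, 3, 4 };
Console.WriteLine(JoinWithAggregate(DocumentIDs)); // prints 1,2,3,4

// Sketch of Snippet 2: Aggregate builds "1,2,3,4," and Remove trims the last comma.
static string JoinWithAggregate(IEnumerable<int> ids)
{
    // Each step creates a brand-new string (strings are immutable),
    // copying everything accumulated so far.
    string result = ids.Aggregate(string.Empty, (acc, id) => acc + id + ",");

    // Guard: on an empty list there is no trailing comma to remove.
    return result.Length > 0 ? result.Remove(result.Length - 1) : result;
}
```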


Still, this is not the easiest solution we could use. I mean, what's the deal with Aggregate followed by Remove? It isn't readable enough: you have to be familiar with how the Aggregate method works and, as we will see, it is horribly slow.

Now let's see the following snippet, which uses String.Join:
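
A minimal sketch of the String.Join approach (Snippet 3 in the timings below), assuming a List<int> named DocumentIDs as in the benchmark. A single call handles the separators, so there is no trailing comma to clean up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sample data standing in for the post's DocumentIDs list.
var DocumentIDs = new List<int> { 1, 2, 3, 4 };

// Sketch of Snippet 3: the generic overload String.Join<T>(string, IEnumerable<T>)
// calls ToString() on each element and places the separator only between them.
string csv = string.Join(",", DocumentIDs);
Console.WriteLine(csv); // prints 1,2,3,4
```

The IEnumerable<T> overload was added in .NET Framework 4. On .NET Framework 3.5 and earlier, String.Join only accepts a string[], so you would first convert the list, for example with DocumentIDs.Select(id => id.ToString()).ToArray() (which needs System.Linq).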

This solution is simple, maintainable, understandable at first sight and, hopefully, performant. Or is it? Let's run some tests! For that, I filled the DocumentIDs list with 10,000,000 integers, extracted each approach into a method and called each method 5 times inside a Stopwatch. Then I divided the elapsed milliseconds by 5 to obtain the average. Yes, it's not the most rigorous benchmark, but it will serve for illustration.
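
A minimal sketch of the timing harness described above: run an approach a fixed number of times inside one Stopwatch and divide the elapsed milliseconds by the number of runs. AverageMilliseconds and the element count are illustrative assumptions; the post used 10,000,000 integers, while this sketch uses 1,000,000 so it runs quickly.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

const int Runs = 5;
const int Count = 1_000_000; // the post used 10,000,000

// Fill the list with consecutive integers, as in the benchmark.
List<int> DocumentIDs = Enumerable.Range(1, Count).ToList();

double avg = AverageMilliseconds(() => string.Join(",", DocumentIDs), Runs);
Console.WriteLine($"String.Join: {avg} ms on average ({Runs} runs)");

// Times all runs with a single Stopwatch and returns the mean per run.
static double AverageMilliseconds(Func<string> approach, int runs)
{
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < runs; i++)
    {
        approach(); // the result is discarded; only the elapsed time matters
    }
    stopwatch.Stop();
    return stopwatch.ElapsedMilliseconds / (double)runs;
}
```

Like the original measurement, this may include one-time JIT compilation and garbage collection costs in the total. A warm-up call before starting the Stopwatch, or a dedicated tool such as BenchmarkDotNet, gives steadier numbers.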

For reference, my machine is a Toshiba Satellite L40 with an Intel Pentium T2330 @ 1.6 GHz and 2 GB of RAM. Let's see how this works out!

Foreach with StringBuilder (Snippet 1): 4773 ms on average (5 runs)

This is not bad at all! You are unlikely to ever need an output this big, but I like to see how the options perform under heavy load.

Aggregate with Remove (Snippet 2): over 1 min 30 s (one run only; it never finished, so I had to stop the program)

This is not a good approach. For a start, using LINQ this way is like trying to kill a fly with a cannon. Notice also that I'm concatenating the strings with the + operator. This is dead slow: strings are immutable, so every iteration allocates a new string object and copies everything accumulated so far into it, which makes the total work grow quadratically with the size of the list. So, not very efficient.

String.Join (Snippet 3): 4670 ms on average (5 runs)

So, as we can see, the third method is also the fastest. It's worth noting that the first method is almost as fast (4773 ms versus 4670 ms), but let's keep it simple, right? Behind the scenes, String.Join does a lot of work, but it was designed to handle the load.

If there is one lesson I take from this kind of situation, it is: don't reinvent the wheel. We all do it at one time or another (it's our nature as programmers), but it only pays off if you learn something from it. I believe each of the approaches has its strengths and weaknesses. I can see myself using the foreach to add other behaviour when the method is called, or to format the output in more customizable ways.

I'd be happy to hear about other or better approaches, so share them if you will!

Thank you for reading!

Links and references:
String.Join on MSDN
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx

StringBuilder on MSDN
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

Aggregate on MSDN
http://msdn.microsoft.com/en-us/library/bb548651.aspx
