
A tip on strings

Today, let's look at a simple trick with strings in C#. It may be obvious and already covered all over the internet, but I still want to give my two cents on the subject. So, let's start!

Imagine you have a list of values of any type, and you need those values as a single comma-separated string so they can be processed by a stored procedure. For the list {1,2,3,4}, the desired output is the string "1,2,3,4".
How would you solve this problem? I've seen (and written) several approaches. Let's order the possible solutions, as I see them, from the most complex to the simplest. First, we can do:

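using System;
using System.Collections.Generic;
using System.Text;
List<int> DocumentIDs = new List<int>() {1,2,3,4,5,6,7,8,9};
StringBuilder sb = new StringBuilder();
foreach (int number in DocumentIDs)
    sb.Append(",").Append(number); // prefix every value with a comma
if (sb.Length > 0)
    sb.Remove(0, 1); // drop the leading comma (skipped for an empty list)
Console.WriteLine(sb.ToString());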
Notice how complex this is, and how quickly it can become messy. Replicated across an entire project it becomes a real pain: imagine this loop copied all over your Data Access Layer, wherever a stored procedure needs this kind of input. Of course, you could extract the behavior into an extension method, but, as we will see, that is not necessary.
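If you do want to go the extension-method route, a minimal sketch could look like the following. It simply wraps the StringBuilder loop above; EnumerableCsvExtensions and ToCommaSeparated are hypothetical names of my own, not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: class and method names are illustrative only.
public static class EnumerableCsvExtensions
{
    // Joins the values with commas using the same StringBuilder loop as above.
    public static string ToCommaSeparated<T>(this IEnumerable<T> values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));

        StringBuilder sb = new StringBuilder();
        foreach (T value in values)
            sb.Append(",").Append(value);

        // Drop the leading comma, but only if at least one value was appended.
        if (sb.Length > 0)
            sb.Remove(0, 1);
        return sb.ToString();
    }
}
```

With it, DocumentIDs.ToCommaSeparated() returns "1,2,3,4,5,6,7,8,9" for the list above, and an empty list gives an empty string instead of an exception.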

Our second approach, a much simpler one, uses the goodness that LINQ provides. LINQ is one of the things I really love in .NET; it can make your life easier when properly applied. Let's see how we could do it:
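using System;
using System.Collections.Generic;
using System.Linq;
List<int> DocumentIDs = new List<int>() {1,2,3,4,5,6,7,8,9};
string s = DocumentIDs.Aggregate("", (x, y) => x + "," + y).Remove(0, 1); // fold into ",1,...,9", then strip the leading comma
Console.WriteLine(s);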


Still, this is not the easiest solution we could use. I mean, what's the deal with Aggregate followed by Remove? It is not very readable: you have to be familiar with how the Aggregate method works and, as we will see, it is horribly slow.
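It helps to spell out what Aggregate with a seed actually does. It is a left fold: it starts from the seed and repeatedly passes the running result and the next element to the lambda. The hand-written equivalent below only illustrates those semantics (AggregateSketch, Fold and RawFold are made-up names); it is not the actual implementation of Enumerable.Aggregate.

```csharp
using System;
using System.Collections.Generic;

public static class AggregateSketch
{
    // Illustrative equivalent of Enumerable.Aggregate(source, seed, func):
    // start from the seed and fold each element into the running result.
    public static TAccumulate Fold<TSource, TAccumulate>(
        IEnumerable<TSource> source,
        TAccumulate seed,
        Func<TAccumulate, TSource, TAccumulate> func)
    {
        TAccumulate result = seed;
        foreach (TSource element in source)
            result = func(result, element);
        return result;
    }

    // With seed "" and (x, y) => x + "," + y, the fold over {1,2,3} goes
    // "" -> ",1" -> ",1,2" -> ",1,2,3", hence the Remove(0, 1) afterwards.
    public static string RawFold(IEnumerable<int> ids) =>
        Fold(ids, "", (x, y) => x + "," + y);
}
```

The intermediate values in the comment show why the extra Remove(0, 1) is needed: the empty seed gets a comma in front of the very first element.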

Let's see the following snippet:
using System;
using System.Collections.Generic;
List<int> DocumentIDs = new List<int>() {1,2,3,4,5,6,7,8,9};
string MyString = string.Join(",", DocumentIDs); // the separator goes only between values
Console.WriteLine(MyString);

This solution is simple, maintainable, understandable at first sight and, hopefully, performant. Or is it? Let's run some tests! For that, I filled the DocumentIDs list with 10,000,000 integers, extracted each approach into a method and called it 5 times in a row, timing all 5 calls with a single Stopwatch. Then I divided the elapsed milliseconds by 5 to obtain the average. Yes, it's not the best benchmark, but it will serve for illustration.
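Since the three benchmark programs below differ only in the body of Test, the timing logic could be factored into one helper that takes the code to measure as a delegate. This is my own sketch, not the code used for the numbers in this post: BenchmarkSketch and AverageMilliseconds are made-up names, and, unlike the original measurements, it makes one untimed warm-up call so that JIT compilation is not counted.

```csharp
using System;
using System.Diagnostics;

public static class BenchmarkSketch
{
    // Hypothetical helper: runs the action `runs` times under one Stopwatch
    // and returns the average in milliseconds, like the measurements here.
    // One untimed warm-up call comes first (not done in the original tests).
    public static double AverageMilliseconds(Action action, int runs = 5)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (runs <= 0) throw new ArgumentOutOfRangeException(nameof(runs));

        action(); // warm-up, not timed

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
            action();
        sw.Stop();

        return sw.Elapsed.TotalMilliseconds / runs;
    }
}
```

For example, BenchmarkSketch.AverageMilliseconds(() => string.Join(",", DocumentIDs)) would time the String.Join approach. For more trustworthy numbers, a dedicated tool such as BenchmarkDotNet takes care of warm-up, iteration counts and statistics.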

My machine is a Toshiba Satellite L40 with an Intel Pentium T2330 @ 1.6 GHz and 2 GB of RAM. Let's see how this works out!

Foreach with StringBuilder (Snippet 1): 4773 ms on average (5 runs)
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

class Program
{
    static void Main()
    {
        // Fill the list with 10,000,000 integers (not timed)
        List<int> DocumentIDs = new List<int>();
        for (int i = 0; i < 10000000; i++)
            DocumentIDs.Add(i);

        // Time 5 consecutive runs with a single Stopwatch and print the average
        Stopwatch sw = Stopwatch.StartNew();
        for (int run = 0; run < 5; run++)
            Test(DocumentIDs);
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds / 5);
    }

    static void Test(List<int> DocumentIDs)
    {
        StringBuilder sb = new StringBuilder();
        foreach (var element in DocumentIDs)
        {
            sb.Append(",").Append(element);
        }
        string FinalString = sb.Remove(0, 1).ToString(); // drop the leading comma
    }
}

This is not bad at all! You are hardly ever going to produce output this big, but I like to see how the options perform under heavy load.

Aggregate with Remove (Snippet 2): over 1 min 30 s (one run only; it never finished, so I had to stop the program)

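using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        // Fill the list with 10,000,000 integers (not timed)
        List<int> DocumentIDs = new List<int>();
        for (int i = 0; i < 10000000; i++)
            DocumentIDs.Add(i);

        // Time 5 consecutive runs with a single Stopwatch and print the average
        Stopwatch sw = Stopwatch.StartNew();
        for (int run = 0; run < 5; run++)
            Test(DocumentIDs);
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds / 5);
    }

    static void Test(List<int> DocumentIDs)
    {
        // Every "+" allocates a new string and copies everything built so far
        string s = DocumentIDs.Aggregate("", (x, y) => x + "," + y).Remove(0, 1);
    }
}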
This is not a really good approach. For a start, using LINQ this way is like trying to kill a fly with a cannon. Notice also that I'm concatenating the strings with the + operator. This is dead slow because strings are immutable: every iteration allocates a new string and copies the whole accumulated text into it, so the total amount of copying grows quadratically with the number of elements. So, not very efficient.
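If you like the Aggregate style, the quadratic copying can be avoided by folding into a single StringBuilder instead of into a string. This variation was not part of the measurements in this post, so I make no speed claim beyond the expectation that, like the foreach loop, it appends in place; AggregateWithBuilder is a made-up name. Because it writes a comma only once the builder already holds something, it also needs no Remove afterwards.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class AggregateWithBuilder
{
    // Variation on Snippet 2: the seed is one StringBuilder, so each step
    // appends in place instead of copying the accumulated string.
    // The Length check works because every int renders as non-empty text.
    public static string Join(IEnumerable<int> ids)
    {
        StringBuilder sb = ids.Aggregate(
            new StringBuilder(),
            (acc, id) => (acc.Length == 0 ? acc : acc.Append(',')).Append(id));
        return sb.ToString();
    }
}
```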

String.Join (Snippet 3): 4670 ms on average (5 runs)
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        // Fill the list with 10,000,000 integers (not timed)
        List<int> DocumentIDs = new List<int>();
        for (int i = 0; i < 10000000; i++)
            DocumentIDs.Add(i);

        // Time 5 consecutive runs with a single Stopwatch and print the average
        Stopwatch sw = Stopwatch.StartNew();
        for (int run = 0; run < 5; run++)
            Test(DocumentIDs);
        sw.Stop();
        Console.WriteLine(sw.ElapsedMilliseconds / 5);
    }

    static void Test(List<int> DocumentIDs)
    {
        string s2 = string.Join(",", DocumentIDs);
    }
}

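So, as we can see, the third method is also the fastest, if only barely: the first method is almost as fast (4773 ms against 4670 ms, a difference of about 2%, which a benchmark this simple can hardly tell apart from noise). But let's keep it simple, right? Behind the scenes string.Join still has to build the whole string, but it was designed for exactly this job, and you don't have to write or maintain that code yourself.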
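Besides speed, there is a robustness argument for string.Join. The benchmark versions of Snippets 1 and 2 assume the list is not empty: on an empty list both end with a Remove(0, 1) that throws ArgumentOutOfRangeException (StringBuilder.Remove and string.Remove both do), whereas string.Join simply returns an empty string. The small check below illustrates this; EdgeCases is a made-up class name.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EdgeCases
{
    // string.Join returns an empty string for an empty sequence.
    public static string WithJoin(List<int> ids) => string.Join(",", ids);

    // Snippet 2 style: on an empty list the fold returns the seed "",
    // and "".Remove(0, 1) throws ArgumentOutOfRangeException.
    public static string WithAggregate(List<int> ids) =>
        ids.Aggregate("", (x, y) => x + "," + y).Remove(0, 1);
}
```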

If there is one lesson I take from this kind of situation, it is: don't reinvent the wheel. We all do it at one time or another, it's our nature as programmers, but it only pays off if you learn something from it. I believe each of the approaches has its strengths and weaknesses. I can see myself using the foreach when I need to add other behavior to the loop, or to format the output in more customizable ways.
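As an illustration of that last point, here is a sketch of a foreach-based join that formats each value with a caller-supplied function and writes the separator only between items, so nothing has to be removed afterwards. CustomJoin and JoinFormatted are hypothetical names; for plain formatting, string.Join(separator, values.Select(format)) does the same job, and the loop earns its keep when each item needs extra logic.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CustomJoin
{
    // Hypothetical helper: formats every value with `format` and puts the
    // separator only between items, so no leading comma needs removing.
    public static string JoinFormatted<T>(
        IEnumerable<T> values, string separator, Func<T, string> format)
    {
        StringBuilder sb = new StringBuilder();
        bool first = true;
        foreach (T value in values)
        {
            if (!first)
                sb.Append(separator);
            sb.Append(format(value));
            first = false;
        }
        return sb.ToString();
    }
}
```

For example, CustomJoin.JoinFormatted(new[] { 1, 2, 3 }, ";", id => id.ToString("D3", CultureInfo.InvariantCulture)) returns "001;002;003".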

I'd be happy to hear about other or better approaches, so share them if you will!

Thank you for reading!

Links and references:
String.Join on MSDN
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx

StringBuilder on MSDN
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

Aggregate on MSDN
http://msdn.microsoft.com/en-us/library/bb548651.aspx
