A tip on strings

Today, let's look at a simple trick with strings in C#. It may be too obvious and has been repeated around the internet, but I still want to give my 2 cents on the subject. So, let's start!

Imagine you have a list of any type, and you need its values separated by commas so that they can be processed by a stored procedure. The desired output for a list of {1,2,3,4} is the string "1,2,3,4".
How would you solve this problem? I've seen and tried multiple approaches. Let's order the possible solutions (as I see them) from the most complex to the simplest. First, we can do:
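
A minimal sketch of this first approach, a foreach loop that appends to a StringBuilder. It assumes a `List<int>` like the DocumentIDs list used in the tests; the class and method names are hypothetical and only serve the illustration:

```csharp
using System.Collections.Generic;
using System.Text;

public static class StringBuilderJoin
{
    // Hypothetical helper: builds "1,2,3,4" by appending every value
    // followed by a comma, then dropping the last comma.
    public static string ToCsv(List<int> documentIds)
    {
        var builder = new StringBuilder();

        foreach (int id in documentIds)
        {
            builder.Append(id).Append(',');
        }

        // An empty list leaves the builder empty, so only trim when needed.
        if (builder.Length > 0)
        {
            builder.Length--; // remove the trailing comma
        }

        return builder.ToString();
    }
}
```

Even in this small form there is a loop, a separator to manage and an edge case for the empty list.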

Notice how complex this can be, and how quickly it can become messy. Replicated across an entire project (imagine the nightmare of repeating it in your Data Access Layer whenever a stored procedure needs this input), it becomes a real pain. Of course, you could extract the behavior into an extension method, but as we will see, that is not necessary.

Our second, much simpler approach uses all the goodness that LINQ provides. LINQ is one of those things that I really love in .NET. It can make your life easy when properly applied. But let's see how we could do it:
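
A minimal sketch of this second approach, assuming the same `List<int>` input. It uses `Aggregate` with the (+) operator and then `Remove` to drop the last comma; the class and method names are hypothetical:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AggregateJoin
{
    // Hypothetical helper: folds the list into one string with "+",
    // then removes the trailing comma.
    public static string ToCsv(List<int> documentIds)
    {
        // Every step allocates a new string that holds everything built so far.
        string result = documentIds.Aggregate(string.Empty, (acc, id) => acc + id + ",");

        // Remove(startIndex) cuts from that index to the end of the string.
        return result.Length > 0 ? result.Remove(result.Length - 1) : result;
    }
}
```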


Still, this is not the easiest solution we could use. I mean, what's the deal with Aggregate and then Remove? It is not readable enough: you have to be familiar with how the Aggregate method works and, as we will see, it is horribly slow.

Let's see the following snippet:
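
A minimal sketch of this third approach, again assuming a `List<int>` as input. It relies on the generic `string.Join<T>(string, IEnumerable<T>)` overload, which is an assumption about the framework version (.NET 4 or later); on older versions the values would first have to be converted to a `string[]`. The class and method names are hypothetical:

```csharp
using System.Collections.Generic;

public static class StringJoin
{
    // Hypothetical helper: string.Join handles the separator and the
    // empty-list case, so nothing has to be trimmed afterwards.
    public static string ToCsv(List<int> documentIds)
    {
        return string.Join(",", documentIds);
    }
}
```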

This solution is simple, maintainable, understandable at first sight and hopefully performant. Or is it? Let's run some tests! For that, I filled the DocumentIDs list with 10,000,000 integers, extracted each approach into a method and called it 5 times, timed with a Stopwatch. Then I divided the total in milliseconds by 5 to obtain the average. Yes, it's not the best benchmark, but it will serve for illustration.
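
The measurement described above could be sketched roughly like this. It is not the exact test code: `JoinBenchmark`, `AverageMilliseconds` and `BuildDocumentIds` are hypothetical names, and the delegate passed in stands for whichever approach is being timed:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class JoinBenchmark
{
    // Runs the action `runs` times under one Stopwatch and returns
    // the total elapsed milliseconds divided by the number of runs.
    public static double AverageMilliseconds(Action action, int runs)
    {
        var stopwatch = Stopwatch.StartNew();

        for (int i = 0; i < runs; i++)
        {
            action();
        }

        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds / (double)runs;
    }

    // Builds the test input: the integers 1..count.
    public static List<int> BuildDocumentIds(int count)
    {
        return Enumerable.Range(1, count).ToList();
    }
}
```

With these helpers, a call such as `JoinBenchmark.AverageMilliseconds(() => string.Join(",", ids), 5)`, where `ids` comes from `BuildDocumentIds(10000000)`, gives the average for one approach. There is no warm-up run and no control over garbage collection, so the numbers are only a rough indication.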

My machine is a Toshiba Satellite L40 with an Intel Pentium T2330 @ 1.6GHz and 2GB of RAM. Let's see how this works out!

Foreach with StringBuilder (Snippet 1): 4773 ms on average (5 runs)

This is not bad at all! You will hardly ever have an output this big, but I like to see how the options perform under heavy load.

Aggregate with Remove (Snippet 2): more than 1 min 30 s (one run only; it never finished and I had to stop the program)

This is not a good approach at all. For a start, using LINQ this way is like trying to kill a fly with a cannon. Notice that I'm also concatenating the strings with the (+) operator. This operation is dead slow because a string is immutable: every iteration allocates a new string object and copies everything accumulated so far into it, so the total work grows roughly with the square of the number of items. So, not very efficient.

String.Join (Snippet 3): 4670 ms on average (5 runs)

So, as we can see, the third method is also the fastest, if only by a small margin. It's worth noting that the first method is almost as fast, but let's keep it simple, right? Behind the scenes, string.Join does a lot of work, but it was designed to handle the load.

If there is one lesson that I take from this kind of situation, it is: don't reinvent the wheel. We all do it at one time or another, it's in our nature as programmers, but it is only worth it if you learn something from it. I believe each of the approaches has its strengths and weaknesses. I can see myself using the foreach when the method also needs to encapsulate other behaviour, or when the output has to be formatted in more customizable ways.

I'd be happy to hear about any different or better approaches, so share them if you will!

Thank you for reading!

Links and references:
String.Join on MSDN
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx

StringBuilder on MSDN
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

Aggregate on MSDN
http://msdn.microsoft.com/en-us/library/bb548651.aspx
