
A tip on strings

Today, let's look at a simple trick with strings in C#. It may be too obvious and repeated around the internet, but I still want to give my 2 cents on the subject. So, let's start!

Imagine you have a list of any type and you need its values separated by commas so that they can be processed by some stored procedure. The desired output for a list of {1,2,3,4} is a string like this: "1,2,3,4".
How would you solve this problem? I've seen and done multiple approaches. Let's order the possible solutions (in my mind) from the most complex to the simplest. First, we can do:
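A minimal sketch of this first approach (a foreach loop feeding a StringBuilder) might look like the following. The class and method names are hypothetical, and the input is assumed to be a sequence of int like the DocumentIDs list used in the tests further down:

```csharp
using System.Collections.Generic;
using System.Text;

public static class Snippet1
{
    // Hypothetical helper: turns {1,2,3,4} into "1,2,3,4".
    public static string JoinWithStringBuilder(IEnumerable<int> documentIds)
    {
        var builder = new StringBuilder();

        foreach (int id in documentIds)
        {
            // Append every value followed by a separator...
            builder.Append(id).Append(',');
        }

        // ...and then drop the trailing separator, if anything was appended.
        if (builder.Length > 0)
        {
            builder.Length--;
        }

        return builder.ToString();
    }
}
```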

Notice how complex this can be, and how fast it can become messy. When replicated across an entire project (imagine the nightmare of having this replicated in your Data Access Layer whenever a stored procedure needs this input), this can be a pain. Of course, you could extract the behavior and create an extension method for it, but as we will see, it is not necessary.

Our second approach is much simpler and uses all the goodness that LINQ provides. LINQ is one of those things that I really love in .NET. It can make your life easy when properly applied. But let's see how we could do it:
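A rough sketch of the Aggregate-then-Remove idea could look like this. Again the names are hypothetical and the input is assumed to be a sequence of int. Note that the accumulator is a plain string built with the (+) operator:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Snippet2
{
    // Hypothetical helper: same output as before, built with LINQ's Aggregate.
    public static string JoinWithAggregate(IEnumerable<int> documentIds)
    {
        // Start from an empty string and append "<id>," on every step.
        // Each step allocates a brand new string.
        string result = documentIds.Aggregate(
            string.Empty,
            (current, id) => current + id + ",");

        // Remove the trailing comma (Remove(startIndex) cuts from that index to the end).
        return result.Length > 0 ? result.Remove(result.Length - 1) : result;
    }
}
```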


Still, this is not the easiest solution we could use. I mean, what's the deal with Aggregate and then Remove? This is not readable enough: you have to be familiar with how the Aggregate method works and, as we will see, it is horribly slow.

Let's see the following snippet:
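A sketch of the third approach, assuming .NET 4 or later, where string.Join has an overload that accepts any IEnumerable&lt;T&gt; (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;

public static class Snippet3
{
    // Hypothetical helper: the framework does all the work.
    public static string JoinWithStringJoin(IEnumerable<int> documentIds)
    {
        // Resolves to the string.Join<T>(string, IEnumerable<T>) overload,
        // which calls ToString() on each element and puts "," between them.
        return string.Join(",", documentIds);
    }
}
```

On older framework versions that only offer the string[] overload, the values would first have to be converted to a string array.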

This solution is simple, maintainable, understandable at first sight and hopefully performant. Or is it? Let's run some tests! For that, I filled the DocumentIDs list with 10,000,000 integers, extracted each approach into a method and called it 5 times, timing the calls with a Stopwatch. Then I divided the total elapsed milliseconds by 5 to obtain the average. Yes, it's not the best benchmark, but it will serve for illustration.
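A minimal sketch of such a measurement is shown here. It is not the exact harness behind the numbers below; the JoinBenchmark class and its method names are hypothetical, and string.Join stands in for whichever approach is being timed:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class JoinBenchmark
{
    // Runs the action `runs` times inside one Stopwatch and
    // returns the average elapsed time per run in milliseconds.
    public static double AverageMilliseconds(Action action, int runs)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            action();
        }
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / runs;
    }

    // Fills a list with `count` integers and times one approach.
    public static void Run(int count, int runs)
    {
        List<int> documentIds = Enumerable.Range(1, count).ToList();

        double average = AverageMilliseconds(
            () => string.Join(",", documentIds),
            runs);

        Console.WriteLine("String.Join: {0:F0} ms on average ({1} runs)", average, runs);
    }
}
```

Calling JoinBenchmark.Run(10000000, 5) would mirror the setup described above. The first run also pays for JIT compilation, which is one reason this is only a rough illustration.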

My machine is a Toshiba Satellite L40 with an Intel Pentium T2330 @ 1.6GHz and 2GB of RAM. Let's see how this works out!

Foreach with StringBuilder (Snippet 1): 4773 ms on average (5 runs)

This is not bad at all! You will hardly ever have an output this big, but I like to see how the options perform under heavy load.

Aggregate with Remove (Snippet 2): over 1 min 30 s (one run only; it never finished, and I had to stop the program)

This is not a good approach at all. For a start, using LINQ this way is like trying to kill a fly with a cannon. Notice that I'm also concatenating the strings with the (+) operator. This operation is dead slow because a string is immutable, so every iteration allocates a new string object and copies everything accumulated so far into it. Not very efficient.

String.Join (Snippet 3): 4670 ms on average (5 runs)

So, as we can see, the third method is also the fastest. It's worth noting that the first method is almost as fast, but let's keep it simple, right? Behind the scenes, string.Join does a lot of work, but it was designed to handle the load.

If there is one lesson that I take from this kind of situation, it is this: don't reinvent the wheel. We all do it at one time or another, it's our nature as programmers, but it only matters if you learn something from it. I believe each of the approaches has its strengths and weaknesses. I can see myself using the foreach to encapsulate other behaviour when the method is called, or to format the output in other, more customizable ways.

I'd be happy to hear about other or better approaches, so share if you will!

Thank you for reading!

Links and references:
String.Join on MSDN
http://msdn.microsoft.com/en-us/library/57a79xd0.aspx

StringBuilder on MSDN
http://msdn.microsoft.com/en-us/library/system.text.stringbuilder.aspx

Aggregate on MSDN
http://msdn.microsoft.com/en-us/library/bb548651.aspx
