Look behind the curtain to find the truth

Sometimes a question will come up about the preferred way to write some code. There are always multiple ways to accomplish the same goals, but some ways are better than others. Of course, that depends on how you define "better": most performant, lowest resource usage, maintainability, easiest to read, etc. Once you decide what you are looking for, you need a way to compare different solutions.

In a recent code review it was suggested that it was better to declare a variable outside of a loop, if it is going to be used repeatedly within the loop. The current code had the variable declared within the loop, so that the variable is declared for every iteration. The reviewer suggested that this would create unnecessary performance or resource overhead. Others suggested that there wouldn't be any difference at all, that it was really just a question of style. Each side had their theories and explanations, but neither could convince the other definitively. Everyone understood that in this instance the performance difference, if any, would be negligible, so it wasn't really a big deal, and the review continued on to the next topic.

However, in other situations, knowing how to answer this kind of question might be important. This article will walk through how I resolved this issue, not because the answer is important, but because I think there is value in knowing how to find the answer.

The easiest way to resolve this issue is to look at the MSIL (Microsoft Intermediate Language) code. All .NET code, no matter what language it is written in (C#, VB.NET, J#, etc.), gets compiled into MSIL. The .NET runtime doesn't know anything about C# or VB.NET; it only knows how to execute MSIL, which its JIT compiler turns into native machine code. MSIL is what the runtime actually runs; a language like C# is just a nicer way to write it.
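To make the point that MSIL is what actually ships inside an assembly, here is a minimal sketch (not part of the original walkthrough) that reads the raw MSIL bytes of a compiled method through reflection. It assumes .NET Framework 2.0 or later, because MethodInfo.GetMethodBody() and MethodBody.GetILAsByteArray() do not exist in the 1.1 framework used in this article. The Demo class, its Shout method, and the IlBytes helper are hypothetical names chosen for illustration.

```csharp
using System;
using System.Reflection;

public class Demo
{
    // Hypothetical method; any compiled method would do.
    public static string Shout(string s)
    {
        return s.ToUpper();
    }
}

public static class IlBytes
{
    // Returns the raw MSIL instruction stream that the compiler emitted for a method.
    public static byte[] Get(Type type, string methodName)
    {
        MethodInfo method = type.GetMethod(methodName);
        return method.GetMethodBody().GetILAsByteArray();
    }

    public static void Main()
    {
        byte[] il = Get(typeof(Demo), "Shout");
        // Prints the bytes as dash-separated hex pairs; the exact bytes vary by compiler and /optimize setting.
        Console.WriteLine(BitConverter.ToString(il));
    }
}
```

Whatever compiler you use, the last byte printed should be 2A, the opcode for ret. The rest is easier to read with a disassembler, which is what the steps below use.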

For these types of tasks, I like to drop down to a command prompt and leave behind the dream world of Visual Studio, with its candy-coated syntax highlighting and IntelliSense narcotics. I want to know what is REALLY going on, without the risk of a tool doing some magic for me automatically.

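You can use the Visual Studio Command Prompt on your Start menu, or just open a command prompt and add two folders to your PATH: the .NET Framework SDK folder, which contains ILDASM.EXE (c:\Program Files\Microsoft Visual Studio .NET 2003\SDK\v1.1\bin), and the .NET Framework folder, which contains CSC.EXE (typically c:\WINDOWS\Microsoft.NET\Framework\v1.1.4322). Then open a new file in Notepad: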

c:\code>notepad simplelib.cs

Write the following code in Notepad. Save and close to return to the command prompt.

class Sample {
    private string[] colors = new string[]{"Red","Green","Blue","Yellow","Orange"};

    public void DeclareOutsideLoop(){
        string myvariable;
        for(int i=0; i<colors.Length; ++i){
            myvariable = colors[i];
            System.Console.WriteLine(myvariable);
        }
    }

    public void DeclareInsideLoop(){
        for(int i=0; i<colors.Length; i++){
            string myvariable = colors[i];
            System.Console.WriteLine(myvariable);
        }
    }
}

Now compile the code into a DLL using the C# compiler (CSC.EXE), which is included on every machine with the .NET runtime:

c:\code>csc /target:library /out:simplelib.dll simplelib.cs

You now have a compiled .NET assembly containing a class with two methods that accomplish the same thing using slightly different C# code: one declares the temporary variable inside the loop, and the other declares it outside the loop. You can view the IL for this assembly (or any .NET assembly) with the IL Disassembler (ILDASM.EXE), which is included on every machine with the .NET SDK.

c:\code>ildasm simplelib.dll

In the ILDASM window, you will see the Sample class. Click the plus sign to expand the class and you will see each of the method names. Double-click a method name to see the MSIL code that makes up the method. Here are the contents of DeclareOutsideLoop:

.method public hidebysig instance void DeclareOutsideLoop() cil managed
{
// Code size       35 (0x23)
.maxstack 2
.locals init (string V_0,
           int32 V_1)
IL_0000: ldc.i4.0
IL_0001: stloc.1
IL_0002: br.s       IL_0017
IL_0004: ldarg.0
IL_0005: ldfld      string[] Sample::colors
IL_000a: ldloc.1
IL_000b: ldelem.ref
IL_000c: stloc.0
IL_000d: ldloc.0
IL_000e: call       void [mscorlib]System.Console::WriteLine(string)
IL_0013: ldloc.1
IL_0014: ldc.i4.1
IL_0015: add
IL_0016: stloc.1
IL_0017: ldloc.1
IL_0018: ldarg.0
IL_0019: ldfld      string[] Sample::colors
IL_001e: ldlen
IL_001f: conv.i4
IL_0020: blt.s      IL_0004
IL_0022: ret
} // end of method Sample::DeclareOutsideLoop

And here are the contents of DeclareInsideLoop:

.method public hidebysig instance void DeclareInsideLoop() cil managed
{
// Code size       35 (0x23)
.maxstack 2
.locals init (int32 V_0,
           string V_1)
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: br.s       IL_0017
IL_0004: ldarg.0
IL_0005: ldfld      string[] Sample::colors
IL_000a: ldloc.0
IL_000b: ldelem.ref
IL_000c: stloc.1
IL_000d: ldloc.1
IL_000e: call       void [mscorlib]System.Console::WriteLine(string)
IL_0013: ldloc.0
IL_0014: ldc.i4.1
IL_0015: add
IL_0016: stloc.0
IL_0017: ldloc.0
IL_0018: ldarg.0
IL_0019: ldfld      string[] Sample::colors
IL_001e: ldlen
IL_001f: conv.i4
IL_0020: blt.s      IL_0004
IL_0022: ret
} // end of method Sample::DeclareInsideLoop

As you would expect, the two methods are nearly identical; the only differences are in the .locals declaration and in the local variable indices used by the ldloc and stloc instructions. The key is in the method metadata. Notice that both declare .maxstack 2, which means neither will ever have more than 2 items on the evaluation stack at once. The next line lists the local variables. DeclareOutsideLoop declares a string and then an int32; DeclareInsideLoop declares an int32 and then a string. So both use the same number and types of local variables, just declared in a different order, and it is this difference in order that accounts for every remaining difference in the code. In DeclareOutsideLoop, the string (myvariable) is local slot 0 (V_0) and the int32 (the loop variable i) is slot 1 (V_1); in DeclareInsideLoop, the slots are reversed. Both methods are also exactly the same size: 35 (0x23) bytes of IL.

This tells us conclusively that there will be no difference in execution behavior between the two methods: neither uses more variables, objects, pointers, or instructions than the other. Since the JIT compiler works from this same IL, there is no performance reason for choosing one over the other.
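The metadata comparison above can also be done programmatically instead of by reading ILDASM output. The sketch below (an addition, not part of the original post) uses reflection to print each method's .maxstack value and its local variable slots in the order they appear in .locals init. It assumes .NET Framework 2.0 or later for MethodBody.MaxStackSize and MethodBody.LocalVariables, and the LocalsReport helper is hypothetical. A newer compiler, or compiling with /optimize+, may emit different locals than the 1.1 listings above (for example, an extra Boolean for the loop condition, or no slot at all for myvariable), so treat the output as something to inspect rather than a fixed answer.

```csharp
using System;
using System.Reflection;

// The Sample class from this article, repeated so the sketch is self-contained.
class Sample
{
    private string[] colors = new string[] { "Red", "Green", "Blue", "Yellow", "Orange" };

    public void DeclareOutsideLoop()
    {
        string myvariable;
        for (int i = 0; i < colors.Length; ++i)
        {
            myvariable = colors[i];
            Console.WriteLine(myvariable);
        }
    }

    public void DeclareInsideLoop()
    {
        for (int i = 0; i < colors.Length; i++)
        {
            string myvariable = colors[i];
            Console.WriteLine(myvariable);
        }
    }
}

// Hypothetical helper: summarizes the metadata ILDASM shows as .maxstack and .locals init.
static class LocalsReport
{
    public static string Describe(Type type, string methodName)
    {
        MethodBody body = type.GetMethod(methodName).GetMethodBody();
        string result = "maxstack " + body.MaxStackSize + "; locals:";
        foreach (LocalVariableInfo local in body.LocalVariables)
        {
            // LocalIndex is the slot number that ldloc/stloc refer to (V_0, V_1, ...).
            result += " [" + local.LocalIndex + "] " + local.LocalType.FullName;
        }
        return result;
    }

    static void Main()
    {
        Console.WriteLine("DeclareOutsideLoop: " + Describe(typeof(Sample), "DeclareOutsideLoop"));
        Console.WriteLine("DeclareInsideLoop:  " + Describe(typeof(Sample), "DeclareInsideLoop"));
    }
}
```

Run against an assembly built with your own compiler, this gives a quick way to check whether the two methods still differ only in the order of their local variable slots.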

blog.flimflan.com
