The power of yield (return)
Yield is one of the coolest, yet underused, new .NET 2.0 features that you may not be familiar with. Yes, I'm talking about a 2.0 feature - even though 3.5 just came out, it is my experience that a large number of developers have still not embraced all that 2.0 gave us. Hopefully this article will help make this powerful, yet initially confusing, feature more approachable.
Some background facts
When you "foreach" over an object (array, collection, etc), it must implement IEnumerable or IEnumerable<T> - the generic version which returns strongly typed objects (for the rest of this article, I'll just refer to IEnumerable).
All arrays, and most collections (ArrayList, List<T>) implement IEnumerable, which is why you can "foreach" over them.
If ALL you need to do to a collection is cycle through its members - you only need to reference it as an IEnumerable. That explains why the following code is perfectly valid:
IEnumerable<int> numbers = new int[] { 1, 2, 3 }; foreach (int i in numbers) { Console.WriteLine(i); }
It is always better to depend on abstract types (interfaces, abstract base classes) rather than concrete types. This allows you to change the implementation of a method without impacting any of its callers.
Consider this example where CallingMethod gets a set of numbers from MethodBeingCalled. It really doesn't care if it gets an array or a List or whatever. It just does a foreach over it, which means it only needs the functionality exposed by the IEnumerable interface.
public void CallingMethod() { List<int> numbers = MethodBeingCalled(); foreach (int i in numbers) { Console.WriteLine(i); } } private List<int> MethodBeingCalled() { List<int> numbers = new List<int>(); numbers.AddRange(new int[] { 1, 2, 3 }); return numbers; }
The problem with the current design is that the CallingMethod declares the list of numbers List<int>. That is a specific implementation detail. If CallingMethod needed to add more numbers to the list after it was returned from MethodBeingCalled, or if individual numbers were being accessed by their index, then it would make sense to declare it as a List<T> (or better, IList<T>). But it really only needs to cycle through the numbers in order. If MethodBeingCalled wanted to change its implementation to return an int[] instead, it would have to change its signature, and CallingMethod would also have to change. However, if CallingMethod had just declared its local variable as IEnumerable<int> (because that is the only functionality it truly depends on), then it would not have to change if MethodBeingCalled changes - as long as it still returns something that implements IEnumerable<int>.
So what does foreach do? It asks the IEnumerable if it has any items. If so, it gets the next item and uses it within the loop. It then asks the IEnumerable again if it has more items, gets the next one, and so on, until there are no more items. The important thing to note here is that foreach never works with the entire collection all it once - it only deals with one item at a time.
Introducing yield
How can we use all of this information? If you have a method that returns a collection of data, and you can be sure that callers are only going to be interested in looping over each item in the collection, one at a time (which is very common), you can declare your method's return value as IEnumerable<T>.
When you declare your method's return value as IEnumerable<T>, something magic happens: the C# compiler will now let you use the yield keyword within the body! So why does that matter? Because it allows you to take advantage of the way that foreach really works. Remember, foreach will just ask for one value at a time. When it does, it will call into your method - your method will run until it encounters a yield return statement. When it reaches a yield return statement, that value is immediately returned to the caller. In the next iteration of the caller's loop, it will ask again if there is another value, your method will start executing again, but this time it will start on the line after the last yield statement that was run. Yes, you read that right: execution alternates between the calling method and the method being called, all the way through the loop. This is completely different from the normal way that methods interact, where one method gives up control to another method, which doesn't return until it is completely finished executing.
This has at least 2 very important implications for the method that returns the IEnumerable<T>:
1) It does not need to build up a big list of all of the items it plans to return, holding them all in memory at the same time. If the list of items is very large, holding them all in memory at once can be a significant burden.
2) It does not necessarily need to do the work to produce all of the items in the list. If the caller breaks out of the foreach loop before the entire collection has been traversed - all of the work needed to produce the remaining items in the colleciton is avoided. Without using the yield keyword, you would have to build up the entire collection to return, even if the caller was only going to look at the first few items.
Demonstration
To illustrate the advantage of this approach, I included sample code at the bottom. You can create a new Console Application and paste this over the default Program.cs.
A method, GetCombinations returns a collection of 2 numbers combined. The caller then loops through the collection, looking for a specific combination: "13".
First, run it using GetCombinations. This is implemented the way most people would write a method that returns a collection, in .NET 1.1. The output willl look something like this (the indented output is from within GetCombinations):
BEFORE call to GetCombinations
Beginning execution of GetCombinations
Producing 33
Producing 23
Producing 13
Producing 32
Producing 22
Producing 12
Producing 31
Producing 21
Producing 11
Ending execution of GetCombinations
AFTER call to GetCombinations
Checking 33
Checking 23
Checking 13
Found!
Elapsed: 00:00:08.9931583
Now, comment out the line that calls GetCombinations, and uncomment the line that calls BetterGetCombinations. It is important to note that BetterGetCombinations uses the exact same algorithm as GetCombinations - given the same input, they will both return the exact same list of strings, in the same order (I've emphasized the lines that changed, in the sample code below). But since the caller is only looking for a specific combination, not all combinations need to be produced. Consider the output:
BEFORE call to GetCombinations
AFTER call to GetCombinations
Beginning execution of BetterGetCombinations <-- notice that BetterGetCombinations doesn't really start until the foreach loop - NOT when the method was first called!
Producing 33
Checking 33
Producing 23
Checking 23
Producing 13
Checking 13
Found!
Elapsed: 00:00:02.9976438
With the improved implementation that took advantage of the yield keyword, the program was able to finish its job in less than half the time! It also used much less memory, as it never had to store all 9 strings in a collection. Now imagine the potential impact if GetCombinations returned a collection with thousands of entries!
Sample Code
Paste the code below over the default Program.cs in a new Console Application:
using System; using System.Collections.Generic; using System.Diagnostics; using System.Threading; namespace YieldDemo { class Program { static void Main() { Stopwatch timer = Stopwatch.StartNew(); Console.WriteLine("BEFORE call to GetCombinations"); //1) First try running using the GetCombinations that returns a List<string> List<string> combinations = GetCombinations(3, 3); //2) After observing the output, comment out the above line, uncomment the next line, and re-run // IEnumerable<string> combinations = BetterGetCombinations(3, 3); Console.WriteLine("AFTER call to GetCombinations"); foreach (string combination in combinations) { Console.WriteLine("Checking " + combination); if (combination == "13") { Console.WriteLine("Found!"); break; } } timer.Stop(); Console.WriteLine("Elapsed: " + timer.Elapsed); } static List<string> GetCombinations(int left, int right) { Console.WriteLine("\tBeginning execution of GetCombinations"); List<string> output = new List<string>(); int currentLeft; int currentRight = right; while (currentRight > 0) { currentLeft = left; while (currentLeft > 0) { string newCombinationToProduce = currentLeft.ToString() + currentRight; Console.WriteLine("\tProducing " + newCombinationToProduce); Thread.Sleep(1000); // simulate expensive process to produce a combination output.Add(newCombinationToProduce); --currentLeft; } --currentRight; } Console.WriteLine("\tEnding execution of GetCombinations"); return output; } static IEnumerable<string> BetterGetCombinations(int left, int right) { Console.WriteLine("\tBeginning execution of BetterGetCombinations"); // NOTE The local List is not necessary in this implementation int currentLeft; int currentRight = right; while (currentRight > 0) { currentLeft = left; while (currentLeft > 0) { string newCombinationToProduce = currentLeft.ToString() + currentRight; Console.WriteLine("\tProducing " + newCombinationToProduce); Thread.Sleep(1000); // simulate expensive process to produce a combination yield return newCombinationToProduce; // insteading of adding to a list, just return the value --currentLeft; } --currentRight; } Console.WriteLine("\tEnding execution of BetterGetCombinations"); } } }
Comments
@Steve - Thanks, that's exactly the kind of feedback I look for.
Since we are on fun facts...using foreach actually leaves behind an object that needs to be gc'ed. So, that is why foreach is not used in game programming (for loops are "usually" better and leave nothing to be cleaned up). Something I learned when I was dabbling with the XNA framework for the XBOX 360.
The performance gain using yield is a compelling reason for me to explore it. I'm going to copy that source code and run it through myself now.
Thanks.
It's not better to return an (IList<t>), it's better to accept IList<T> as a return parameter. That way you can return any object that supports the IList<T> interface rather than ONLY List<T>. The benefit is in code maintenance so that if you want to refactor something, you may not have to alter all of its dependencies.
Rob
If you put cleanup code in a finally block it should work (at least it made "Ending execution of BetterGetCombinations" print), though you will encounter extra overhead.
Ps Yield also works with the non-generic version of IEnumerable in case you're returning a collection that doesn't necessarily need to be locked down, because it may return multiple shapes to be consumed by a templated control like a gridview
The yield thingy is nothing new to Python and Ruby programmers. Similarly, a familiarity with list comprehensions (Haskell, Python) would stand anyone starting out with LINQ in good stead.
Obviously anybody with an ear to the ground would also be well-advised to poke their noses into Don Syme's F# book this year.
For some good examples of yield goodness, I suggest googling for Python or Ruby combinatoric generators - e.g. k-subsets. For a C# version, see here: http://www.interact-sw.co.uk/iangblog/2004/09/16/permuterate
The object that is, sometimes, created is nothing to do with using foreach. It is simply the enumerator returned by GetEnumerator(). Many implementations return a new enumerator everytime GetEnumerator() is called, and often the enumerator is a class/reference type. Some implementations, including parts of the XNA Framework, use struct/value types for their enumerators as this avoids creating a new object on the heap.
What advantages does this have over standard for loops? With for loops you can specify more conditions to stop the loop for this reason we tend not to use the foreach statment. How would a standard for loop compare to use of the yield keyword?
Of course, you could still use the yield magic and still use a for loop in Main instead of foreach, if you wanted to. It would look something like (replacing the foreach line in Main):
for (IEnumerator<string> enumerator = combinations.GetEnumerator(); enumerator.MoveNext();) {
string combination = (string) enumerator.Current;
//... continue the rest of the loop
Good post. Perhaps you should of also, went into Reflector and pasted what the magic underlying code that's generated, which from what I recall is a state machine with lots of labels and goto's involved to achieve the yield behavior.
Just my 2 cents.
Once again, good post!
Winston
Best way to do this is to create a DBConnection class (or something like that) that handles your database connections, and make it implement IDisposable. In the dispose method of your DBConnection class you can put in any cleanup you need. Now you just use:
using( DBConnection conn = new DBConnection())
{
// Code that uses conn.
}
at the end of the using block, or exit of the scope (errors etc), the dispose method is called which is where you destroy the database connection. fun :)
http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx