Article Index

The power of yield (return)

Yield is one of the coolest, yet underused, new .NET 2.0 features that you may not be familiar with. Yes, I'm talking about a 2.0 feature - even though 3.5 just came out, it is my experience that a large number of developers have still not embraced all that 2.0 gave us. Hopefully this article will help make this powerful, yet initially confusing, feature more approachable.

Some background facts

When you "foreach" over an object (array, collection, etc), it must implement IEnumerable or IEnumerable<T> - the generic version which returns strongly typed objects (for the rest of this article, I'll just refer to IEnumerable).

All arrays, and most collections (ArrayList, List<T>) implement IEnumerable, which is why you can "foreach" over them.

If ALL you need to do to a collection is cycle through its members - you only need to reference it as an IEnumerable. That explains why the following code is perfectly valid:

IEnumerable<int> numbers = new int[] { 1, 2, 3 }; foreach (int i in numbers) { Console.WriteLine(i); }

It is always better to depend on abstract types (interfaces, abstract base classes) rather than concrete types. This allows you to change the implementation of a method without impacting any of its callers.

Consider this example where CallingMethod gets a set of numbers from MethodBeingCalled. It really doesn't care if it gets an array or a List or whatever. It just does a foreach over it, which means it only needs the functionality exposed by the IEnumerable interface.

public void CallingMethod() { List<int> numbers = MethodBeingCalled(); foreach (int i in numbers) { Console.WriteLine(i); } } private List<int> MethodBeingCalled() { List<int> numbers = new List<int>(); numbers.AddRange(new int[] { 1, 2, 3 }); return numbers; }

The problem with the current design is that the CallingMethod declares the list of numbers List<int>. That is a specific implementation detail. If CallingMethod needed to add more numbers to the list after it was returned from MethodBeingCalled, or if individual numbers were being accessed by their index, then it would make sense to declare it as a List<T> (or better, IList<T>). But it really only needs to cycle through the numbers in order. If MethodBeingCalled wanted to change its implementation to return an int[] instead, it would have to change its signature, and CallingMethod would also have to change. However, if CallingMethod had just declared its local variable as IEnumerable<int> (because that is the only functionality it truly depends on), then it would not have to change if MethodBeingCalled changes - as long as it still returns something that implements IEnumerable<int>.

So what does foreach do? It asks the IEnumerable if it has any items. If so, it gets the next item and uses it within the loop. It then asks the IEnumerable again if it has more items, gets the next one, and so on, until there are no more items. The important thing to note here is that foreach never works with the entire collection all it once - it only deals with one item at a time.

Introducing yield

How can we use all of this information? If you have a method that returns a collection of data, and you can be sure that callers are only going to be interested in looping over each item in the collection, one at a time (which is very common), you can declare your method's return value as IEnumerable<T>.

When you declare your method's return value as IEnumerable<T>, something magic happens: the C# compiler will now let you use the yield keyword within the body! So why does that matter? Because it allows you to take advantage of the way that foreach really works. Remember, foreach will just ask for one value at a time. When it does, it will call into your method - your method will run until it encounters a yield return statement. When it reaches a yield return statement, that value is immediately returned to the caller. In the next iteration of the caller's loop, it will ask again if there is another value, your method will start executing again, but this time it will start on the line after the last yield statement that was run. Yes, you read that right: execution alternates between the calling method and the method being called, all the way through the loop. This is completely different from the normal way that methods interact, where one method gives up control to another method, which doesn't return until it is completely finished executing.

This has at least 2 very important implications for the method that returns the IEnumerable<T>:
1) It does not need to build up a big list of all of the items it plans to return, holding them all in memory at the same time. If the list of items is very large, holding them all in memory at once can be a significant burden.

2) It does not necessarily need to do the work to produce all of the items in the list. If the caller breaks out of the foreach loop before the entire collection has been traversed - all of the work needed to produce the remaining items in the colleciton is avoided. Without using the yield keyword, you would have to build up the entire collection to return, even if the caller was only going to look at the first few items.

Demonstration

To illustrate the advantage of this approach, I included sample code at the bottom. You can create a new Console Application and paste this over the default Program.cs.

A method, GetCombinations returns a collection of 2 numbers combined. The caller then loops through the collection, looking for a specific combination: "13".

First, run it using GetCombinations. This is implemented the way most people would write a method that returns a collection, in .NET 1.1. The output willl look something like this (the indented output is from within GetCombinations):

BEFORE call to GetCombinations
        Beginning execution of GetCombinations
        Producing 33
        Producing 23
        Producing 13
        Producing 32
        Producing 22
        Producing 12
        Producing 31
        Producing 21
        Producing 11
        Ending execution of GetCombinations
AFTER call to GetCombinations
Checking 33
Checking 23
Checking 13
Found!
Elapsed: 00:00:08.9931583

Now, comment out the line that calls GetCombinations, and uncomment the line that calls BetterGetCombinations. It is important to note that BetterGetCombinations uses the exact same algorithm as GetCombinations - given the same input, they will both return the exact same list of strings, in the same order (I've emphasized the lines that changed, in the sample code below). But since the caller is only looking for a specific combination, not all combinations need to be produced. Consider the output:

BEFORE call to GetCombinations
AFTER call to GetCombinations
        Beginning execution of BetterGetCombinations <-- notice that BetterGetCombinations doesn't really start until the foreach loop - NOT when the method was first called!

        Producing 33
Checking 33
        Producing 23
Checking 23
        Producing 13
Checking 13
Found!
Elapsed: 00:00:02.9976438

With the improved implementation that took advantage of the yield keyword, the program was able to finish its job in less than half the time! It also used much less memory, as it never had to store all 9 strings in a collection. Now imagine the potential impact if GetCombinations returned a collection with thousands of entries!

Sample Code

Paste the code below over the default Program.cs in a new Console Application:

using System; using System.Collections.Generic; using System.Diagnostics; using System.Threading; namespace YieldDemo { class Program { static void Main() { Stopwatch timer = Stopwatch.StartNew(); Console.WriteLine("BEFORE call to GetCombinations"); //1) First try running using the GetCombinations that returns a List<string> List<string> combinations = GetCombinations(3, 3); //2) After observing the output, comment out the above line, uncomment the next line, and re-run // IEnumerable<string> combinations = BetterGetCombinations(3, 3); Console.WriteLine("AFTER call to GetCombinations"); foreach (string combination in combinations) { Console.WriteLine("Checking " + combination); if (combination == "13") { Console.WriteLine("Found!"); break; } } timer.Stop(); Console.WriteLine("Elapsed: " + timer.Elapsed); } static List<string> GetCombinations(int left, int right) { Console.WriteLine("\tBeginning execution of GetCombinations"); List<string> output = new List<string>(); int currentLeft; int currentRight = right; while (currentRight > 0) { currentLeft = left; while (currentLeft > 0) { string newCombinationToProduce = currentLeft.ToString() + currentRight; Console.WriteLine("\tProducing " + newCombinationToProduce); Thread.Sleep(1000); // simulate expensive process to produce a combination output.Add(newCombinationToProduce); --currentLeft; } --currentRight; } Console.WriteLine("\tEnding execution of GetCombinations"); return output; } static IEnumerable<string> BetterGetCombinations(int left, int right) { Console.WriteLine("\tBeginning execution of BetterGetCombinations"); // NOTE The local List is not necessary in this implementation int currentLeft; int currentRight = right; while (currentRight > 0) { currentLeft = left; while (currentLeft > 0) { string newCombinationToProduce = currentLeft.ToString() + currentRight; Console.WriteLine("\tProducing " + newCombinationToProduce); Thread.Sleep(1000); // simulate expensive process to produce a combination yield return newCombinationToProduce; // insteading of adding to a list, just return the value --currentLeft; } --currentRight; } Console.WriteLine("\tEnding execution of BetterGetCombinations"); } } }

Comments

Side note (C# trivia): the foreach() keyword actually doesn't require IEnumerable, it only requires an object that has a method called 'GetEnumerator()' that returns an IEnumerator instance.
Chad Myers - February 03, 2008 02:46pm
Thanks Joshua, that's great to know. FYI: For me, the Demonstration section was unnecessary (too hard to follow) - I got what I needed to know from the code sample.
Steve Campbell - February 04, 2008 12:03pm
@Chad - I know, isn't that funky? ;) I always struggle with how many of these side thoughts I should include in a post, but I was trying really hard to keep this focused, so that it would be approachable to beginners. But I'm glad you brought it up, so the full story gets told!

@Steve - Thanks, that's exactly the kind of feedback I look for.
Joshua Flanagan - February 04, 2008 08:34pm
There is also a great demo of the keyword in the book "LINQ in Action"..showing how you can parse a big file without reading it all into memory. Pretty cool.

Since we are on fun facts...using foreach actually leaves behind an object that needs to be gc'ed. So, that is why foreach is not used in game programming (for loops are "usually" better and leave nothing to be cleaned up). Something I learned when I was dabbling with the XNA framework for the XBOX 360.
Bart Czernicki - February 05, 2008 04:05pm
So why is it better to return IList<T> than List<T>?
mike - February 06, 2008 03:55am
Interesting article. I guess it's inevitable that people miss out some language features and focus on some of the more 'sexy' aspects of 3.5.

The performance gain using yield is a compelling reason for me to explore it. I'm going to copy that source code and run it through myself now.

Thanks.
Shane - February 06, 2008 04:55am
@ Shane

It's not better to return an (IList<t>), it's better to accept IList<T> as a return parameter. That way you can return any object that supports the IList<T> interface rather than ONLY List<T>. The benefit is in code maintenance so that if you want to refactor something, you may not have to alter all of its dependencies.

Rob
Rob - February 06, 2008 06:23am
I noticed that you never printed Ending execution of BetterGetCombinations, since it never returned to BetterGetCombinations after the break statement. How would you suggest dealing with, for example, open database connections? Overall though, a very interesting article
Eric - February 06, 2008 06:36am
@Eric

If you put cleanup code in a finally block it should work (at least it made "Ending execution of BetterGetCombinations" print), though you will encounter extra overhead.
Mike - February 06, 2008 07:23am
Excellent explanation of the counter intuitive split execution of yield vs the standard linear method.
Ps Yield also works with the non-generic version of IEnumerable in case you're returning a collection that doesn't necessarily need to be locked down, because it may return multiple shapes to be consumed by a templated control like a gridview
Marc Ziss - February 06, 2008 12:25pm
Good intro. I am an occasional Dot Net developer and just want to add a gratuitous plug here for the idea that one should as a matter of principle immerse oneself in the details of more than one programming paradigm.

The yield thingy is nothing new to Python and Ruby programmers. Similarly, a familiarity with list comprehensions (Haskell, Python) would stand anyone starting out with LINQ in good stead.

Obviously anybody with an ear to the ground would also be well-advised to poke their noses into Don Syme's F# book this year.

For some good examples of yield goodness, I suggest googling for Python or Ruby combinatoric generators - e.g. k-subsets. For a C# version, see here: http://www.interact-sw.co.uk/iangblog/2004/09/16/permuterate
Peter G - February 07, 2008 02:16am
A small comment on Bart's comment mentioning object creation and garbage collection when using foreach.

The object that is, sometimes, created is nothing to do with using foreach. It is simply the enumerator returned by GetEnumerator(). Many implementations return a new enumerator everytime GetEnumerator() is called, and often the enumerator is a class/reference type. Some implementations, including parts of the XNA Framework, use struct/value types for their enumerators as this avoids creating a new object on the heap.
Leaf - February 07, 2008 04:18am
Nice article.

What advantages does this have over standard for loops? With for loops you can specify more conditions to stop the loop for this reason we tend not to use the foreach statment. How would a standard for loop compare to use of the yield keyword?
Tom - February 08, 2008 05:17am
@Tom - I'm not sure what you mean by using a for loop. Do you mean putting all of the producing and consuming logic within a single for loop? The advantage is the separation between the logic that generates a sequence of value and the logic that works with those values. In my sample code, BetterGetCombinations contains the logic to produce a series of strings. The foreach loop in the Main method contains the logic of deciding when it found what it is looking for. As soon as the Main method decides it doesn't need any more data, it stops asking for it. If the Main method doesn't ask for more data, then the data will never be produced by BetterGetCombinations. Compare that to the original GetCombinations method which will ALWAYS produce all of the combinations, regardless of whether the caller needs them all.
Of course, you could still use the yield magic and still use a for loop in Main instead of foreach, if you wanted to. It would look something like (replacing the foreach line in Main):
for (IEnumerator<string> enumerator = combinations.GetEnumerator(); enumerator.MoveNext();) {
string combination = (string) enumerator.Current;
//... continue the rest of the loop
Joshua Flanagan - February 11, 2008 03:30pm
Hey Josh,

Good post. Perhaps you should of also, went into Reflector and pasted what the magic underlying code that's generated, which from what I recall is a state machine with lots of labels and goto's involved to achieve the yield behavior.

Just my 2 cents.

Once again, good post!

Winston
Winston Pang - February 14, 2008 04:19pm
@Eric, regarding the open database connection

Best way to do this is to create a DBConnection class (or something like that) that handles your database connections, and make it implement IDisposable. In the dispose method of your DBConnection class you can put in any cleanup you need. Now you just use:

using( DBConnection conn = new DBConnection())
{
// Code that uses conn.

}

at the end of the using block, or exit of the scope (errors etc), the dispose method is called which is where you destroy the database connection. fun :)
Luke Schafer - February 17, 2008 11:50pm
If you guys haven't already then you SHOULD check out this excellent post Wes Dyer has on Iterators and what really goes on behind the scenes
http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx
Abhijeet P - February 20, 2008 08:45pm
Saw a reference to you on a post over at CodeBetter - and then found this post - nice one. Yield was on my list of things to do. :-)
Anthony Bouch - June 27, 2008 08:31am
Well explained. Thanks.
trickdev - March 06, 2010 05:55am
Great explanation Joshua, very clear and helpful. The example was also simple and straight, even the careful names made understanding easier (the demonstration section wasn´t really necessary, comments on code would have been enough). Keep the style, you're a born teacher!
Great Article Joshua.Its very helpful to understand the concepts very clearly. You've explained in a very elaborate way. Thanks.
Jaishree - May 04, 2010 01:26am
Typos and silly mistakes aside, a VERY good read. More importantly, the SIMPLICITY of the example used is wonderful. Thanks for taking out time & putting an effort to write this piece. For any such "nice" /"hidden"/"not-so-popular" but GOOD feature, such an article is what gets you started, sort of like an appetizer, it's up to the hungry developer to take it ahead from there on! Once again, thanks!
Chintan - May 25, 2010 05:43pm
Amazing explanation and sample! Congratulations
Tomamais - July 02, 2010 02:08pm