This post is not sufficient to learn LINQ. For that I suggest the excellent LINQ in Action. But when you are done, 90% of LINQ can be distilled down to the following items, starting with defining a lambda expression. And each of these concepts is fundamentally very simple and straightforward.
Definition of a lambda expression
The statement:
(film1, film2) => film1.Name.CompareTo(film2.Name)
Is identical to:
delegate int FilmNameOrder (Film film1, Film film2);
static int FilmSort (Film film1, Film film2) {
    return film1.Name.CompareTo(film2.Name);
}
That's it. The compiler turns the lambda into a method just like FilmSort and wraps it in a delegate of the matching type, so the two sets of code above compile to essentially the same thing. Not just the same results, the same code. So how does this work?
It's actually pretty simple. Let's say you have a Sort method that needs to be passed a delegate of the form "delegate int Compare (object obj1, object obj2)". The compiler is smarter than that, though: if you have a List<Film>, the compiler knows the Sort method must get a delegate of the form "delegate int Compare (Film obj1, Film obj2)".
The compiler allows you to pass an inline anonymous delegate, so you can pass the following:
Sort(delegate (Film film1, Film film2) { return film1.Name.CompareTo(film2.Name); });
No need to declare the return type, since it has to be an int. And there really is no need to declare that film1 and film2 are of type Film either. Again the compiler knows this because the Sort() method must be passed a delegate that takes two Film objects. Now the following code is not legal, but it does provide all necessary information to the compiler:
Sort(delegate (film1, film2) { return film1.Name.CompareTo(film2.Name); });
And that is almost your lambda expression. Drop the delegate and return keywords along with the braces (also not needed), put => between the parameters and the body, and you have your lambda expression. There is nothing magical about it: a lambda merely strips away all the information the compiler already knows, leaving just the internal logic of the method. And in doing so it makes the logic of what you're writing a lot clearer because the delegate is now inline and is reduced to its actual logic.
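To see the three forms side by side, here is a minimal sketch. It assumes a bare-bones Film class with just a Name property, and the film titles and the SortedNames helper are made up for illustration. List<T>.Sort takes the framework's Comparison<T> delegate, which has the same shape as the Compare delegate described above.

```csharp
using System;
using System.Collections.Generic;

// Assumed minimal Film class; the post only ever uses its Name property.
class Film
{
    public string Name { get; set; } = "";
}

static class LambdaDemo
{
    // Declared method with the same shape as Comparison<Film>.
    static int FilmSort(Film film1, Film film2)
    {
        return film1.Name.CompareTo(film2.Name);
    }

    // Hypothetical helper: sorts some sample films with the given comparison
    // and returns the names in their sorted order.
    public static List<string> SortedNames(Comparison<Film> compare)
    {
        var films = new List<Film>
        {
            new Film { Name = "Vertigo" },
            new Film { Name = "Alien" },
            new Film { Name = "Metropolis" }
        };
        films.Sort(compare);
        return films.ConvertAll(film => film.Name);
    }

    static void Main()
    {
        // 1. A declared method.
        var byMethod = SortedNames(FilmSort);

        // 2. An inline anonymous delegate with everything spelled out.
        var byDelegate = SortedNames(
            delegate (Film film1, Film film2) { return film1.Name.CompareTo(film2.Name); });

        // 3. A lambda: parameter types and return type are inferred.
        var byLambda = SortedNames((film1, film2) => film1.Name.CompareTo(film2.Name));

        Console.WriteLine(string.Join(", ", byMethod));   // Alien, Metropolis, Vertigo
        Console.WriteLine(string.Join(", ", byDelegate)); // Alien, Metropolis, Vertigo
        Console.WriteLine(string.Join(", ", byLambda));   // Alien, Metropolis, Vertigo
    }
}
```

All three calls hand Sort the same comparison logic. The only thing that changes is how much of it you have to type.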
Extension Methods
Lambdas are really nice, but by themselves their utility would be limited. There are two other key parts that make LINQ super powerful. The next is extension methods. These very simply are a way to add methods to a class you don't control. Say you think the string class should have a method string.CapFirstLetterOfEachWord(). Well, you can add it by creating the following method inside a static class:
public static string CapFirstLetterOfEachWord (this string source) {
    StringBuilder sb = new StringBuilder();
    // Use StringBuilder to build it up with word caps
    return sb.ToString();
}
And you've now added to the member methods for string. There's nothing special about the code in the method except that you use the reserved word this in the declaration. And the compiler uses that as shorthand to accept "yourString = myString.CapFirstLetterOfEachWord()" where before you would have to call "yourString = StringUtils.CapFirstLetterOfEachWord(myString)". There again is no difference in the code generated between the two, but it lets you write cleaner code.
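Here is one way the full method might look, as a sketch rather than the definitive version. I'm assuming a "word" is anything separated by whitespace, and I've put the method in the StringUtils class mentioned above. Note that the class holding an extension method has to be static.

```csharp
using System;
using System.Text;

// Extension methods must be declared in a static, non-nested class.
static class StringUtils
{
    // The "this" modifier on the first parameter is what makes it an extension method.
    public static string CapFirstLetterOfEachWord(this string source)
    {
        var sb = new StringBuilder(source.Length);
        bool startOfWord = true;
        foreach (char c in source)
        {
            if (char.IsWhiteSpace(c))
            {
                // Whitespace ends the current word (assumed word boundary rule).
                startOfWord = true;
                sb.Append(c);
            }
            else
            {
                sb.Append(startOfWord ? char.ToUpperInvariant(c) : c);
                startOfWord = false;
            }
        }
        return sb.ToString();
    }
}

static class ExtensionDemo
{
    static void Main()
    {
        string myString = "the quick brown fox";

        // Extension syntax and plain static syntax call the same method.
        string viaExtension = myString.CapFirstLetterOfEachWord();
        string viaStatic = StringUtils.CapFirstLetterOfEachWord(myString);

        Console.WriteLine(viaExtension);              // The Quick Brown Fox
        Console.WriteLine(viaExtension == viaStatic); // True
    }
}
```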
This is used throughout LINQ to add methods to all the container classes. Now there are a couple of limitations to extension methods. The biggies are that you cannot override existing methods (if the class already has a method with the same signature, the compiler always picks that one) and that extension methods are seen only in code where their namespace is in scope.
You also need to include "using System.Linq" to use the LINQ extension methods. Without that using directive, the compiler has no way to know that these extension methods exist and will not find them. (This is true of any extension methods: you must include the appropriate using for the namespace that declares them.) This will throw you at first because you will be able to declare and use List<MyObject> but you won't see any of the LINQ methods. With extension methods, including the using for the class itself is not sufficient.
LINQ Methods
And finally we come to the LINQ library itself. Let's look at the Where method as a common example:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, Boolean> predicate);
The LINQ methods are all extension methods, and the majority of them take in an IEnumerable<TSource> and return an IEnumerable<TSource>. This gives you the ability to chain the methods one after the other to perform multiple operations on a collection of data. And just about any collection of data implements IEnumerable<T>. The key to properly using LINQ methods is understanding that you are passing a typed IEnumerable<TSource> in and getting a revised IEnumerable<TSource> out.
The second key point is that Func<TSource, Boolean> will generally be a lambda expression. It does not have to be: you can pass in a declared delegate, as that's the same thing. But you will (almost) always want to use a lambda expression, which gives you a very clean inline means of performing complex transformations on your data.
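A small sketch of what that chaining looks like in practice. The Year property, the sample films and the helper names are all assumptions I've added for illustration. Where, OrderBy, Select and ToList are the standard System.Linq methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed Film class; Year is added only for this example.
class Film
{
    public string Name { get; set; } = "";
    public int Year { get; set; }
}

static class WhereDemo
{
    // A declared method that matches Func<Film, bool>.
    static bool IsBefore1980(Film film)
    {
        return film.Year < 1980;
    }

    // Made-up sample data.
    public static List<Film> SampleFilms()
    {
        return new List<Film>
        {
            new Film { Name = "Vertigo", Year = 1958 },
            new Film { Name = "Alien", Year = 1979 },
            new Film { Name = "Metropolis", Year = 1927 },
            new Film { Name = "Blade Runner", Year = 1982 }
        };
    }

    // Each call takes an IEnumerable in and hands an IEnumerable out,
    // which is what lets the calls chain.
    public static List<string> OldFilmNames(IEnumerable<Film> films)
    {
        return films
            .Where(film => film.Year < 1980)   // keep a subset
            .OrderBy(film => film.Name)        // sort it
            .Select(film => film.Name)         // project to the names
            .ToList();
    }

    static void Main()
    {
        var films = SampleFilms();

        // The predicate does not have to be a lambda; a declared method works too.
        IEnumerable<Film> viaMethod = films.Where(IsBefore1980);
        Console.WriteLine(viaMethod.Count());                      // 3

        Console.WriteLine(string.Join(", ", OldFilmNames(films))); // Alien, Metropolis, Vertigo
    }
}
```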
Conclusion
One final item of note: the standard LINQ methods do not provide a way to do anything to the items in a collection. You can sort it, select a subset of it, convert the type of collection, etc., but you can't set a property on items in the collection. You can write your own extension methods that call methods and/or set properties on each object in the collection. But the standard library does not include a Do() member.
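If you want one, a Do() extension is only a few lines. This is a hypothetical helper of my own, not anything in System.Linq, and the Reviewed property is made up for the example. (List<T> does have its own ForEach method, but that belongs to List<T>, not to LINQ, so it is not available on an arbitrary IEnumerable<T>.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed Film class; Reviewed exists only for this example.
// It must be a class (reference type) for the property change to stick.
class Film
{
    public string Name { get; set; } = "";
    public bool Reviewed { get; set; }
}

static class EnumerableExtensions
{
    // Hypothetical helper, not part of the standard library:
    // runs the action on every item in the sequence.
    public static void Do<T>(this IEnumerable<T> source, Action<T> action)
    {
        foreach (T item in source)
        {
            action(item);
        }
    }
}

static class DoDemo
{
    // Marks every film whose name starts with the given prefix as reviewed.
    public static void MarkReviewed(List<Film> films, string prefix)
    {
        films
            .Where(film => film.Name.StartsWith(prefix, StringComparison.Ordinal))
            .Do(film => film.Reviewed = true);
    }

    static void Main()
    {
        var films = new List<Film>
        {
            new Film { Name = "Alien" },
            new Film { Name = "Vertigo" }
        };

        MarkReviewed(films, "A");

        foreach (var film in films)
        {
            Console.WriteLine(film.Name + ": " + film.Reviewed);
        }
        // Alien: True
        // Vertigo: False
    }
}
```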
I don't think the above is sufficient to teach LINQ, not even close. But I hope it helps you understand what is going on a little quicker and easier. And fundamentally LINQ is built on some very simple concepts. Once you understand what they are, and why they work, LINQ becomes very easy to use. So easy in fact that after a month or so, you will find it very difficult to go back to the old way of writing code (which unfortunately I have to do for a library we have to build in VS 2005 – because some of it is in J#).
Any comments about what you think are the keys to understanding LINQ are greatly appreciated.
Really impressive article about LINQ! It's great to hear that you are interested in LINQ, I believe we can discuss about it more in the future. Please feel free to ping me at misun AT microsoft DOT com. I love LINQ too, HAHA!
Posted by: Michael Sun | 04/19/2011 at 11:47 PM
Thanks, Dave. Some questions.
Are lambda expressions equivalent to anonymous delegates?
Is it just as easy to debug linq code?
How should you provide a lambda expression to Func?
Do you think LINQ is safer to use when possible, as in less error-prone, provided the developer is proficient?
Posted by: meowkins | 05/02/2011 at 01:51 PM
Tomas;
1. Yes.
2. Easier.
3. The same as you would pass an anonymous delegate.
4. Yes - that's the big win. The generated code is the same, but because it is reduced to the essential logic and is inline, it is much easier to see what you are doing at a single glance.
Posted by: David Thielen | 05/02/2011 at 09:42 PM