LINQ is an awesome addition to .NET, and it has served me well in many situations even though I'm only beginning to learn how to use it.
However, in my reading about LINQ, I've discovered that there are some subtle things a developer needs to watch out for that can lead to trouble.
I've included one definite caveat that I've come across that is a result of deferred execution.
So I'm wondering: what other caveats exist that developers new to LINQ should know about?
Good question. As Reed points out, they mostly stem from deferred execution (but unlike him, I find it a drawback; I keep wondering why deferred execution can't be carried out by remembering the state). Here are a few examples; all are more or less variants of the deferred execution problem.
1) I'm too lazy to do something on time
A common mistake newbies (my past self included) make is not knowing about deferred execution. For example, a statement that merely builds a sorted query with OrderBy runs in a jiffy, but the actual sorting is not done until you enumerate the result. In other words, execution is not completed until you need its result. To get it executed, you have to enumerate the query, for instance with a foreach loop or a ToList() call.
Think of LINQ queries as SQL queries. Building a LINQ query is akin to writing a SQL statement into a string. Does the string by itself make a call to the database? No. You have to execute it. It's the same thing with LINQ.
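A minimal sketch of the point above, assuming a small in-memory list. The class name DeferredSortDemo and the KeySelectorCalls counter are made up for illustration; the counter is only there to make visible when the sort actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredSortDemo
{
    // Hypothetical counter: records how often the key selector is invoked.
    public static int KeySelectorCalls;

    public static IEnumerable<int> BuildQuery(List<int> numbers)
    {
        // This only describes the sort. Nothing is sorted at this point.
        return numbers.OrderBy(n => { KeySelectorCalls++; return n; });
    }

    public static void Main()
    {
        var numbers = new List<int> { 3, 1, 2 };
        var query = BuildQuery(numbers);
        Console.WriteLine(KeySelectorCalls);          // still 0: the query has not run

        List<int> sorted = query.ToList();            // the sort happens here
        Console.WriteLine(KeySelectorCalls > 0);      // True: the selector has now been called
        Console.WriteLine(string.Join(",", sorted));  // 1,2,3
    }
}
```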
2) I live in the present
To be safe, copy the variable into a separate local first and use that copy in the query whenever the variable can change before the query is actually executed.
From here:
Suppose a query named customersOver500 filters on a minimumBalance variable, the variable is then changed to 200 and a second query is built with the same filter, and count1 and count2 hold the Count() of each query. Suppose we have four customers with the following balances: 100, 300, 400 and 600. What will count1 and count2 be? They'll both be 3. The "customersOver500" query references the "minimumBalance" variable, but the value isn't obtained until the query results are iterated over (through a foreach loop, a ToList() call or even a Count() call). At the time the value is used to process the query, the value for minimumBalance has already changed to 200, so both LINQ queries produce identical results (customers with a balance over 200).
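The scenario described above can be sketched roughly as follows. This is a reconstruction, not the original site's code: the names customersOver500, minimumBalance, count1 and count2 come from the text, while the Customer class, customersOver200, the initial value of 500 and the CountsWithCopies variant (the "copy first" advice) are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public decimal Balance { get; set; }
}

public static class CapturedVariableDemo
{
    private static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { Balance = 100 }, new Customer { Balance = 300 },
            new Customer { Balance = 400 }, new Customer { Balance = 600 }
        };
    }

    public static int[] CountsWithSharedVariable()
    {
        var customers = SampleCustomers();
        decimal minimumBalance = 500;
        var customersOver500 = customers.Where(c => c.Balance > minimumBalance);
        minimumBalance = 200; // changes what BOTH queries will see
        var customersOver200 = customers.Where(c => c.Balance > minimumBalance);
        int count1 = customersOver500.Count(); // evaluated now, with 200
        int count2 = customersOver200.Count();
        return new[] { count1, count2 };       // { 3, 3 }
    }

    public static int[] CountsWithCopies()
    {
        var customers = SampleCustomers();
        decimal limit500 = 500; // each query captures its own variable
        decimal limit200 = 200;
        var customersOver500 = customers.Where(c => c.Balance > limit500);
        var customersOver200 = customers.Where(c => c.Balance > limit200);
        return new[] { customersOver500.Count(), customersOver200.Count() }; // { 1, 3 }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", CountsWithSharedVariable())); // 3,3
        Console.WriteLine(string.Join(",", CountsWithCopies()));         // 1,3
    }
}
```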
3) My memory is too weak to remember the valuables of the past
or this from the same site:
Consider this simple example of a method using LINQ-to-SQL to get a list of customers:
Seems pretty harmless -- until you get an "ObjectDisposedException" when you try and enumerate the collection. Why? Because LINQ doesn't actually perform the query until you try and enumerate the results. The DBContext class (which exposes the Customers collection) is disposed of when this call exits. Once you try and enumerate through the collection, the DBContext.Customers class is referenced and you get the exception.
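The same failure can be imitated without a database. The sketch below is not LINQ-to-SQL: FakeContext is a hypothetical stand-in that throws ObjectDisposedException when read after disposal, which is enough to show why returning an unexecuted query from inside a using block fails and why materializing it first does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a data context; not a real LINQ-to-SQL type.
public sealed class FakeContext : IDisposable
{
    private bool _disposed;
    private readonly List<string> _customers = new List<string> { "Ann", "Bob" };

    public IEnumerable<string> Customers()
    {
        // Iterator body: runs lazily, so the check happens at enumeration time.
        foreach (var c in _customers)
        {
            if (_disposed) throw new ObjectDisposedException(nameof(FakeContext));
            yield return c;
        }
    }

    public void Dispose() { _disposed = true; }
}

public static class DisposedContextDemo
{
    public static IEnumerable<string> GetCustomersBroken()
    {
        using (var db = new FakeContext())
        {
            // The query leaves the using block without having been executed.
            return db.Customers().Where(c => c.Length > 0);
        }
    }

    public static List<string> GetCustomersFixed()
    {
        using (var db = new FakeContext())
        {
            // ToList() runs the query while the context is still alive.
            return db.Customers().Where(c => c.Length > 0).ToList();
        }
    }

    public static void Main()
    {
        Console.WriteLine(GetCustomersFixed().Count); // 2
        try
        {
            GetCustomersBroken().ToList();
        }
        catch (ObjectDisposedException)
        {
            Console.WriteLine("context was already disposed");
        }
    }
}
```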
4) Don't try to catch me, I might still slip away
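A try/catch placed around the query definition does not help, because any exception is thrown later, when the query is enumerated. We neither get the correct error message, nor does the function exit via the return in the catch block. Global exception handling works better here.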
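A small sketch of an exception slipping past a try/catch, assuming an integer division as the failing operation. LateExceptionDemo and its method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LateExceptionDemo
{
    public static IEnumerable<int> BuildQuery(int[] values, int divisor)
    {
        try
        {
            // No division happens here, so this catch block can never fire.
            return values.Select(v => v / divisor);
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("never printed");
            return Enumerable.Empty<int>();
        }
    }

    public static bool EnumerationThrows()
    {
        var query = BuildQuery(new[] { 1, 2, 3 }, 0);
        try
        {
            query.ToList(); // the division, and the exception, happen here
            return false;
        }
        catch (DivideByZeroException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(EnumerationThrows()); // True
    }
}
```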
5) I'm not only unpunctual, but I don't learn from mistakes as well
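Suppose you have an IQueryable or IEnumerable returned from a LINQ expression. Enumerating the collection gets the statement executed, but only once? No, every time you enumerate it. This has bitten me in the past. If you need the results more than once, it is better to materialize the query once (for instance with ToList()) and reuse the materialized result.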
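To make the repeated execution visible, here is a sketch with a hypothetical Expensive method that counts its own calls. All names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReEnumerationDemo
{
    public static int Calls;

    // Stand-in for costly work; counts how often it runs.
    public static int Expensive(int n) { Calls++; return n * n; }

    public static int CallsWithoutCaching()
    {
        Calls = 0;
        IEnumerable<int> query = Enumerable.Range(1, 5).Select(n => Expensive(n));
        query.Sum(); // runs the query
        query.Max(); // runs the query again from scratch
        return Calls;
    }

    public static int CallsWithCaching()
    {
        Calls = 0;
        List<int> results = Enumerable.Range(1, 5).Select(n => Expensive(n)).ToList();
        results.Sum(); // works on the stored list
        results.Max();
        return Calls;
    }

    public static void Main()
    {
        Console.WriteLine(CallsWithoutCaching()); // 10
        Console.WriteLine(CallsWithCaching());    // 5
    }
}
```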
6) If you don't know to use me, I can cause side effects!
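Make sure your queries have no side effects, since re-enumerating is much more common in LINQ than you might expect. To give a wild example, imagine a Select whose lambda also modifies the objects it projects. If you enumerate such a query twice, the modification is applied twice, which is hardly ever what you want.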
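A sketch of such a side effect, using a hypothetical Account class whose balance is bumped inside a Select. It is not the original answer's example, only an illustration of the same hazard.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Account
{
    public decimal Balance { get; set; }
}

public static class SideEffectDemo
{
    public static decimal BalanceAfterTwoEnumerations()
    {
        var accounts = new List<Account> { new Account { Balance = 100 } };

        // Bad idea: the projection mutates the objects it passes through.
        var withBonus = accounts.Select(a => { a.Balance += 10; return a; });

        withBonus.Count();  // first enumeration: balance becomes 110
        withBonus.ToList(); // second enumeration: balance becomes 120

        return accounts[0].Balance;
    }

    public static void Main()
    {
        Console.WriteLine(BalanceAfterTwoEnumerations()); // 120, not 110
    }
}
```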
7) Be wary of order I am executed when chaining
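For example, suppose d is a sequence of 100 items, f is defined as d.Select(t => new Person()), and f is then concatenated with itself. What will be the count of distinct people in f? I would have guessed 100, but it is 200. The problem is that when the concatenation logic actually executes, f is still the unexecuted d.Select(t => new Person()). So the result is effectively two separate runs of d.Select(t => new Person()) joined together, which then has 200 distinct members. Here's a link for the actual problem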
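A reconstruction of that situation, under the assumption that d is Enumerable.Range(1, 100) and Person is a plain class compared by reference. The names d, f and Person come from the text; everything else is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person { } // no Equals override, so Distinct compares references

public static class ChainingOrderDemo
{
    public static int DistinctCountDeferred()
    {
        IEnumerable<int> d = Enumerable.Range(1, 100);
        IEnumerable<Person> f = d.Select(t => new Person());

        // f is enumerated twice by Concat, creating 100 new objects each time.
        f = f.Concat(f);
        return f.Distinct().Count();
    }

    public static int DistinctCountMaterialized()
    {
        IEnumerable<int> d = Enumerable.Range(1, 100);
        List<Person> people = d.Select(t => new Person()).ToList();

        // The same 100 objects appear twice, so Distinct collapses them.
        return people.Concat(people).Distinct().Count();
    }

    public static void Main()
    {
        Console.WriteLine(DistinctCountDeferred());     // 200
        Console.WriteLine(DistinctCountMaterialized()); // 100
    }
}
```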
8) Hey, actually we're smarter than you think.
Because deferred execution is essentially execution on demand, LINQ is much more efficient than it appears. The iterator block "yields" one item at a time, as demanded, which makes it possible to stop execution as soon as no more items are needed. Here is a very good question that details just that: Order of LINQ extension methods does not affect performance?
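As a rough illustration of on-demand evaluation, the sketch below counts how many source items are actually projected before First() is satisfied. The counter and class name are made up for the example.

```csharp
using System;
using System.Linq;

public static class OnDemandDemo
{
    public static int EvaluationsForFirstMatch()
    {
        int evaluated = 0;

        int firstBig = Enumerable.Range(1, 1000000)
            .Select(n => { evaluated++; return n * 2; })
            .Where(n => n > 10)
            .First(); // stops pulling items once a match is found

        Console.WriteLine(firstBig); // 12
        return evaluated;            // only the first 6 items were projected
    }

    public static void Main()
    {
        Console.WriteLine(EvaluationsForFirstMatch()); // 6
    }
}
```

Items flow through Select and Where one at a time, so the remaining 999,994 numbers are never touched.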
9) I'm not meant to crunch numbers
For number-crunching algorithms, LINQ is not the right tool, especially for large data sets whose complexity can scale exponentially. Sometimes just two for loops do the job better. The same can apply to raw SQL when compared with LINQ to SQL.
10) Hire me for the right job
Some examples:
for a foreach on an enumerable.
or
Just bad tools.
11) Debugging and Profiling can be a nightmare
Not that it's entirely impossible, but it's a bit of a task to debug a LINQ query from VS itself as efficiently as non-LINQ code. Profiling also becomes a tad harder because of the nature of deferred execution. But it shouldn't stop anyone from writing the trivial one- or two-liners!
A bunch of caveats all related to deferred execution more or less! A ditto question here. Some related reading on SO:
Examples on when not to use LINQ
Pros and Cons of LINQ (Language-Integrated Query)
What is the biggest mistake people make when starting to use LINQ?
drawbacks of linq
I think LINQ is fairly solid, and there aren't a lot of big caveats. Nearly every "problem" I've run into is the result of deferred execution, and it's not really a problem, but rather a different way of thinking.
The biggest issue I've faced - LINQ is a game changer (or at least a rule bender) when it comes to profiling for performance. The deferred execution can make it much more difficult to profile an application at times, and can also dramatically change the runtime performance characteristics in unexpected ways. Certain LINQ operations seem almost magical with how fast they are, and others take a lot longer than I expected - but it's not always obvious from the code or profiler results.
That being said, in general, the deferred execution more than makes up for the cases where it's slowed down hand-coded routines. I much prefer the simpler, cleaner code to the code it replaced.
Also, I have found that the more I use LINQ to Objects, the more I have to rethink my design and rework my collections in general.
For example, I had never realized how often I was exposing IList instead of IEnumerable when it wasn't absolutely necessary until I started using LINQ to Objects frequently. I now completely understand why MS design guidelines warn against using IList too often (for example, don't return IList just for the Count property, etc). When I had methods that took IList, passing through the IEnumerable results from a LINQ query required .ToList() or a reworking of the method's API.
But it's almost always worth the rethinking - I've found many places where passing an enumerable and using LINQ resulted in huge perf gains. The deferred execution is wonderful if you think about it, and take full advantage of it. For example, using .Take() to restrict a collection to the first 2 elements if that's all that's needed was a bit more challenging pre-LINQ, and has dramatically sped up some of my nastier loops.
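A sketch of both points, with hypothetical names: a method that accepts IEnumerable rather than IList can take a query result directly, and Take(2) means only two elements of the source are ever computed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeDemo
{
    public static int Evaluated;

    // Accepting IEnumerable<int> lets callers pass a query without ToList().
    public static List<int> FirstTwo(IEnumerable<int> source)
    {
        return source.Take(2).ToList();
    }

    public static List<int> Run()
    {
        Evaluated = 0;
        IEnumerable<int> squares =
            Enumerable.Range(1, 1000).Select(n => { Evaluated++; return n * n; });
        return FirstTwo(squares); // only the first two squares are computed
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", Run())); // 1,4
        Console.WriteLine(Evaluated);               // 2
    }
}
```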
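Building up a query within a foreach loop
Consider a loop over the vowels that adds a Where filter to a query on each pass, with the lambda capturing the loop variable. Such code only removes the "u" from the string because of deferred execution: every lambda sees the same variable, and by the time the query runs it holds the last vowel. (This applies to compilers before C# 5; from C# 5 on, foreach creates a fresh variable per iteration, but the problem remains for any variable declared outside the loop.)
In order to remove all the vowels, copy the loop variable into a temporary variable declared inside the loop and use that copy in the lambda.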
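A sketch that reproduces both behaviours on a current compiler. Because modern C# already gives foreach a fresh variable per iteration, the shared-variable case is imitated with a variable declared outside the loop; the input string and all names are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VowelFilterDemo
{
    public static string WithSharedVariable(string text)
    {
        IEnumerable<char> query = text;
        char vowel = '\0'; // one variable captured by every lambda
        foreach (char v in "aeiou")
        {
            vowel = v;
            query = query.Where(c => c != vowel);
        }
        // When the query finally runs, vowel is 'u' for all five filters.
        return new string(query.ToArray());
    }

    public static string WithTemporaryCopy(string text)
    {
        IEnumerable<char> query = text;
        foreach (char v in "aeiou")
        {
            char temp = v; // a new variable per iteration
            query = query.Where(c => c != temp);
        }
        return new string(query.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(WithSharedVariable("Not what you might expect")); // Not what yo might expect
        Console.WriteLine(WithTemporaryCopy("Not what you might expect"));  // Nt wht y mght xpct
    }
}
```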