A long time ago I was advised to sort data at the application layer and not use the ORDER BY
clause in SQL. The reasoning was the .Net will sort more efficiently the SQL engine.
Conflicting with this advice is the SSIS Best Practices I've encountered that recommends sorting data in the SQL, where one can, and avoiding the Sort transformation.
The SSIS advice makes sense to me. So now I am wondering if the initial advice of avoiding the ORDER BY
is bogus.
Given a not-too-complex query, does the ORDER BY necessarily mean a performance hit?
Thanks.
Brent Ozar's argument for avoiding ORDER BY boils down to SQL Server licenses being expensive and application server licenses being cheap. The "application server" in the case of SSIS is, in fact, SQL Server, so the "cheaper server" argument doesn't apply.
I've never seen the argument that .NET sorting was inherently faster than SQL Server sorting, but I'd be extremely surprised if that was generally true (especially given the amount of meta-information about the underlying data that is available to the SQL Server query optimizer but is not available to a generic .NET Sort() method). I do know that the SSIS Sort transformation can put a big performance hit on the data flow, since all the data has to be cached by SSIS before the sorting can begin.
So, in the specific case of choosing between using a T-SQL ORDER BY clause to sort data or an SSIS Sort transformation, I'd always choose the ORDER BY clause to start with.
First, if you really want to know on a given set of data, then you should test it.
That said, there are several reasons why I think you should sort on the server side.
First, the server can take advantage of more hardware -- multiple threads, multiple disks, multiple processors -- for sorting. This can make a big difference on performance.
Second, the sorting may not be necessary. There may be cases where the query does not actually have to sort the results, because they are already sorted. For instance, the results may be returned based on an index that is sorted.
Third, memory usage issues and memory leaks tend to be more prevalent on the client side. (Okay, you don't say you are using java, so you are a bit safe from this.) The database server knows how to manage memory.
Fourth, I think it is a good idea to do the data manipulation on the server side. Coding gets quite complicated if you try to micro-optimize each operation, with some being on the server and some being on the client side. Unless something is related specifically to the presentation of the data, do it on the server.
All that said, if you are just sorting 20 items for presentation purposes on, say, a single page, then it doesn't make a difference. Do it on the client side if you are comfortable with that.