I want to select a handful of random rows from the results of a complex query on a very large table (many millions of rows).
I am using SQL Server 2008, and the proper way to do this efficiently seems to be the TABLESAMPLE clause.
Note 1: I am not interested in the popular "order by NEWID()" solution - it is inefficient for large tables.
Note 2: Since my query is complex, I do not want to have to first calculate the COUNT over it, if possible.
Note 3: Since the resultset is huge, I do not want to have to traverse it myself, such as is suggested here.
The kicker is that I am using LINQ. Specifically, LINQ-To-Entities.
Is there a LINQ-friendly way to use TABLESAMPLE?
Even if there is no direct support, is there some way I can write most of my query in LINQ and then do a small amount of manual SQL to perform the TABLESAMPLE?
Not a direct answer to your question but you can use this technique to select a random percentage sample of individual rows. The following query uses the NEWID function to return approximately one percent of the rows of the Sales.SalesOrderDetail table:
SELECT * FROM Sales.SalesOrderDetail
WHERE 0.01 >= CAST(CHECKSUM(NEWID(), SalesOrderID) & 0x7fffffff AS float) / CAST (0x7fffffff AS int)
Possibly of interest: T-SQL: Generating Random Numbers, Random Sampling and Random ‘Goodness’
Something like this should work (syntax may not be exactly right but you should get the idea):
var rowCount = context.MyTable.Count();
int randomInt = new Random().Next(rowCount);
var query = context.MyTable.Skip(randomInt).FirstOrDefault();
With EF, you could create a defining query in your model using the tablesample construct. See http://msdn.microsoft.com/en-us/library/cc982038.aspx. Alternatively, you could create a randomized view in your database and then include that view in your model.
It seems that what I want to accomplish is not even possible in the first place.
TABLESAMPLE cannot be used on derived tables, so it is not even feasible to have a complex query generating a large result set and then get a random sampling with TABLESAMPLE.
TABLESAMPLE is only something that can be used on the base tables that go into a query, before joins and soforth. (see documentation)
This MSDN link describes a way to get a random percentage of results efficiently, so the best way to do approximately what I want may be to use that in a view, and build my LINQ off of that view.
Thank you all for the input.