I am trying to get 5 random number of rows from a large table (over 1 million rows) with a fast method.
So far what I have tested with these SQL queries:
Method 1
Select top 5 customer_id, customer_name
from Customer TABLESAMPLE(1000 rows)
order by newid()
This method estimated I/O cost is 0.0127546
so this is very fast (index scan nonclustered)
Method 2
select top 5 customer_id, customer_name
from Customer
order by newid()
This method's sort estimated I/O cost is 117.21189
and index scan nonclustered estimated I/O cost is 2.8735
, so this is affecting performance
Method 3
select top 5 customer_id, customer_name
from Customer
order by rand(checksum(*))
This method's sort estimated I/O cost is 117.212
and index scan nonclustered estimated I/O cost is 213.149
, this query is slower than all because estimated subtree cost is 213.228
so it's very slow.
UPDATE:
Method 4
select top 5 customer_id, customer_name, product_id
from Customer
Join Product on product_id = product_id
where (customer_active = 'TRUE')
order by checksum(newid())
This approach is better and very fast. All the benchmark testing is fine.
QUESTION
How can I convert Method 4 to LINQ-to-SQL? Thanks
If you want to convert Method 2 into Linq To Entities just use the solution answered by
jitender
which look like this:But for Method 1 which is very fast following your benchmarking, you need to do the following C# code because Linq To Entities doesn't have a LINQ equivalent for this SQL statement
TABLESAMPLE(1000 rows)
.You can move the SQL statements into a SQL View or Stored Procedure which will receive the number of customers to take.
UPDATE
For Method 4 which seems to be very fast (always by following your benchmark), you can do the following Linq To Entities:
Entity Framework can translate into SQL all functions that are defined into SqlFunctions class. In those functions we have
Checksum
function which will do what you want.If you want to join with other tables you can do it without difficulty with Linq To Entites so I just simplified my version by querying only the
Customers
DbSets
.As stated Here's the best way: