Select N random records in SQL Server without repe

2019-07-26 06:34发布

问题:

How do I select N random records from a table at a time without repetition of records previously returned by the same operation?

An obvious solution is:

SELECT TOP 5 * FROM
MyTable
WHERE Id NOT IN (SELECT Id FROM PreviouslyReturned)
ORDER BY newid()

But wouldn't that be really inefficient as MyTable starts to grow?

I have a long list of records and I require five records at a time for a turn-based game without repeating any of the records already pulled for the given game. Since I know approximately how many turns will take place, I could select a random sample before the game starts that is significantly large, but I would rather it be "dynamic". I found this question, which uses a random seed with MySQL.

Eventually there will be so many records that repetition won't be an issue (records >> N), but until then, I need records to be unique. On a sidenote, I use Fluent NHibernate for my persistence layer; perhaps NHibernate has some feature which allows this.

回答1:

without repeating any of the records

That's not called selecting random records among programmers. Values selected randomly repeat more often than you think, and in fact counting the number and length of repeats is one way statisticians detect cheating.

What you're looking for is called a shuffle. Shuffling randomizes the order of a finite set of things, like cards or keys. (Randomizing the order of rows doesn't mean the same thing as selecting random rows.)

In your case, plan to store the set of keys already used for each user. Select a random set of rows that aren't already in that set. There are several ways to store each random set of rows (keys); make sure you can tell which one is the last or current set.



回答2:

You could just store all primary keys of your table in a second table, select randomly from this one (retrieving the associated rows from the original table) and delete them after selection (from the auxiliary table, of course).

I would expect this way to be more efficient than storing the keys already used and constructing a WHERE NOR IN (resp. EXCEPT) clause (deleting should be approximately as performant as inserting, and selecting should work significantely faster without an additional clause). But of course this would have to be proved (by profiling ...)

Selecting five random ids should work fine using the approach in the question you've cited.



回答3:

Try

SELECT TOP 5 *
FROM YOUR_TABLE
ORDER BY CHECKSUM(NEWID())

There's a related question that goes into detail on the semantics of random shuffle selection here : Random select is not always returning a single row

SQL Server evaluates RAND once per query, which means that the mySQL trick will not work anyway.

Edit: this is also adequate

SELECT TOP 5 *
FROM YOUR_TABLE
ORDER BY NEWID()

I read your updated question and have another suggestion:
Create an indexed view on

SELECT mt.*
FROM MyTable mt
LEFT JOIN PreviouslyReturned pr ON mt.Id = pr.Id
WHERE pr.Id Is NULL

or something similar

Then

SELECT TOP 5 *
FROM YOUR_INDEXED_VIEW
ORDER BY NEWID()