How do you find the smallest unused number in a SQL Server column?
I am about to import a large number of manually recorded records from Excel into a SQL Server table. They all have a numeric ID (called the document number), but the IDs weren't assigned sequentially, for reasons that no longer apply. From now on, when my web site creates a new record, it needs to assign it the smallest possible document number (greater than zero) that has not already been taken.
Is there a way to do this through plain SQL, or is this a problem for T-SQL/code?
Thanks!
EDIT
Special thanks to WW for raising the issue of concurrency. Given that this is a web app, it is multi-threaded by definition and anyone faced with this same problem should consider either a code or DB level lock to prevent a conflict.
LINQ
FYI - this can be accomplished via LINQ with the following code:
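using System.Linq;

var nums = new[] { 1, 2, 3, 4, 6, 7, 9, 10 };
// Prepend 0 so that a missing 1 (or an empty list) is also detected.
var candidates = new[] { 0 }.Concat(nums);
int nextNewNum = (
    from n in candidates
    where !nums.Contains(n + 1)   // keep values whose successor is unused
    orderby n
    select n + 1
).First();
// nextNewNum == 5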
If you sort them by numeric ID, the number you are looking for will be the first one for which the ROW_NUMBER() function doesn't equal the ID.
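A minimal sketch of that idea, assuming a hypothetical table dbo.Documents whose DocumentNumber column holds unique, positive integers (both names are placeholders, not taken from the question):

```sql
-- Hypothetical schema: dbo.Documents(DocumentNumber INT NOT NULL, unique, values > 0).
SELECT COALESCE(
           -- Smallest row number that differs from its ID: that row number is the first free ID.
           MIN(CASE WHEN DocumentNumber <> rn THEN rn END),
           -- No mismatch at all: the IDs are 1..n, so the next free ID is n + 1.
           COUNT(*) + 1
       ) AS NextDocumentNumber
FROM (
    SELECT DocumentNumber,
           ROW_NUMBER() OVER (ORDER BY DocumentNumber) AS rn
    FROM dbo.Documents
) AS numbered;
```

When every ID matches its row number there is no gap, so the query falls back to COUNT(*) + 1, which is also 1 for an empty table.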
If there are gaps in the sequence, you can find the first gap with something like this:
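A minimal sketch of such a query, assuming a hypothetical table dbo.Documents with an integer column DocumentNumber (placeholder names, not from the question):

```sql
-- Hypothetical schema: dbo.Documents(DocumentNumber INT NOT NULL).
SELECT MIN(candidates.DocumentNumber) + 1 AS NextDocumentNumber
FROM (
    SELECT 0 AS DocumentNumber               -- placeholder row so that a missing 1 is found
    UNION ALL
    SELECT DocumentNumber FROM dbo.Documents
) AS candidates
WHERE NOT EXISTS (
    -- Keep only IDs whose successor is not taken.
    SELECT 1
    FROM dbo.Documents AS d
    WHERE d.DocumentNumber = candidates.DocumentNumber + 1
);
```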
In other words, find the least ID whose successor does not exist, and return that successor. If there are no gaps, it returns one greater than the greatest extant ID. A placeholder ID of 0 is inserted to ensure that IDs starting with 1 are considered.
Note that this will take at least O(n log n) time.
Microsoft SQL Server permits the use of a from clause in an insert statement, so you may not need to resort to procedural code.
This does 1 table scan, rather than the 2 scans, a hash match and a join of the top answer.
Let's assume your IDs should always start with 1:
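One way to write this is an outer self-join. The following is a sketch only, assuming a hypothetical table dbo.Documents with an integer column DocumentNumber (names chosen for illustration):

```sql
-- Hypothetical schema: dbo.Documents(DocumentNumber INT NOT NULL).
SELECT MIN(a.DocumentNumber) + 1 AS NextDocumentNumber
FROM (
    SELECT DocumentNumber FROM dbo.Documents
    UNION
    SELECT 0                                  -- makes 1 a candidate, even for an empty table
) AS a
LEFT JOIN dbo.Documents AS b
    ON b.DocumentNumber = a.DocumentNumber + 1
WHERE b.DocumentNumber IS NULL;               -- successor of a.DocumentNumber is free
```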
This handles all cases I can think of - including no existing records at all.
The only thing I don't like about this solution is that additional conditions have to be included twice, like this:
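For instance, if document numbers were assigned per company, the filter would appear both in the derived table and in the join condition. The CompanyId column and the @CompanyId variable below are hypothetical, as are the table and column names:

```sql
-- Hypothetical schema: dbo.Documents(DocumentNumber INT NOT NULL, CompanyId INT NOT NULL).
DECLARE @CompanyId INT = 42;                  -- illustrative filter value

SELECT MIN(a.DocumentNumber) + 1 AS NextDocumentNumber
FROM (
    SELECT DocumentNumber FROM dbo.Documents
    WHERE CompanyId = @CompanyId              -- condition, first occurrence
    UNION
    SELECT 0
) AS a
LEFT JOIN dbo.Documents AS b
    ON  b.DocumentNumber = a.DocumentNumber + 1
    AND b.CompanyId = @CompanyId              -- same condition, second occurrence
WHERE b.DocumentNumber IS NULL;
```

The second copy of the condition belongs in the ON clause; moving it to WHERE would discard the unmatched rows that the query is looking for.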
Please also notice the comments about locking and concurrency - the requirement to fill gaps is in most cases bad design and can cause problems. However, I had a good reason to do it: the IDs are to be printed and typed by humans and we don't want to have IDs with many digits after some time, while all the low ones are free...
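As one coarse way to address the concurrency concern, the lookup and the insert can be combined into a single INSERT ... SELECT that takes an exclusive table lock. This is a sketch, not a tuned solution, and it assumes a hypothetical table dbo.Documents whose only required column is DocumentNumber; TABLOCKX blocks other access to the table until the transaction ends:

```sql
-- Hypothetical schema: dbo.Documents(DocumentNumber INT NOT NULL).
BEGIN TRANSACTION;

INSERT INTO dbo.Documents (DocumentNumber)
SELECT MIN(a.DocumentNumber) + 1
FROM (
    -- Exclusive table lock: concurrent sessions wait here until COMMIT.
    SELECT DocumentNumber FROM dbo.Documents WITH (TABLOCKX)
    UNION
    SELECT 0
) AS a
LEFT JOIN dbo.Documents AS b
    ON b.DocumentNumber = a.DocumentNumber + 1
WHERE b.DocumentNumber IS NULL;

COMMIT TRANSACTION;
```

A unique constraint on DocumentNumber is still a sensible safety net, so that any duplicate that slips through fails loudly instead of being stored.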
This is an alternative to the answers using correlated subqueries given by @Jeffrey Hantlin and @Darrel Miller.
However, the policy you're describing is really not a good idea. ID values should be unique, but should not be required to be consecutive.
What happens if you email someone with a link to document #42, and then subsequently delete the document? Later, you re-use the id #42 for a new document. Now the recipient of the email will follow the link to the wrong document!
Is there a reason that it has to be the smallest possible number? Why do you need to fill the holes?
Edit to add the answer, since it's a business rule.
(I don't have a db handy, so this may not be 100% accurate, but you should be able to get it from there)