Question:
The problem:
I have a table, foo, that records data rows. Each time a row is updated, a new row is inserted along with a revision number. The table looks like:
id  rev  field
1   1    test1
2   1    fsdfs
3   1    jfds
1   2    test2
Note that in the table the last record is a newer version of the first row.
Does anyone know of an efficient way to query for the latest version of the rows, and for the records as of a specific version? For instance, a query for rev=2 would return the 2nd, 3rd and 4th rows (but not the replaced 1st row), while a query for rev=1 would return the rows with rev <= 1 and, for duplicated ids, choose the one with the higher revision number (records 1, 2, 3).
I'm not actually sure if this is even possible in SQL Server...
I would prefer not to produce the result iteratively.
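The requested semantics can be pinned down as a small pure-Python reference before looking at SQL. This is a sketch, not part of the original post; Row and rows_as_of are hypothetical names. For each id it keeps the row with the highest rev that does not exceed the requested revision.

```python
from typing import Dict, Iterable, List, NamedTuple


class Row(NamedTuple):
    id: int
    rev: int
    field: str


def rows_as_of(rows: Iterable[Row], rev: int) -> List[Row]:
    """For each id, keep the row with the highest rev that is <= `rev`."""
    best: Dict[int, Row] = {}
    for row in rows:
        if row.rev > rev:
            continue  # this revision does not exist yet at the requested point
        current = best.get(row.id)
        if current is None or row.rev > current.rev:
            best[row.id] = row
    return sorted(best.values())  # NamedTuples sort by id first


foo = [
    Row(1, 1, "test1"),
    Row(2, 1, "fsdfs"),
    Row(3, 1, "jfds"),
    Row(1, 2, "test2"),
]

print(rows_as_of(foo, 2))  # ids 1, 2, 3 at revs 2, 1, 1
print(rows_as_of(foo, 1))  # ids 1, 2, 3 at revs 1, 1, 1
```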
Answer 1:
To get only latest revisions:
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev) FROM foo t2 WHERE t2.id = t1.id)
To get a specific revision, in this case 1 (and, for an item that doesn't have that revision, its closest lower revision):
SELECT * from foo t1
WHERE t1.rev =
(SELECT max(rev)
FROM foo t2
WHERE t2.id = t1.id
AND t2.rev <= 1)
It might not be the most efficient way to do this, but right now I can't think of a better one.
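To try the correlated-subquery approach without a SQL Server instance, here is a sketch that runs both queries against an in-memory SQLite database through Python's standard sqlite3 module. Using SQLite as a stand-in for SQL Server is an assumption; the queries are the ones above, with the revision passed as a parameter and ORDER BY added for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT);
    INSERT INTO foo VALUES (1, 1, 'test1'), (2, 1, 'fsdfs'), (3, 1, 'jfds'), (1, 2, 'test2');
""")

# Latest revision of every id: keep a row only if its rev equals the max rev of its id.
LATEST = """
    SELECT * FROM foo t1
    WHERE t1.rev = (SELECT max(rev) FROM foo t2 WHERE t2.id = t1.id)
    ORDER BY t1.id
"""

# As of a given revision: the same test, restricted to revisions <= the parameter.
AS_OF = """
    SELECT * FROM foo t1
    WHERE t1.rev = (SELECT max(rev) FROM foo t2
                    WHERE t2.id = t1.id AND t2.rev <= ?)
    ORDER BY t1.id
"""


def latest_rows():
    return conn.execute(LATEST).fetchall()


def rows_as_of(rev):
    return conn.execute(AS_OF, (rev,)).fetchall()


print(latest_rows())
print(rows_as_of(1))
```

An id with no revision at or below the requested one gets NULL from max(), so the comparison is never true and that id is simply left out.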
Answer 2:
This is how I would do it. ROW_NUMBER() requires SQL Server 2005 or later.
Sample data (the multi-row INSERT ... VALUES syntax requires SQL Server 2008 or later):
DECLARE @foo TABLE (
id int,
rev int,
field nvarchar(10)
)
INSERT @foo VALUES
( 1, 1, 'test1' ),
( 2, 1, 'fdsfs' ),
( 3, 1, 'jfds' ),
( 1, 2, 'test2' )
The query:
DECLARE @desiredRev int
SET @desiredRev = 2
SELECT * FROM (
SELECT
id,
rev,
field,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) rn
FROM @foo WHERE rev <= @desiredRev
) numbered
WHERE rn = 1
The inner SELECT returns all relevant records and, within each id group (that's the PARTITION BY), computes the row number when ordered by descending rev.
The outer SELECT just selects the first member (that is, the one with the highest rev) from each id group.
Output when @desiredRev = 2:
id rev field rn
----------- ----------- ---------- --------------------
1 2 test2 1
2 1 fdsfs 1
3 1 jfds 1
Output when @desiredRev = 1:
id rev field rn
----------- ----------- ---------- --------------------
1 1 test1 1
2 1 fdsfs 1
3 1 jfds 1
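The same ROW_NUMBER() query can be exercised in SQLite (window functions need SQLite 3.25 or later) through Python's sqlite3 module in place of SQL Server. The :desired_rev named parameter replaces @desiredRev; that substitution and the rows_as_of helper name are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT);
    INSERT INTO foo VALUES (1, 1, 'test1'), (2, 1, 'fdsfs'), (3, 1, 'jfds'), (1, 2, 'test2');
""")

QUERY = """
    SELECT id, rev, field FROM (
        SELECT id, rev, field,
               -- number each id's visible revisions, newest first
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY rev DESC) AS rn
        FROM foo
        WHERE rev <= :desired_rev
    ) AS numbered
    WHERE rn = 1          -- newest visible revision per id
    ORDER BY id
"""


def rows_as_of(desired_rev):
    return conn.execute(QUERY, {"desired_rev": desired_rev}).fetchall()


print(rows_as_of(2))
print(rows_as_of(1))
```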
Answer 3:
Here's an alternative solution that incurs an update cost but is much more efficient for reading the latest data rows, as it avoids computing MAX(rev). It also works when you're doing bulk updates of subsets of the table. I needed this pattern to ensure I could efficiently switch to a new data set, updated via a long-running batch process, without any window of time in which partially updated data was visible.
Aging
- Replace the rev column with an age column.
- Create a view of the current latest data with the filter age = 0.
- To create a new version of your data:
  - INSERT: new rows with age = -1. (This was my slow, long-running batch process.)
  - UPDATE: run UPDATE table-name SET age = age + 1 for all rows in the subset. This switches the view to the new latest data (age = 0) and ages the older data in a single transaction.
  - DELETE: optionally purge old data by deleting rows in the subset having age > N.
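The aging steps can be sketched end to end with Python's sqlite3 module. This is an illustration under assumptions: SQLite stands in for SQL Server, the names foo, foo_current, publish and MAX_AGE are hypothetical, and every publish is a full snapshot, so the UPDATE ages the whole table rather than a subset.

```python
import sqlite3

MAX_AGE = 1  # hypothetical N: keep the current version (age 0) plus one older version

# isolation_level=None puts the module in autocommit mode, so BEGIN/COMMIT below are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE foo (id INTEGER NOT NULL, age INTEGER NOT NULL, field TEXT);
    CREATE INDEX ix_foo_age_id ON foo (age, id);
    CREATE VIEW foo_current AS SELECT id, field FROM foo WHERE age = 0;
""")


def publish(rows):
    """Stage a new data set at age -1, then age every row by one in a single transaction."""
    # Slow batch step: staged rows stay invisible because the view only shows age = 0.
    conn.executemany("INSERT INTO foo (id, age, field) VALUES (?, -1, ?)", rows)
    conn.execute("BEGIN")
    try:
        # Staged rows become age 0 (visible); older versions move one step back.
        conn.execute("UPDATE foo SET age = age + 1")
        # Optional purge of versions older than MAX_AGE.
        conn.execute("DELETE FROM foo WHERE age > ?", (MAX_AGE,))
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def current():
    return conn.execute("SELECT id, field FROM foo_current ORDER BY id").fetchall()


publish([(1, "test1"), (2, "fsdfs"), (3, "jfds")])
publish([(1, "test2"), (2, "fsdfs"), (3, "jfds")])
print(current())
```

Readers of foo_current never see the -1 rows, and the switch to the new version happens atomically at COMMIT.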
Indexing
- Create a composite index on age and then id, so the view is fast and the index can also be used to look up by id. Although this key is effectively unique, it's temporarily non-unique while you're aging the rows (during UPDATE SET age = age + 1), so you'll need to make it non-unique, and ideally the clustered index. If you need to find all versions of a given id ordered by age, you may need an additional non-unique index on id then age.
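SQLite has no clustered indexes (CREATE CLUSTERED INDEX is SQL Server syntax), so this sketch only checks, via SQLite's EXPLAIN QUERY PLAN, that plain secondary indexes with the suggested column orders are picked up. The index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER NOT NULL, age INTEGER NOT NULL, field TEXT);
    -- Age first, then id: serves the age = 0 view. Deliberately NOT unique,
    -- because (age, id) pairs can collide while SET age = age + 1 is running.
    CREATE INDEX ix_foo_age_id ON foo (age, id);
    -- Optional second index: every version of one id, ordered by age.
    CREATE INDEX ix_foo_id_age ON foo (id, age);
""")


def plan(sql, params=()):
    """Join the 'detail' column (always the last one) of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


current_plan = plan("SELECT id, field FROM foo WHERE age = 0")
history_plan = plan("SELECT age, field FROM foo WHERE id = ? ORDER BY age", (1,))
print(current_plan)
print(history_plan)
```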
Rollback
Finally, let's say you're having a bad day and the batch processing breaks. You can quickly revert to the previous data set version by running:
UPDATE table-name SET age = age - 1  -- Roll back a version
DELETE table-name WHERE age < 0      -- Clean up bad stuff
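A sketch of the rollback, again in SQLite through Python's sqlite3 module. It assumes the bad version was already published (it sits at age 0); wrapping both statements in one transaction is an addition of this sketch, not part of the original two-statement recipe. Table, view and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE foo (id INTEGER NOT NULL, age INTEGER NOT NULL, field TEXT);
    CREATE VIEW foo_current AS SELECT id, field FROM foo WHERE age = 0;
    -- Previous good version (age 1) and a freshly published bad version (age 0).
    INSERT INTO foo VALUES (1, 1, 'test1'), (2, 1, 'fsdfs'),
                           (1, 0, 'BROKEN'), (2, 0, 'BROKEN');
""")


def roll_back_one_version():
    """Shift every row back one age and drop whatever falls below zero."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE foo SET age = age - 1")    # previous version becomes age 0
        conn.execute("DELETE FROM foo WHERE age < 0")  # discard the bad version
    except Exception:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


roll_back_one_version()
print(conn.execute("SELECT id, field FROM foo_current ORDER BY id").fetchall())
```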
Note: I recommend naming the aging column RowAge instead of age to indicate that this pattern is being used, since it makes clear that it's a database-related value, and it complements SQL Server's RowVersion naming convention. It also won't conflict with a column or view that needs to return a person's age.
This pattern also works on databases other than SQL Server.
Answer 4:
If you want the latest revision of each row, you can use:
SELECT C.rev, C.field FROM (
SELECT MAX(A.rev) AS rev, A.id
FROM yourtable A
GROUP BY A.id)
AS B
INNER JOIN yourtable C
ON B.id = C.id AND B.rev = C.rev
In the case of your example, that would return:
rev  field
1    fsdfs
1    jfds
2    test2
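Here is the GROUP BY-and-join query run against the question's data in SQLite via Python's sqlite3 module. The extra WHERE A.rev <= ? inside the derived table is an adaptation, not part of the answer, that turns "latest" into "as of a given revision"; C.id is also selected so the rows are identifiable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT);
    INSERT INTO foo VALUES (1, 1, 'test1'), (2, 1, 'fsdfs'), (3, 1, 'jfds'), (1, 2, 'test2');
""")

QUERY = """
    SELECT C.id, C.rev, C.field
    FROM (SELECT A.id, MAX(A.rev) AS rev   -- highest visible revision per id
          FROM foo A
          WHERE A.rev <= ?
          GROUP BY A.id) AS B
    INNER JOIN foo C                       -- join back to fetch the full row
        ON B.id = C.id AND B.rev = C.rev
    ORDER BY C.id
"""


def rows_as_of(rev):
    return conn.execute(QUERY, (rev,)).fetchall()


print(rows_as_of(2))
print(rows_as_of(1))
```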
Answer 5:
SELECT
MaxRevs.id,
revision.field
FROM
(SELECT
id,
MAX(rev) AS MaxRev
FROM revision
GROUP BY id
) MaxRevs
INNER JOIN revision
ON MaxRevs.id = revision.id AND MaxRevs.MaxRev = revision.rev
Answer 6:
SELECT foo.* from foo
left join foo as later
on foo.id=later.id and later.rev>foo.rev
where later.id is null;
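The self-join above keeps a row only when no later revision of the same id exists. Below is a sketch in SQLite via Python's sqlite3 module, with a hypothetical :rev parameter added to both the join condition and the WHERE clause so the same query also answers "as of revision N"; that extension is not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (id INTEGER, rev INTEGER, field TEXT);
    INSERT INTO foo VALUES (1, 1, 'test1'), (2, 1, 'fsdfs'), (3, 1, 'jfds'), (1, 2, 'test2');
""")

QUERY = """
    SELECT foo.*
    FROM foo
    LEFT JOIN foo AS later
           ON foo.id = later.id
          AND later.rev > foo.rev       -- a newer revision of the same id...
          AND later.rev <= :rev         -- ...that is visible at the requested revision
    WHERE foo.rev <= :rev
      AND later.id IS NULL              -- keep rows with no such newer revision
    ORDER BY foo.id
"""


def rows_as_of(rev):
    return conn.execute(QUERY, {"rev": rev}).fetchall()


print(rows_as_of(2))
print(rows_as_of(1))
```

Passing a revision at least as high as the newest one in the table gives the plain "latest rows" result of the original query.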
Answer 7:
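How about this?
select id, max(rev), field from foo group by id
For querying a specific revision, e.g. revision 1:
select id, max(rev), field from foo where rev <= 1 group by id
Note that SQL Server rejects both queries, because field is neither aggregated nor listed in the GROUP BY clause.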