How to Select UNCOMMITTED rows only in SQL Server?


Question:

I am working on a DW project where I need to query a live CRM system. The standard isolation level negatively influences performance, so I am tempted to use NOLOCK / TRANSACTION ISOLATION LEVEL READ UNCOMMITTED. I want to know how many of the selected rows come from dirty reads.

Answer 1:

Maybe you can do this:

SELECT * FROM T WITH (SNAPSHOT)                -- rows committed as of the snapshot
EXCEPT
SELECT * FROM T WITH (READCOMMITTED, READPAST) -- minus rows that are not locked

But this is inherently racy: locks can appear and disappear between the two scans, so the result is only a point-in-time approximation.



Answer 2:

If the idea is to try to lock each row, skipping any that are in fact locked, you need to use a locking isolation level and READPAST with row-level locks, and account for the fact that RCSI (as well as SI) might be on.

To count the number of uncommitted rows, I would write:

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
    SELECT
        (
            -- Number of rows committed at the start of the transaction
            SELECT COUNT_BIG(*) 
            FROM dbo.TheTable -- read using snapshot isolation
        )
        -
        (
            -- Minus rows that are not locked now
            SELECT COUNT_BIG(*)
            FROM dbo.TheTable WITH (READCOMMITTEDLOCK, READPAST, ROWLOCK)
        );
ROLLBACK TRANSACTION;

Note that READPAST can only skip row-level locks, so the ROWLOCK hint is required. The READCOMMITTEDLOCK hint ensures that locking read committed is used even when RCSI is enabled. The SNAPSHOT table hint is valid only with memory-optimized tables, which is why the first count relies on the transaction's isolation level rather than a hint.
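
As a usage sketch of how this counts dirty rows (the setup statement and the Id/SomeColumn names are assumptions for illustration; snapshot isolation must first be allowed on the database):

-- One-time setup: the database must allow SNAPSHOT isolation
ALTER DATABASE CURRENT SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Session 1: dirty one row and keep the transaction open
BEGIN TRANSACTION;
UPDATE dbo.TheTable SET SomeColumn = 'dirty' WHERE Id = 1; -- hypothetical columns
-- no COMMIT yet

-- Session 2: the counting query above now returns 1; the snapshot scan
-- still counts the committed version of row Id = 1, while the READPAST
-- scan skips it because Session 1 holds an exclusive row lock.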

It is true that SQL Server can sometimes skip exclusive locks when reading at locking read committed, but this is a safe optimization that only applies when there are no uncommitted changes to the page.



Answer 3:

Why do you need to know that?

You use TRANSACTION ISOLATION LEVEL READ UNCOMMITTED just to indicate that a SELECT statement won't wait for any update/insert/delete transactions to finish on the table/pages/rows, and will grab even dirty records. You do it to increase performance. Trying to find out which records were dirty is like punching yourself in the face with a blender: it hurts and gives you nothing but pain. They were dirty at some point, and now they aren't. Or are they still dirty? Who knows...
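
For reference, the session-level form looks like this (the table-hint form, NOLOCK, follows below):

-- Applies to every subsequent statement in this session
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT *
FROM dbo.MyTable; -- takes no shared locks, so it never waits on writers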

Update:

Now, about data quality. Imagine you read a dirty record with a query like:

SELECT *
FROM dbo.MyTable
WITH (NOLOCK)

and, for example, got a record with id = 1 and name = 'someValue'. Then you want to update the name and set it to 'anotherValue', so you run the following query:

UPDATE dbo.MyTable
SET
    Name = 'anotherValue'
WHERE  id = 1

So if the record still exists you'll get the actual value there, and if it was deleted (even via a dirty read of a delete that is not committed yet), nothing terrible happens: the query just won't affect any rows. Is it a problem? Of course not, because in the time between your read and your update things could change a zillion times. Just check @@ROWCOUNT to make sure the query did what it had to, and warn the user about the result, as in the sketch below.
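
A minimal sketch of that check (the warning text is illustrative):

UPDATE dbo.MyTable
SET
    Name = 'anotherValue'
WHERE  id = 1;

-- @@ROWCOUNT is the number of rows the UPDATE actually affected
IF @@ROWCOUNT = 0
    PRINT 'Row with id = 1 is gone; nothing was updated.';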

Anyway, it depends on the situation and on the importance of the data. If the data MUST be current, don't use dirty reads.



Answer 4:

The standard isolation level negatively influences performance

So why don't you address that? You know dirty reads are inconsistent reads, so you shouldn't use them. The obvious answer is to use snapshot isolation. Read Implementing Snapshot or Read Committed Snapshot Isolation in SQL Server: A Guide.
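
Both flavors are one-time database settings. A minimal sketch, to be run against the CRM database (note that switching READ_COMMITTED_SNAPSHOT on cannot complete while other connections are active):

-- Option A: RCSI; existing READ COMMITTED queries start reading row versions,
-- so readers stop blocking on writers with no application changes
ALTER DATABASE CURRENT SET READ_COMMITTED_SNAPSHOT ON;

-- Option B: snapshot isolation; each session must opt in explicitly
ALTER DATABASE CURRENT SET ALLOW_SNAPSHOT_ISOLATION ON;

-- then, per session:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;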

But actually the problem goes deeper. Why do you encounter blocking at all? Why are reads blocked by writes? A DW workload should not be let loose on the operational transactional data; this is why we have ETL and OLAP products. Consider cubes, columnstores, PowerPivot, all the goodness that allows for incredibly fast DW and analysis. Don't burden the business operational database with your analytical end-to-end scans, or you'll have nothing but problems.
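
As one concrete example of those options, a columnstore index on an ETL'd reporting copy can serve the analytical scans instead of the live tables. A minimal sketch, with a hypothetical dbo.Orders fact table:

-- In the reporting database, on a table populated by ETL
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_Orders_ColumnStore
ON dbo.Orders (OrderDate, CustomerId, Amount);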