weird phenomena during deadlocks involving IMAGE o

2019-08-22 05:30发布

问题:

This is something very disturbing I stumbled upon while stress-testing an application using Sybase ASE 15.7.

We have the following table:

CREATE TABLE foo
(
    i INT NOT NULL,
    blob  IMAGE    
);
ALTER TABLE foo ADD PRIMARY KEY (i);

The table has, even before starting the test, a single row with some data in the IMAGE column. No rows are either deleted or inserted during the test. So the table always contains a single row. Column blob is only updated (in transaction T1 below) to some value (not NULL).

Then, we have the following two transactions:

T1: UPDATE foo SET blob=<some not null value> WHERE i=1
T2: SELECT * FROM foo WHERE i=1

For some reason, the above transactions may deadlock under load (approx. 10 threads doing T1 20 times in a loop and another 10 threads doing T2 20 times in loop).

This is already weird enough, but there's more to come. T1 is always chosen as the deadlock victim. So, the application logic, on the event of a deadlock (error code 1205) simply retries T1. This should work and should normally be the end of the story. However …

… it happens that sometimes T2 will retrieve a row in which the value of the blob column is NULL! This is even though the table already starts with a row and the updates simply reset the previous (non-NULL) value to some other (non-NULL) value. This is 100% reproducible in every test run.

This is observed with the READ COMMITTED serialization level.

I verified that the above behavior also occurs with the TEXT column type but not with VARCHAR.

I've also verified that obtaining an exlusive lock on table foo in transaction T1 makes the issue go away.

So I'd like to understand how can something that so fundamentally breaks transaction isolation be even possible? In fact, I think this is worse than transaction isolation as T1 never sets the value of the blob column to NULL.

The test code is written in Java using the jconn4.jar driver (class com.sybase.jdbc4.jdbc.SybDriver) so I don't rule out that this may be a JDBC driver bug.

update

This is reproducible simply using isql and spawning several shells in parallel that continuously execute T1 in a loop. So I am removing the Java and JDBC tags as this is definitely server-related.

回答1:

Your example create table code by default would create an allpages locked table unless your DBA has changed the system-wide 'lock scheme' parameter via sp_configure to another value(you can check this yourself as anyone via sp_configure 'lock scheme'.

Unless you have a very large number of rows they are all going to be sat on a single data page because an int is only 4 bytes long and the blob data is stored at the end of the table (unless you use the in-row LOB functionality in ASE15.7 and up). This is why you are getting deadlocks. You have by definition created a single hotspot where all the data is being accessed at the page level. This is even more likely where larger page sizes > 2k are used, since by their nature they will have even more rows per page and with allpages locking, even more likelihood of contention.

Change your locking scheme to datarows (unless you are planning to have very high rowcounts) as has been said above and your problem should go away. I will add that your blob column looks to allow nulls from your code, so you should also consider setting the 'dealloc_first_txtpg' attribute for your table to avoid wasted space if you have nulls in your image column.



回答2:

We've seen all kinds of weird stuff with isolation level 1. I'm under the impression that when T2 is in progress, T1 can change data and T2 might return intermediate result of T1.

Try isolation level 2 and see if it helps (does for us).