Finding cause of deadlock error from oracle trace

2019-01-22 04:14发布

问题:

I have been getting this "ora-00060 deadlock detected while waiting for resource" error often now in my application when multiple users are using the application. I have got the trace file from the oracle Admin, but need help in reading it. Below is bits of data from the trace file, which i hope would help in locating the cause.

*** 2013-06-25 09:37:35.324
DEADLOCK DETECTED ( ORA-00060 )

[Transaction Deadlock]

The following deadlock is not an ORACLE error. It is a deadlock due 
to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:

Deadlock graph:
                   ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TM-000151a2-00000000       210      72    SX   SSX      208      24    SX   SSX
TM-000151a2-00000000       208      24    SX   SSX      210      72    SX   SSX

session 72: DID 0001-00D2-000000C6  session 24: DID 0001-00D0-00000043 
session 24: DID 0001-00D0-00000043  session 72: DID 0001-00D2-000000C6 

Rows waited on:
 Session 72: no row
 Session 24: no row

----- Information for the OTHER waiting sessions -----
Session 24:
 sid: 24 ser: 45245 audsid: 31660323 user: 90/USER
  flags: (0x45) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
  flags2: (0x40009) -/-/INC
 pid: 208 O/S info: user: zgrid, term: UNKNOWN, ospid: 2439
   image: oracle@xyz.local
 client details:
   O/S info: user: , term: , ospid: 1234
   machine: xyz.local program: 
 current SQL:
  delete from EMPLOYEE where EMP_ID=:1

 ----- End of information for the OTHER waiting sessions -----

Information for THIS session:

 ----- Current SQL Statement for this session (sql_id=dyfg1wd8xa9qt) -----
 delete from EMPLOYEE where EMP_ID=:1
===================================================

I would appreciate if some one can tell me what the "Deadlock graph::" is saying. Also the rows waited on section says no rows.

I also read in some blogs that "sqltxt" section from the trace file can suggest the cause. Below is the query i see in that section.

 select /*+ all_rows */ count(1) from "USERS"."EMPLOYEE_SALARY" where EMPSAL_EMP_ID=:1

The employee_salary table has foreignkey constraint on EMPSAL_EMP_ID column.

The sql hint says "all_rows", so does it mean that this table gets table level lock when deleting records from employee table? i dont have an index on the foreign key column currently. Would adding an index on this column help?

Kindly post, in case any more information is need.

Thanks

回答1:

First of all, select statement never lock anything in Oracle, just uses last available consistent version of data. It's not a case for select ... for update which locks data like update since Oracle 9i, but there are no for update clause in the query from question.

Resource Name          process session holds waits  process session holds waits
TM-000151a2-00000000       210      72    SX   SSX      208      24    SX   SSX

Session #72 holds table-level lock (TM) with "Row Exclusive" type (SX) and want to acquire "Share Row Exclusive" (SSX) lock on same table. This session blocked by Session #24 which already holds table-level lock of a same type (SX) and waits while SSX lock would be available.

Resource Name          process session holds waits  process session holds waits
TM-000151a2-00000000       208      24    SX   SSX      210      72    SX   SSX

This (second row) demonstrates exactly same situation, but in opposite direction: Session #24 waits for SSX lock become available, but blocked by Session #72 which already holds SX lock on same table.

So, Sessions #24 and Session #72 blocks each other: deadlock happens.

Both lock types (SX and SSX) are table-level locks.
To understand the situation I recommend to read this article by Franck Pachot.

Below is citation from this article, which directly relevant to your situation(note that SSX and SRX abbreviations are equivalent):

Referential integrity also acquires TM locks. For example, the common issue with unindexed foreign keys leads to S locks on child table when you issue a delete, or update on the key, on the parent table. This is because without an index, Oracle has no single lower level resource to lock in order to prevent a concurrent insert that can violate the referential integrity.
When the foreign key columns are the leading columns in a regular index, then the first index entry with the parent value can be used as a single resource and locked with a row level TX lock.
And what if referential integrity has an on delete cascade? In addition to the S mode, there is the intention to update rows in the child table, as with Row X (RX) mode. This is where the share row exclusive (SRX) occurs: S+RX=SRX.

So, most probable variant is that Session #72 and Session #24 deletes some rows in EMPLOYEE table at same time, and there are on delete cascade constraint for EMPSAL_EMP_ID in conjunction with absence of index on EMPLOYEE_SALARY table in which EMPSAL_EMP_ID column listed first.