Unkillable Oracle session waiting on “SQL*Net mess

2019-06-21 21:02发布

问题:

On Oracle 11gR2, I've recently encountered a very interesting situation involving a blocked (but idle!) MERGE statement that hangs on a "SQL*Net message from client" event, causing subsequent, concurrently executed MERGE statements to block on the first statement via "cursor: pin S wait on X" events. In Oracle Enterprise Manager, the following can be observed:

This situation turns even more severe, as the above Session-ID 1204 cannot be killed with either:

alter system kill session 'sid,serial#';
alter system kill session 'sid,serial#' immediate;

Our DBA is sometimes able to kill the operating system process, but often, the whole database needs to be restarted. Luckily, thus far, only on a test system, never in production.

Note:

I'm aware this is probably a similar issue as reported in this rather vague question: Oracle updates/inserts stuck, DB CPU at 100%, concurrency high, SQL*Net wait message from client. I'll still report it again, as I have a clear reproduction path, which I'll report as an answer.

回答1:

This seems to be a bug in Oracle when CLOB data types are used as values that are passed to the MERGE statement's ON clause. Assume this database:

CREATE TABLE t (
  v INT, 
  s VARCHAR2(400 CHAR)
);

Reproduction using inlined values

Now, run the following statement in any Oracle client, including SQL*Plus, SQL Developer or from JDBC, which helps reproducing the issue very easily (I'm using Oracle 11g XE 11.2.0.2.0):

MERGE INTO t                      
USING (
  SELECT 
    1 v, 
    CAST('abc' AS CLOB) s 
  FROM DUAL
) s 
ON (t.s = s.s) -- Using a CLOB here causes the bug.
WHEN MATCHED THEN UPDATE SET
  t.v = s.v        
WHEN NOT MATCHED THEN INSERT (v, s) 
VALUES (s.v, s.s);

The example is silly, and the CLOB was bound here by "accident". Nonetheless, such a statement should not create a zombie session in Oracle, but it's there. I'm running the above statement three times in SQL*Plus and then running this...

SELECT 
  s.sid,
  s.serial#,
  s.sql_id,
  s.event,
  s.blocking_session,
  q.sql_text
FROM v$session s
JOIN v$sql q
ON s.sql_id = q.sql_id
WHERE s.username = 'TEST'
AND UPPER(TRIM(q.sql_text)) LIKE 'MERGE%';

... I get:

sid serial# sql_id          event                       blocking_session
9   3       82a2k4sqzy1jq   cursor: pin S wait on X     92
49  89      82a2k4sqzy1jq   cursor: pin S wait on X     92
92  13      82a2k4sqzy1jq   db file sequential read     

Notice how the reported event is different ("db file sequential read") from the original event ("SQL*Net message from client"), which was using bind variables

Reproduction using bind values

var v_s varchar2(50)
exec :v_s := 'abc'

MERGE INTO t                      
USING (
  SELECT 
    1 v, 
    CAST(:v_s AS CLOB) s 
  FROM DUAL
) s 
ON (t.s = s.s) -- Using a CLOB here causes the bug.
WHEN MATCHED THEN UPDATE SET
  t.v = s.v        
WHEN NOT MATCHED THEN INSERT (v, s) 
VALUES (s.v, s.s);

The above statement run in SQL*Plus also produces the bug:

sid serial# sql_id          event                           blocking_session
8   1       4w9zuxrumumgj   SQL*Net message from client     
90  7       4w9zuxrumumgj   cursor: pin S wait on X         8
94  21      4w9zuxrumumgj   cursor: pin S wait on X         8

No reproduction in PL/SQL

Interestingly, the bug is avoided in the following PL/SQL statement:

DECLARE
  v_s CLOB := 'abc';
BEGIN
  MERGE INTO t                      
  USING (
    SELECT 
      1 v, 
      CAST(v_s AS CLOB) s 
    FROM DUAL
  ) s 
  ON (t.s = s.s) -- Using a CLOB here causes the bug.
  WHEN MATCHED THEN UPDATE SET
    t.v = s.v        
  WHEN NOT MATCHED THEN INSERT (v, s) 
  VALUES (s.v, s.s);
END;
/

I'm getting:

          CAST(v_s AS CLOB) s
          *
ERROR at line 8:
ORA-06550: line 8, column 11:
PL/SQL: ORA-00932: inconsistent datatypes: expected - got CLOB
ORA-06550: line 4, column 7:
PL/SQL: SQL Statement ignored

It looks as though the PL/SQL engine saveguards clients from this SQL engine bug.