We had a scenario in our production environment where users were unable to transact on a particular table. As a first step, we confirmed that this HANG is specific only to the table and not for the entire Database. Subsequently we started with our below approach
a. Checked and confirmed that this table had the recent statistics which was generated on the day begining.
b. Opened a new session to run the query
"select count(*) from table_name
We expected this to do a FTS (Full Table Scan).
c. Wait Events
We checked for any wait events in the database for that session as well as for the whole database, using the below command.
SELECT EVENT,COUNT(*) FROM V$SESSION_WAIT WHERE WAIT_TIME=0 GROUP BY EVENT;
We couldn't spot out any particular event affecting this hang as all where regular wait events. This particular session was waiting for DB FILE SEQUENTIAL READ wait event which we felt was an obvious though we expected to wait for DB FILE SCATTERED READ. as it should do a FTS.
d. Optimizer issue : DELETE STATISTICS
Cancelled the previous session, deleted the table statistics using DBMS_STATS.DELETE_TABLE_STATS and then ran the same query again. No change the same behaviour and same wait event.
e. 10046 event trace
Hence we decided to cancel the above session inorder to open a new one and to start doing 10046 trace to find out where it goes.
ALTER SESSION SET EVENTS '10046 TRACE NAME CONTEXT FOREVER, LEVEL 12';
followed by the select count(*).
Trace file started displaying the wait event "DB FILE SEQUENTIAL READ" on the undo tablespace file. Here we got to know that server process is scanning undo blocks which would probably mean that it is doing a CONSISTENT READ. But neither a transactions is happening on that table nor a rollback at the background. After spotting the last few data blocks, which were before the server process decides to scan undo blocks, dumped those and found nothing suspecious.
ALTER SYSTEM DUMP DATAFILE file_no
f. SYSTEMSTATE dump
Finally we decided to take a systemstate dump on the HANGING process.
ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME SYSTEMSTATE LEVEL 266';
From the trace file we got a hint. The trace file got a blockdump which belongs to the table on which the problem was reported. The spurious fact which we identified from the blockdump was an active transaction in the ITL list whilst there is no active TXN on that table. We could easily guess that PMON could have failed to clean somewhere the failed txns. Hence we the solution was to clear the ITL.
g. Solution
We just ran a simple update followed by a rollback which resolved the problem.
UPDATE table_name set date_field
ROLLBACK;
Regards Badri
No comments:
Post a Comment