We will discuss the following to make clear about the issue and troubleshoot mechanism:
1) What is Transaction Recovery in Oracle?
2) Possible issues
3) Related parameter and modes of Transaction Recovery
4) Detection of Transaction Recovery
5) Need of Oracle support
6) Disable Transaction Recovery
7) Precautions:
8) Oracle 11.2.0.4 BUG and workaround
Details:
1) What is Transaction Recovery in Oracle?
When rolling back, all uncommitted transactions of a failed instance, Transaction Recovery is started by SMON. Some "in-progress" transactions that may not committed and Oracle needs to undo the same.
Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the crash or introduced by redo application during cache recovery. This normally happens during the Roll backward phase when the DB is restarted. Transaction Recovery can be performed by either the Server process which initiated the Transaction or the SMON process (in case the Server process is dead).
SMON process takes over the recovery when:
- Server process is dead / crashed.
- Instance itself is crashed
After the roll forward, any changes that were not committed must be undone. Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the failure or introduced by redo application during cache recovery. This process is called rolling back or transaction recovery.
2) Possible issues
a) High CPU Utilization by SMON process
b) Database may hang during large transaction recovery.
c) If Database is shutdown abort, then the database may hang during consequent startup.
d) Database repeatedly crashes while open.
e) Cause huge volume archive-log generation
3) Related parameter and modes of Transaction Recovery
Parameter : fast_start_parallel_rollback
Default is:
SQL> show parameter fast_start_parallel_rollback;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string LOW
SQL>
Values and meaning:
FALSE - Parallel rollback is disabled
LOW - Limits the maximum degree of parallelism to 2 * CPU_COUNT
HIGH -Limits the maximum degree of parallelism to 4 * CPU_COUNT
There are two modes of Transaction Recovery. These are :
a) Parallel Transaction Recovery
b) Serial Transaction Recovery
a) Parallel Transaction Recovery:
Recovery occurs in parallel mode. Several parallel slave processes will be spawned and will be involved in recovery. This is also termed as Fast Start Parallel Rollback. The background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes. To enable Parallel recovery mode, set the parameter FAST_START_PARALLEL_ROLLBACK to LOW / HIGH.
b) Serial Transaction Recovery
This mode recovers the transaction sequentially. Many of the times, serial transaction recovery will be faster. Setting the FAST_START_PARALLEL_ROLLBACK parameter to false will enable the serial transaction recovery.
4) Detection of Transaction Recovery
a) Identify Dead Transactions and their Sizes
b) Identify undo segments containing dead transactions
c) Identify the Time for transaction recovery to complete
d) Query to identify the number of parallel Recovery Slaves
e) Identify Objects involved in Transaction Recovery
a) Identify Dead Transactions and their Sizes
query:
select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%'
order by ktuxesiz asc;
e.g.,
SQL> select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%'
order by ktuxesiz asc;
USN SLOT SEQ STATE UNDO
---------- ---------- ---------- ---------------- ----------
13 24 84986 ACTIVE 1
14 6 871 ACTIVE 5
SQL>
Interpreting:
ktuxeusn – Undo Segment Number
ktuxeslt – Slot number
ktuxesqn – Sequence
ktuxesta – State
ktuxesiz – Undo Blocks Remaining
ktuxecfl – Flag
b) Identify undo segments containing dead transactions
query:
select useg.segment_name,
useg.segment_id,
useg.tablespace_name,
useg.status
from dba_rollback_segs useg
where useg.segment_id in
(select unique ktuxeusn
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%');
e.g.,
SQL> select useg.segment_name, useg.segment_id, useg.tablespace_name, useg.status
2 from dba_rollback_segs useg
3 where useg.segment_id in (select unique ktuxeusn
4 from x$ktuxe
5 where ktuxesta <> 'INACTIVE'
6 and ktuxecfl like '%DEAD%');
SEGMENT_NAME SEGMENT_ID TABLESPACE_NAME STATUS
------------------------------ ---------- ------------------------------ ----------------
_SYSSMU13_1849806892$ 13 UNDOTBS2 ONLINE
_SYSSMU14_3452166187$ 14 UNDOTBS2 ONLINE
2 rows selected
SQL>
c) Identify the Time for transaction recovery to complete
select usn,
state,
undoblockstotal "Total",
undoblocksdone "Done",
undoblockstotal - undoblocksdone "ToDo",
decode(cputime,
0,
'unknown',
sysdate + (((undoblockstotal - undoblocksdone) /
(undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;
d) Query to identify the number of parallel Recovery Slaves:
select * from v$fast_start_servers;
STATE UNDOBLOCKSDONE PID XID
----------- ---------------- ---------- ----------------
RECOVERING 12 10 001F005C00001BD6
RECOVERING 0 19 001F005C00001BD6
Column STATE shows the state of the server being IDLE or RECOVERING. If only 1 process is in state RECOVERING while the other processes are in state IDLE, then you should disable Parallel Transaction Recovery. Normally these queries will show the progress of the transaction recovery.
If there is no progress in the number of blocks, then we can consider disabling the parallel recovery and let the recovery continue in serial transaction.
e) Identify Objects involved in Transaction Recovery
select decode(px.qcinst_id,NULL,username,' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "Slave Set",
to_char(s.sid) "SID",
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
px.req_degree "Requested DOP",
px.degree "Actual DOP"
from gv$px_session px, gv$session s
where px.sid=s.sid (+)
and px.serial#=s.serial#
order by 5 , 1 desc ;
Username QC/Slave Slave Set SID QC SID Requested DOP Actual DOP
------------ ------------------ -------- --------- ------------- -----------
SYS QC 736 736
SQL> select sid,serial# from gv$session where sid=1445;
SID SERIAL#
--------- ---------
1445 13981
1445 28003
SQL> select distinct current_obj#
2 from v$active_session_history
3 where SESSION_ID=1445
4 and SESSION_SERIAL# in(13981,27943);
CURRENT_OBJ#
------------
88667
89381
87133
SQL>
SQL> select owner,Object_name, object_type
2 from dba_objects
3 where object_id in(17197,17310,17423);
OWNER OBJECT_NAME OBJECT_TYPE
----------------------------------------- ---------- -------
MMGT MATERIAL_REQUEST TABLE
MMGT PURCHAGE_ORDER TABLE
MMGT PUCHAGE_RECEIPTS TABLE
SQL>
So using These above method you can find out objects and other details related to what is in Parallel Transaction Recovery.
5) Need of Oracle support:
Oracle support required to provide undo dumps to analyze more and next level action plan:
a) Identify the UNDO name using the undo segment name and transaction details.
select sysdate, b.name useg, a.ktuxeusn xidusn, a.ktuxeslt xidslot, a.ktuxesqn xidsqn
from x$ktuxe a, undo$ b
where a.ktuxesta = 'ACTIVE'
and a.ktuxecfl like '%DEAD%'
and a.ktuxeusn = b.us#;
b) Dump the Undo Block
Alter system dump undo block ‘<undo seg name>’ xid <xidusn> <xidslot> <xidsqn>;
e.g.,
SQL> oradebug setmypid
Statement processed.
SQL> alter system dump undo block '_SYSSMU13_1849806892$' xid 13 24 84986 ;
System altered.
SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/prod/PROD2/trace/PROD2_ora_13959258.trc
SQL>
you can upload related trace file to oracle support for next level action plan.
6) Disable Transaction Recovery
- Disabling Transaction recovery could temporarily open up the database.
- SMON will stop performing all the transaction recovery.
- There is no way to stop only one Transaction recovery / few transaction recoveries.
- When the transaction recovery is disabled, any data modified by this dead transaction but required by other transactions will be recovered on "as needed" basis.
- We always suggest to disable the transaction recovery only with the help of Oracle Support.
Oracle does not recommend to Disable the Transaction recovery and open the Database for Business Use.
7) Precautions:
Check the long running Transactions before shutting down the Database
DBA and Application team to be aware of any long running Jobs.
Plan for the downtime and inform application team accordingly so that they do not schedule business critical / long running jobs.
Do not Panic. Transaction recovery is part of Database normal Operations
8) Oracle Bug and Workaround:
In the above observation, frequently three tables are commng for recovery. As per my experience I found this case in 2 production environments where no reason of this kind of transaction recovery. Recently Oracle agree with me it is a bug and they merged this bug with below bug id for Oracle 11.2.0.4+PSU7. Very soon Oracle will release a patch.
Bug 23217190
Bug details:
Bug 23217190 - TRANSACTION RECOVERY IS STUCK ON A 2 NODE RAC
=========================================================
"If you see the symptoms above, set event 10015 level 10 for SMON. If the
trace produced shows entries like this:
kturrt: skipped undo for xid: 0x02a4.021.00001850 @ uba uba:
0x00c00626.03b4.14
it is possibly this bug. Dump the undo for the transaction. For example
alter system dump undo block _SYSSMU675_3821756173$ xid 676 33 6224;
If the "Begin trans" record contains a "Global transid", then it is likely
this bug. The proxy_id names the sibling transaction which has been lost. "
Further review of Dev. team concluded Bug 23217190 is similar of Bug 13841869 and marked as Duplicate of ' Bug 13841869 '
++ Bug 13841869 - WHEN LOCAL TRANSACTION CONVERTS TO GLOBAL, FLAG KTUEXF_GLOBALTX IS NOT SET
Work-around:
==========
Drop and re-create the verified objects with planned downtime to fix this issue.
Note: All activities can be done at your own risk. It is highly recommended go with Oracle support to fix the issue.
Reference:
Oracle Documents