oracleops-support: 12/13/17

TRANSACTION RECOVERY is stuck on a 2 Node RAC | Bug 23217190 | Workaround

We will discuss the following to make clear about the issue and troubleshoot mechanism:

1) What is Transaction Recovery in Oracle?
2) Possible issues
3) Related parameter and modes of Transaction Recovery
4) Detection of Transaction Recovery
5) Need of Oracle support
6) Disable Transaction Recovery
7) Precautions:
8) Oracle 11.2.0.4 BUG and workaround

Details:

1) What is Transaction Recovery in Oracle?

When rolling back, all uncommitted transactions of a failed instance, Transaction Recovery is started by SMON. Some "in-progress" transactions that may not committed and Oracle needs to undo the same.
Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the crash or introduced by redo application during cache recovery. This normally happens during the Roll backward phase when the DB is restarted. Transaction Recovery can be performed by either the Server process which initiated the Transaction or the SMON process (in case the Server process is dead).

SMON process takes over the recovery when:

Server process is dead / crashed.
Instance itself is crashed

Undo blocks (whether in rollback segments or automatic undo tablespaces) record database actions that may need to be undone during certain database operations. In database recovery, the undo blocks roll back the effects of uncommitted transactions previously applied by the rolling forward phase.

After the roll forward, any changes that were not committed must be undone. Oracle applies undo blocks to roll back uncommitted changes in data blocks that were either written before the failure or introduced by redo application during cache recovery. This process is called rolling back or transaction recovery.

2) Possible issues

a) High CPU Utilization by SMON process
b) Database may hang during large transaction recovery.
c) If Database is shutdown abort, then the database may hang during consequent startup.
d) Database repeatedly crashes while open.
e) Cause huge volume archive-log generation

3) Related parameter and modes of Transaction Recovery

Parameter : fast_start_parallel_rollback

Default is:
SQL> show parameter fast_start_parallel_rollback;

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback string LOW
SQL>

Values and meaning:

FALSE - Parallel rollback is disabled
LOW - Limits the maximum degree of parallelism to 2 * CPU_COUNT
HIGH -Limits the maximum degree of parallelism to 4 * CPU_COUNT

There are two modes of Transaction Recovery. These are :
a) Parallel Transaction Recovery
b) Serial Transaction Recovery

a) Parallel Transaction Recovery:

Recovery occurs in parallel mode. Several parallel slave processes will be spawned and will be involved in recovery. This is also termed as Fast Start Parallel Rollback. The background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes. To enable Parallel recovery mode, set the parameter FAST_START_PARALLEL_ROLLBACK to LOW / HIGH.

b) Serial Transaction Recovery

This mode recovers the transaction sequentially. Many of the times, serial transaction recovery will be faster. Setting the FAST_START_PARALLEL_ROLLBACK parameter to false will enable the serial transaction recovery.

4) Detection of Transaction Recovery

a) Identify Dead Transactions and their Sizes
b) Identify undo segments containing dead transactions
c) Identify the Time for transaction recovery to complete
d) Query to identify the number of parallel Recovery Slaves
e) Identify Objects involved in Transaction Recovery

a) Identify Dead Transactions and their Sizes

query:

select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%'
order by ktuxesiz asc;

e.g.,

SQL> select ktuxeusn USN, ktuxeslt Slot, ktuxesqn Seq, ktuxesta State, ktuxesiz Undo
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%'
order by ktuxesiz asc;

USN SLOT SEQ STATE UNDO
---------- ---------- ---------- ---------------- ----------
13 24 84986 ACTIVE 1
14 6 871 ACTIVE 5

SQL>

Interpreting:

ktuxeusn – Undo Segment Number
ktuxeslt – Slot number
ktuxesqn – Sequence
ktuxesta – State
ktuxesiz – Undo Blocks Remaining
ktuxecfl – Flag

b) Identify undo segments containing dead transactions

query:

select useg.segment_name,
useg.segment_id,
useg.tablespace_name,
useg.status
from dba_rollback_segs useg
where useg.segment_id in
(select unique ktuxeusn
from x$ktuxe
where ktuxesta <> 'INACTIVE'
and ktuxecfl like '%DEAD%');
e.g.,

SQL> select useg.segment_name, useg.segment_id, useg.tablespace_name, useg.status
2 from dba_rollback_segs useg
3 where useg.segment_id in (select unique ktuxeusn
4 from x$ktuxe
5 where ktuxesta <> 'INACTIVE'
6 and ktuxecfl like '%DEAD%');

SEGMENT_NAME SEGMENT_ID TABLESPACE_NAME STATUS
------------------------------ ---------- ------------------------------ ----------------
_SYSSMU13_1849806892$ 13 UNDOTBS2 ONLINE
_SYSSMU14_3452166187$ 14 UNDOTBS2 ONLINE

2 rows selected

SQL>

c) Identify the Time for transaction recovery to complete

select usn,
state,
undoblockstotal "Total",
undoblocksdone "Done",
undoblockstotal - undoblocksdone "ToDo",
decode(cputime,
0,
'unknown',
sysdate + (((undoblockstotal - undoblocksdone) /
(undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;

d) Query to identify the number of parallel Recovery Slaves:

select * from v$fast_start_servers;

STATE UNDOBLOCKSDONE PID XID
----------- ---------------- ---------- ----------------
RECOVERING 12 10 001F005C00001BD6
RECOVERING 0 19 001F005C00001BD6

Column STATE shows the state of the server being IDLE or RECOVERING. If only 1 process is in state RECOVERING while the other processes are in state IDLE, then you should disable Parallel Transaction Recovery. Normally these queries will show the progress of the transaction recovery.

If there is no progress in the number of blocks, then we can consider disabling the parallel recovery and let the recovery continue in serial transaction.

e) Identify Objects involved in Transaction Recovery

select decode(px.qcinst_id,NULL,username,' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "Slave Set",
to_char(s.sid) "SID",
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
px.req_degree "Requested DOP",
px.degree "Actual DOP"
from gv$px_session px, gv$session s
where px.sid=s.sid (+)
and px.serial#=s.serial#
order by 5 , 1 desc ;

Username QC/Slave Slave Set SID QC SID Requested DOP Actual DOP
------------ ------------------ -------- --------- ------------- -----------
SYS QC 736 736

SQL> select sid,serial# from gv$session where sid=1445;

SID SERIAL#
--------- ---------
1445 13981
1445 28003

SQL> select distinct current_obj#
2 from v$active_session_history
3 where SESSION_ID=1445
4 and SESSION_SERIAL# in(13981,27943);

CURRENT_OBJ#
------------
88667
89381
87133

SQL>

SQL> select owner,Object_name, object_type
2 from dba_objects
3 where object_id in(17197,17310,17423);

OWNER OBJECT_NAME OBJECT_TYPE
----------------------------------------- ---------- -------
MMGT MATERIAL_REQUEST TABLE
MMGT PURCHAGE_ORDER TABLE
MMGT PUCHAGE_RECEIPTS TABLE

SQL>

So using These above method you can find out objects and other details related to what is in Parallel Transaction Recovery.

5) Need of Oracle support:

Oracle support required to provide undo dumps to analyze more and next level action plan:

a) Identify the UNDO name using the undo segment name and transaction details.

select sysdate, b.name useg, a.ktuxeusn xidusn, a.ktuxeslt xidslot, a.ktuxesqn xidsqn
from x$ktuxe a, undo$ b
where a.ktuxesta = 'ACTIVE'
and a.ktuxecfl like '%DEAD%'
and a.ktuxeusn = b.us#;

b) Dump the Undo Block

Alter system dump undo block ‘<undo seg name>’ xid <xidusn> <xidslot> <xidsqn>;

e.g.,

SQL> oradebug setmypid
Statement processed.
SQL> alter system dump undo block '_SYSSMU13_1849806892$' xid 13 24 84986 ;

System altered.

SQL> oradebug tracefile_name
/u01/app/oracle/diag/rdbms/prod/PROD2/trace/PROD2_ora_13959258.trc
SQL>

you can upload related trace file to oracle support for next level action plan.

6) Disable Transaction Recovery

Disabling Transaction recovery could temporarily open up the database.
SMON will stop performing all the transaction recovery.
There is no way to stop only one Transaction recovery / few transaction recoveries.
When the transaction recovery is disabled, any data modified by this dead transaction but required by other transactions will be recovered on "as needed" basis.
We always suggest to disable the transaction recovery only with the help of Oracle Support.

Caution:
Oracle does not recommend to Disable the Transaction recovery and open the Database for Business Use.

7) Precautions:

Check the long running Transactions before shutting down the Database
DBA and Application team to be aware of any long running Jobs.
Plan for the downtime and inform application team accordingly so that they do not schedule business critical / long running jobs.
Do not Panic. Transaction recovery is part of Database normal Operations

8) Oracle Bug and Workaround:

In the above observation, frequently three tables are commng for recovery. As per my experience I found this case in 2 production environments where no reason of this kind of transaction recovery. Recently Oracle agree with me it is a bug and they merged this bug with below bug id for Oracle 11.2.0.4+PSU7. Very soon Oracle will release a patch.

Bug 23217190

Bug details:

Bug 23217190 - TRANSACTION RECOVERY IS STUCK ON A 2 NODE RAC
=========================================================
"If you see the symptoms above, set event 10015 level 10 for SMON. If the
trace produced shows entries like this:
kturrt: skipped undo for xid: 0x02a4.021.00001850 @ uba uba:
0x00c00626.03b4.14
it is possibly this bug. Dump the undo for the transaction. For example
alter system dump undo block _SYSSMU675_3821756173$ xid 676 33 6224;
If the "Begin trans" record contains a "Global transid", then it is likely
this bug. The proxy_id names the sibling transaction which has been lost. "

Further review of Dev. team concluded Bug 23217190 is similar of Bug 13841869 and marked as Duplicate of ' Bug 13841869 '

++ Bug 13841869 - WHEN LOCAL TRANSACTION CONVERTS TO GLOBAL, FLAG KTUEXF_GLOBALTX IS NOT SET

Work-around:
==========

Drop and re-create the verified objects with planned downtime to fix this issue.

Note: All activities can be done at your own risk. It is highly recommended go with Oracle support to fix the issue.

Reference:
Oracle Documents

Recovery after killing a large transaction

Summary

After killing a large running transaction, either by killing the process or aborting the database, the database may appear to hang and/or the SMON background process is taking all the available cpu. Basically SMON is in the progress of rolling back this terminated large transaction, which can be time consuming. As a result of this users often try to restart the database, but because a shutdown immediate will hang, the database is often aborted. Please note that stopping the database will actually make things worse and will not decrease the amount of work SMON needs to perfom to complete the rollback.

The message 'Waiting for smon to disable tx recovery' will posted in the alert log as well.

First we need to determine the progress SMON is making. It is very possible the SMON process to work with one rollback segment. You can find it using the following query:

SELECT r.NAME "RB Segment Name", dba_seg.size_mb,
DECODE(TRUNC(SYSDATE - LOGON_TIME), 0, NULL, TRUNC(SYSDATE - LOGON_TIME) || ' Days' || ' + ') || 
TO_CHAR(TO_DATE(TRUNC(MOD(SYSDATE-LOGON_TIME,1) * 86400), 'SSSSS'), 'HH24:MI:SS') LOGON, 
v$session.SID, v$session.SERIAL#, p.SPID, v$session.process,
v$session.USERNAME, v$session.STATUS, v$session.OSUSER, v$session.MACHINE, 
v$session.PROGRAM, v$session.module, action 
FROM v$lock l, v$process p, v$rollname r, v$session, 
(SELECT segment_name, ROUND(bytes/(1024*1024),2) size_mb FROM dba_segments 
WHERE segment_type = 'TYPE2 UNDO' ORDER BY bytes DESC) dba_seg 
WHERE l.SID = p.pid(+) AND 
v$session.SID = l.SID AND 
TRUNC (l.id1(+)/65536)=r.usn AND 
l.TYPE(+) = 'TX' AND 
l.lmode(+) = 6 
AND r.NAME = dba_seg.segment_name
--AND v$session.username = 'SYSTEM'
--AND status = 'INACTIVE'
ORDER BY size_mb DESC;

After killing the shadow process or aborting the database the v$transaction entry is lost, so you cannot estimate by examining v$transaction.used_ublk how the rollback procedure proceeds.

The following queries are available to monitor the progress of the transaction recovery:

SQL> alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS'; 

SELECT usn, state, undoblockstotal "Total", undoblocksdone "Done", 
undoblockstotal-undoblocksdone "ToDo", 
DECODE(cputime,0,'unknown',SYSDATE+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) 
"Finish at" 
FROM v$fast_start_transactions; 

       USN STATE                 Total       Done       ToDo Finish at
---------- ---------------- ---------- ---------- ---------- --------------------
        88 RECOVERED             36591      36591          0 17-NOV-2008 17:02:44
        36 RECOVERING              464        100        364 10-DEC-2008 03:32:06
        71 RECOVERING              309        193        116 20-NOV-2008 19:19:23
         8 RECOVERING          2350785     189363    2161422 18-NOV-2008 00:51:51

Run the above query several times in a row, this will give you a good idea on how SMON is progressing.

- In some versions the cputime does not work (always 0), hence the estimated completion time will not be displayed
- In some cases the v$fast_start_transactions view will not work. If this is the case then you can query the internal data dictionary
view x$ktuxe, the 'ktuxesiz' column represents the remaining number of undo blocks required for rollback:

   select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta
   from x$ktuxe
   where ktuxecfl = 'DEAD';

Database Hangs Because SMON is taking 100% CPU doing transaction recovery

In this Post,
  Goal
  Solution
  References

--------------------------------------------------------------------------------


This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) Rapid Visibility (RaV) process, and therefore has not been subject to an independent technical review. 



Applies to: Oracle Server - Enterprise Edition - Version: 8.1.7 to 10.2
Information in this document applies to any platform.
GoalAfter killing a large running transaction, either by killing the process or aborting the database, the database may appear to hang 
and/or the SMON background process is taking all the available cpu. Basically SMON is in the progress of rolling back this terminated 
large transaction, which can be time consuming. As a result of this users often try to restart the database, but because a shutdown 
immediate will hang, the database is often aborted. Please note that stopping the database will actually make things worse and will not 
decrease the amount of work SMON needs to perfom to complete the rollback. 

The message 'Waiting for smon to disable tx recovery' will posted in the alert log as well. 




SolutionFirst we need to determine the progress SMON is making. After killing the shadow process or aborting the database the v$transaction entry is lost, so you cannot estimate by examining v$transaction.used_ublk how the rollback procedure proceeds. 

The following queries are available to monitor the progress of the transaction recovery 

  set linesize 100 
  alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS'; 
  select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo", 
             decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) 
              "Estimated time to complete" 
   from v$fast_start_transactions; 

Run the above query several times in a row, this will give you a good idea on how SMON is progressing. 

- In some versions the cputime does not work (always 0), hence the estimated completion time will not be displayed 
- In some cases the v$fast_start_transactions view will not work. If this is the case then you can query the internal data dictionary 
view x$ktuxe, the 'ktuxesiz' column represents the remaining number of undo blocks required for rollback: 

   select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta 
   from x$ktuxe 
   where ktuxecfl = 'DEAD'; 



The following outlines a few action that can be taken in order to speed up SMON recovery of the large transaction. 

1. There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the pq slaves are 
interfering with each other. To check the Parallel Recovery processes and there state query: 

          select * from v$fast_start_servers; 

    Column STATE shows the state of the server being IDLE or RECOVERING, if only 1 process is in 
    state RECOVERING while the other processes are in state IDLE, then you should disable 
    Parallel Transaction Recovery. How to do this is outlined in the following note: 

         Note 238507.1: How to Disable Parallel Transaction Recovery When Parallel Txn Re very is Active 

    If all the processes are in state RECOVERING, then you can benefit from adding more processes: 

         alter system set fast_start_parallel_rollback = high; 

    This will create parallel servers as much as 4 times the number of CPUs. 


2. Increase the parameter '_cleanup_rollback_entries' to 400. The default is 100. 

This parameter cannot be changed dynamically, so in order to change this the database will need to be restarted. As outlined above 
this is not advisable in the current situation, but there is basically no alternative. 
If the database is is going to be restarted, then you best also specify 

  fast_start_parallel_rollback=false 


3. As a last resort, if the above has been verified, we can disable SMON from performing the recovery at this time. 
This will allow the database to operate as normal, and you can choose to let the recovery take place at a later time when it is more 
convenient. Because the clean-up of the large transaction is actually put on hold it is important, and I cannot stress this enough, to 
disable this event again to let SMON finish its work. 

  init.ora: 
    event="10513 trace name context forever, level 2" 


Subject:  How to Disable Parallel Transaction Recovery When Parallel Txn Recovery is Active 
  Doc ID:  Note:238507.1 Type:  PROBLEM 
  Last Revision Date:  24-APR-2004 Status:  PUBLISHED 


Symptom(s)
~~~~~~~~~~

Parallel Transaction Recovery is taking too long.

You can use V$TRANSACTION USED_UBLK to estimate how long the rollback is going 
to take but there is no formula for this.  If you shutdown the database after 
rollback has started, it will begin where it left off.  

You can also look at V$FAST_START_TRANSACTIONS for UNDOBLOCKSDONE 
versus UNDOBLOCKSTOTAL.



Change(s)
~~~~~~~~~~

A large transaction got killed or rolled back.


Cause
~~~~~~~

There are cases where parallel transaction recovery is not as fast as serial 
transaction recovery, because the pq slaves are interfering with each other.  
This depends mainly on the type of changes that need to be made during rollback 
and usually may happen when rolling back INDEX Updates in parallel.   


Fix
~~~~

Dynamically switch from parallel recovery to serial.  If you are in a clustered 
environment you will need to do this on all instances at the same time:

1. Find SMON's Oracle PID:

Example:

SQL> select pid, program from v$process where program like '%SMON%';

       PID PROGRAM
---------- ------------------------------------------------
         6 oracle@stsun7 (SMON)

2. Disable SMON transaction cleanup:

SVRMGR> oradebug setorapid 
SVRMGR> oradebug event 10513 trace name context forever, level 2

3. Kill the PQ slaves that are doing parallel transaction recovery. 
You can check V$FAST_START_SERVERS to find these.

4. Turn off fast_start_parallel_rollback:

 alter system set fast_start_parallel_rollback=false;

If SMON is recovering, this command might hang, if it does just control-C out of it.  You may need to try this many times to get this to complete (between SMON cycles).

5. Re-enable SMON txn recovery:

SVRMGR> oradebug setorapid 
SVRMGR> oradebug event 10513 trace name context off


References
~~~~~~~~~~~
Note 144332.1 
Parallel Rollback may hang database, Parallel query servers get 100% cpu

Hidden parameter _db_block_cache_protect

Question: What does the hidden parameter _db_block_cache_protect do?
Answer: The _db_block_cache_protect parameter protects database blocks. If _db_block_cache_protect = true, then when a process needs to access a block in the buffer, it needs to make a special call to enable the access then disable it afterwards.

The _db_block_cache_protect can add overhead and can harm system performance.

Oracle uses a special database algorithm to protect all modified blocks in the database memory. Any uncontrolled database writes inside the database memory area will be detected, protecting the disk from getting in memory corrupted blocks. After every successful disk read, the database block is checked in the database memory. The checksum information from the block header is compared with the memory checksum calculated on the fly. When they do not match, Oracle raises an error indicating a soft corrupted block, which will be reformatted and reused again.

Although in memory corruption detection is helpful, any in the memory corruption will crash the database instance. This parameter is hidden and by default always on.

Find out the JOB Broken:

Find out the JOB Broken:
========================

set lines 200
set pages 20
col interval format a15
col what format a80
col broken format a10
SELECT JOB,LAST_DATE,NEXT_DATE,INTERVAL,BROKEN,FAILURES,WHAT FROM DBA_JOBS WHERE BROKEN <> 'N' OR FAILURES <> 0;

privileg to manage scheduler jobs:
================================

grant manage scheduler to <username>;

Run job:
========
exec sys.dbms_ijob.run(20);

Stop Job:
======
exec sys.DBMS_SCHEDULER.STOP_JOB('BIODATA.BIO_CREATE_SA_MVS',TRUE);

disable scheduler :
==============

exec dbms_scheduler.disable('CMMMERCK.JOB_CMM_ERRORCHECK');

REMOVE JOB:
===========
exec sys.dbms_ijob.remove(251);

to unbreak:
===========
exec sys.DBMS_IJOB.BROKEN(5176,FALSE);

to break:
==========
exec sys.DBMS_IJOB.BROKEN(110,TRUE);

Find running sql

To check running sql : active
+++++++++++++++++++++

SELECT
SUBSTR(SS.USERNAME,1,8) USERNAME,
SS.OSUSER "USER",
AR.MODULE || ' @ ' || SS.machine CLIENT,
SS.PROCESS PID,
TO_CHAR(AR.LAST_LOAD_TIME, 'DD-Mon HH24:MM:SS') LOAD_TIME,
AR.DISK_READS DISK_READS,
AR.BUFFER_GETS BUFFER_GETS,
SUBSTR(SS.LOCKWAIT,1,10) LOCKWAIT,
W.EVENT EVENT,
SS.status,
AR.SQL_fullTEXT SQL
FROM V$SESSION_WAIT W,
V$SQLAREA AR,
V$SESSION SS,
v$timer T
WHERE SS.SQL_ADDRESS = AR.ADDRESS
AND SS.SQL_HASH_VALUE = AR.HASH_VALUE
AND SS.SID = w.SID (+)
AND ss.STATUS = 'ACTIVE'
AND W.EVENT != 'client message'
ORDER BY SS.LOCKWAIT ASC, SS.USERNAME, AR.DISK_READS DESC;

Scripts related to TEMP Tablespace

Scripts related to TEMP Tablespace

To see Temp Space:
-------------------

SELECT a.tablespace_name,ROUND((c.total_blocks*b.block_size)/1024/1024/1024,2)
"Total Size [GB]",ROUND((a.used_blocks*b.block_size)/1024/1024/1024,2) "Used_size[GB]",
ROUND(((c.total_blocks-a.used_blocks)*b.block_size)/1024/1024/1024,2) "Free_size[GB]",
ROUND((a.max_blocks*b.block_size)/1024/1024/1024,2) "Max_Size_Ever_Used[GB]",
ROUND((a.max_used_blocks*b.block_size)/1024/1024/1024,2) "MaxSize_ever_Used_by_Sorts[GB]" ,
ROUND((a.used_blocks/c.total_blocks)*100,2) "Used Percentage"
FROM V$sort_segment a,dba_tablespaces b,(SELECT tablespace_name,SUM(blocks)
total_blocks FROM dba_temp_files GROUP by tablespace_name) c
WHERE a.tablespace_name=b.tablespace_name AND a.tablespace_name=c.tablespace_name;

=================================================================
To see top 10 consuming process :
---------------------------------------

select * from
(SELECT d.tablespace_name,a.sid,a.serial#,a.program,a.module,a.action,a.username "DB Username",a.osuser,ROUND((b.blocks*d.block_size)/1024/1024,2) "Used MB",c.sql_text
FROM v$session a, v$tempseg_usage b, v$sqlarea c,dba_tablespaces d
WHERE a.saddr = b.session_addr AND c.address= a.sql_address AND c.hash_value = a.sql_hash_value AND d.tablespace_name=b.tablespace ORDER BY b.tablespace, b.blocks DESC)
where rownum <=10

===============================================================

Query to check TEMP USAGE :
------------------------------------

col name for a20
SELECT d.status "Status", d.tablespace_name "Name", d.contents "Type", d.extent_management
"ExtManag",
TO_CHAR(NVL(a.bytes / 1024 / 1024, 0),'99,999,990.900') "Size (M)", TO_CHAR(NVL(t.bytes,
0)/1024/1024,'99999,999.999') ||'/'||TO_CHAR(NVL(a.bytes/1024/1024, 0),'99999,999.999') "Used (M)",
TO_CHAR(NVL(t.bytes / a.bytes * 100, 0), '990.00') "Used %"
FROM sys.dba_tablespaces d, (select tablespace_name, sum(bytes) bytes from dba_temp_files group by
tablespace_name) a,
(select tablespace_name, sum(bytes_cached) bytes from
v$temp_extent_pool group by tablespace_name) t
WHERE d.tablespace_name = a.tablespace_name(+) AND d.tablespace_name = t.tablespace_name(+)
AND d.extent_management like 'LOCAL' AND d.contents like 'TEMPORARY';

====================================================================

To Check Percentage Usage of Temp Tablespace:
---------------------------------------------------------

select (s.tot_used_blocks/f.total_blocks)*100 as "percent used"
from (select sum(used_blocks) tot_used_blocks
from v$sort_segment where tablespace_name='TEMP') s,
(select sum(blocks) total_blocks
from dba_temp_files where tablespace_name='TEMP') f;

====================================================================
To find Sort Segment Usage by a particular User:
----------------------------------------------------------

SELECT s.username,s.sid,s.serial#,u.tablespace, u.contents, u.extents, u.blocks
FROM v$session s, v$sort_usage u
WHERE s.saddr=u.session_addr
order by u.blocks desc;

====================================================================

To find Total Free space in Temp Tablespace :
---------------------------------------------

select tablespace_name , (free_blocks*8)/1024/1024 FreeSpaceInGB,
(used_blocks*8)/1024/1024 UsedSpaceInGB,
(total_blocks*8)/1024/1024 TotalSpaceInGB
from v$sort_segment where tablespace_name like '%TEMP%';

=====================================================================

Get 10 sessions with largest temp usage :
---------------------------------------------

cursor bigtemp_sids is
select * from (
select s.sid,
s.status,
s.sql_hash_value sesshash,
u.SQLHASH sorthash,
s.username,
u.tablespace,
sum(u.blocks*p.value/1024/1024) mbused ,
sum(u.extents) noexts,
nvl(s.module,s.program) proginfo,
floor(last_call_et/3600)||':'||
floor(mod(last_call_et,3600)/60)||':'||
mod(mod(last_call_et,3600),60) lastcallet
from v$sort_usage u,
v$session s,
v$parameter p
where u.session_addr = s.saddr
and p.name = 'db_block_size'
group by s.sid,s.status,s.sql_hash_value,u.sqlhash,s.username,u.tablespace,
nvl(s.module,s.program),
floor(last_call_et/3600)||':'||
floor(mod(last_call_et,3600)/60)||':'||
mod(mod(last_call_et,3600),60)
order by 7 desc,3)
where rownum < 11;

====================================================================

Identifying WHO is currently using TEMP Segments :
--------------------------------------------------------------

SELECT sysdate,a.username, a.sid, a.serial#, a.osuser, (b.blocks*d.block_size)/1048576 MB_used, c.sql_text
FROM v$session a, v$tempseg_usage b, v$sqlarea c,
(select block_size from dba_tablespaces where tablespace_name='TEMP') d
WHERE b.tablespace = 'TEMP'
and a.saddr = b.session_addr
AND c.address= a.sql_address
AND c.hash_value = a.sql_hash_value
AND (b.blocks*d.block_size)/1048576 > 1024
ORDER BY b.tablespace, 6 desc;

==============================================================================================================================================

db health check .sh

$ cat health_check.sh

! echo ***INSTANCE STATUS***! echo ***INSTANCE STATUS***
set lines 300
col HOST_NAME for a25
select INSTANCE_NAME,HOST_NAME,STATUS,ACTIVE_STATE,LOGINS from gv$instance;

! echo ***LIST OF INACTIVE SESSIONS***
select inst_id, username,count(1) from gv$session where status='INACTIVE' group by inst_id, username order by 1;

! echo ***LOAD BALANCING***
select count(*),inst_id from gv$process group by inst_id;

! echo ****PROCESS USAGE****
set lines 300
col RESOURCE_NAME for a10
col INITIAL_ALLOCATION for a15
select * from gv$resource_limit where resource_name='processes';

! echo ****SESSION USAGE****
select * from gv$resource_limit where resource_name='sessions';

! echo ***BLOCKING SESSIONS***
select 'SID ' || l1.sid ||' is blocking ' || l2.sid blocking
from gv$lock l1, gv$lock l2
where l1.block =1 and l2.request > 0
and l1.id1=l2.id1
and l1.id2=l2.id2
/

! echo ***TABLE LOCKS***
set lines 100 pages 999
col username format a20
col sess_id format a10
col object format a25
col mode_held format a10
select oracle_username || ' (' || s.osuser || ')' username
, s.sid || ',' || s.serial# sess_id
, owner || '.' || object_name object
, object_type
, decode( l.block
, 0, 'Not Blocking'
, 1, 'Blocking'
, 2, 'Global') status
, decode(v.locked_mode
, 0, 'None'
, 1, 'Null'
, 2, 'Row-S (SS)'
, 3, 'Row-X (SX)'
, 4, 'Share'
, 5, 'S/Row-X (SSX)'
, 6, 'Exclusive', TO_CHAR(lmode)) mode_held
from gv$locked_object v
, dba_objects d
, gv$lock l
, gv$session s
where v.object_id = d.object_id
and v.object_id = l.id1
and v.session_id = s.sid
order by oracle_username
, session_id
/

! echo ***TABLESPACE USAGE***
set pages 999
col tablespace_name format a40
col "size MB" format 999,999,999
col "free MB" format 99,999,999
col "% Used" format 999
select tsu.tablespace_name, ceil(tsu.used_mb) "size MB"
, decode(ceil(tsf.free_mb), NULL,0,ceil(tsf.free_mb)) "free MB"
, decode(100 - ceil(tsf.free_mb/tsu.used_mb*100), NULL, 100,
100 - ceil(tsf.free_mb/tsu.used_mb*100)) "% used"
from (select tablespace_name, sum(bytes)/1024/1024 used_mb
from dba_data_files group by tablespace_name union all
select tablespace_name || ' **TEMP**'
, sum(bytes)/1024/1024 used_mb
from dba_temp_files group by tablespace_name) tsu
, (select tablespace_name, sum(bytes)/1024/1024 free_mb
from dba_free_space group by tablespace_name) tsf
where tsu.tablespace_name = tsf.tablespace_name (+)
order by 4
/

! echo ***ARCHIVE USAGE***

show parameter recovery;

select NAME,TOTAL_MB,FREE_MB,round((FREE_MB*100)/(TOTAL_MB)) PERCENT_FREE from V$ASM_DISKGROUP
/

Archive/Redo Log generation

Redo Log Generation

See how much redo logs you are generating per day. Good to use when you see your archivelog directories are filling up unusually.

This will not show you where the problem is though, will just give you the number of log switches and average MB generated per day so that you can compare day by day. If you do start seeing unusually high log switches, thus high volume of archivelog generation, maybe a good place to start is looking for transactions which do lots of UPDATE/DELETE operations in the STATSPACK/AWR.

Archive generation:

SELECT TO_CHAR(FIRST_TIME,'YYYY-MM-DD') DAY,
TO_CHAR(FIRST_TIME,'HH24') HOUR,
COUNT(*) TOTAL
FROM V$LOG_HISTORY
GROUP BY TO_CHAR(FIRST_TIME,'YYYY-MM-DD'),TO_CHAR(FIRST_TIME,'HH24')
ORDER BY TO_CHAR(FIRST_TIME,'YYYY-MM-DD'),TO_CHAR(FIRST_TIME,'HH24')
ASC;

DAILY ARCHIVE LOG GENERATION IN GB :

Hourly/Daily Archive generation
The below query comes handy to understand the archivelog generation of an Oracle database on an hourly /daily basis, per thread – in case of RAC databases.

Archivelog generation on a daily basis:

1
2
3
set pages 1000
select trunc(COMPLETION_TIME,'DD') Day, thread#, round(sum(BLOCKS*BLOCK_SIZE)/1048576) MB,count(*) Archives_Generated from v$archived_log
group by trunc(COMPLETION_TIME,'DD'),thread# order by 1;
Archive log generation on an hourly basis:

1
2
3
set pages 1000
select trunc(COMPLETION_TIME,'HH') Hour,thread# , round(sum(BLOCKS*BLOCK_SIZE)/1048576) MB,count(*) Archives from v$archived_log
group by trunc(COMPLETION_TIME,'HH'),thread# order by 1 ;
Sample output:

1
2
3
4
5
6
7
8
9
HOUR THREAD# MB ARCHIVES
------------------- ---------- ---------- ----------
2009 08 20 12:00:00 1 31268 339
2009 08 20 13:00:00 1 4994 55
2009 08 20 14:00:00 1 4412 48
2009 08 20 15:00:00 1 4805 52
2009 08 20 16:00:00 1 3364 37
2009 08 20 17:00:00 1 22 1
2009 08 20 21:00:00 1 9 1

Also, the following script is useful to find the archivelog switches on an hourly basis that happened in the past one week.

SELECT to_date(first_time) DAY,
to_char(sum(decode(to_char(first_time,'HH24'),'00',1,0)),'99') "00",
to_char(sum(decode(to_char(first_time,'HH24'),'01',1,0)),'99') "01",
to_char(sum(decode(to_char(first_time,'HH24'),'02',1,0)),'99') "02",
to_char(sum(decode(to_char(first_time,'HH24'),'03',1,0)),'99') "03",
to_char(sum(decode(to_char(first_time,'HH24'),'04',1,0)),'99') "04",
to_char(sum(decode(to_char(first_time,'HH24'),'05',1,0)),'99') "05",
to_char(sum(decode(to_char(first_time,'HH24'),'06',1,0)),'99') "06",
to_char(sum(decode(to_char(first_time,'HH24'),'07',1,0)),'99') "07",
to_char(sum(decode(to_char(first_time,'HH24'),'08',1,0)),'99') "08",
to_char(sum(decode(to_char(first_time,'HH24'),'09',1,0)),'99') "09",
to_char(sum(decode(to_char(first_time,'HH24'),'10',1,0)),'99') "10",
to_char(sum(decode(to_char(first_time,'HH24'),'11',1,0)),'99') "11",
to_char(sum(decode(to_char(first_time,'HH24'),'12',1,0)),'99') "12",
to_char(sum(decode(to_char(first_time,'HH24'),'13',1,0)),'99') "13",
to_char(sum(decode(to_char(first_time,'HH24'),'14',1,0)),'99') "14",
to_char(sum(decode(to_char(first_time,'HH24'),'15',1,0)),'99') "15",
to_char(sum(decode(to_char(first_time,'HH24'),'16',1,0)),'99') "16",
to_char(sum(decode(to_char(first_time,'HH24'),'17',1,0)),'99') "17",
to_char(sum(decode(to_char(first_time,'HH24'),'18',1,0)),'99') "18",
to_char(sum(decode(to_char(first_time,'HH24'),'19',1,0)),'99') "19",
to_char(sum(decode(to_char(first_time,'HH24'),'20',1,0)),'99') "20",
to_char(sum(decode(to_char(first_time,'HH24'),'21',1,0)),'99') "21",
to_char(sum(decode(to_char(first_time,'HH24'),'22',1,0)),'99') "22",
to_char(sum(decode(to_char(first_time,'HH24'),'23',1,0)),'99') "23"
from
v$log_history
where to_date(first_time) > sysdate -8
GROUP by
to_char(first_time,'YYYY-MON-DD'), to_date(first_time)
order by to_date(first_time)
/

Oracle trace event

Setting an Oracle trace event

Another set of parameters that may be useful are events, so I?ll discuss events a bit in this subsection. Setting an event means to tell Oracle to generate information in form of a trace file in the context of the event. The trace file is usually located in a directory specified by the initialization parameter USER_DUMP_DEST. By examining the resulting trace file, detailed information about the event traced can be deduced. The general format for an event is:

EVENTS = "<trace class><event name><action><name><trace name><qualifier>"

There are two types of events: session-events and process-events. Process-events are initialized in the parameter file; session-events are initialized with the ALTER SESSION... or ALTER SYSTEM command. When checking for posted events, the Oracle Server first checks for session-events then for process-events.

Oracle trace Event Classes

There are four traceable event classes:

Class 1: 'Dump something.' Traces are generated upon so-called unconditioned immediate, events. This is the case when Oracle data has to be dumped; for example , the headers of all redolog files or the contents of the controlfile. These events can not be set in the init<SID>.ora, but must be set using the ALTER SESSION or the DBMS_SESSION.SET_EV() procedure.
Class 2: 'Trap on Error.' Setting this class of (error-) events causes Oracle to generate an errorstack every time the event occurs.
Class 3: 'Change execution path.' Setting such an event will cause Oracle to change the execution path for some specific Oracle internal code segment. For example, setting event "10269" prevents SMON from doing free-space coalescing.
Class 4: 'Trace something.' Events from this class are set to obtain traces that are used for, for example, SQL tuning. A common event is "10046", which will cause Oracle to trace the SQL access path on each SQL-statement.

The "set events" trace command settings

The SET EVENTS command in an init<SID>.ora file have generally been placed there at the command of Oracle support to perform specific functions. Usually, these alerts turn on more advanced levels of tracing and error detection than are commonly available. Source 2.6 lists some of the more common events.

The syntax to specify multiple events in the init.ora is:

EVENT="<event 1>:<event 2>: <event 3>: <event n>"

You can also split the events on multiple lines by using the continuation backslash character (\) at the end of each event and continue the next event on the next line. For example:

    EVENT="<event 1>:\
    <event 2>:\
    <event 3>: \
    <event n>"

For Example:

    EVENT="\
    10210 trace name context forever, level 10:\
    10211 trace name context forever, level 10:\
    10231 trace name context forever, level 10:\
    10232 trace name context forever, level 10"After setting the events in the initialization file, you need to stop and restart the instance. Be sure to check the alert.log and verify that the events are in effect. You can specify almost all EVENT settings at the session level using the ALTER SESSION command or a call to the DBMS_SYSYTEM.SET_EV( ) procedure; doing so does not require an instance bounce for the EVENT to take effect.

The alert.log should show the events that are in effect; for example:

event = 10210 trace name context forever, level 10:10211 trace name context for ever, level 10:10231 trace name context forever, level 10:10232 trace name context forever, level 10

Example Uses of the EVENT Initialization Parameter

To enable block header and trailer checking to detect corrupt blocks:

event="10210 trace name context forever, level 10" -- for tables
event="10211 trace name context forever, level 10" -- for indexes
event="10210 trace name context forever, level 2" -- data block checking
event="10211 trace name context forever, level 2" -- index block checking
event="10235 trace name context forever, level 1" -- memory heap checking
event="10049 trace name context forever, level 2" -- memory protect cursors

And to go with these, the undocumented parameter setting:

_db_block_cache_protect=TRUE

which will prevent corruption from getting to your disks (at the cost of a database crash).

For tracing of a MAX_CURSORS exceeded error:

event="1000 trace name ERRORSTACK level 3"

To get an error stack related to a SQLNET ORA-03120 error:

event="3120 trace name error stack"

To work around a space leak problem:

event="10262 trace name context forever, level x"

where x is the size of space leak to ignore.

To trace memory shortages:

event="10235 trace name context forever, level 4"

event="600 trace name heapdump, level 4"

To take a shared pool heapdump to track Ora-04031 as the error occurs, set the following event in your init.ora file:

event = "4031 trace name heapdump forever, level 2"

For ORA-04030 errors: Take a dump by setting this event in your INIT file and analyze the trace file. This will clearly pinpoint the problem.

event="4030 trace name errorstack level 3"

The following undocumented SQL statements can be used to obtain information about internal database structures:

* To dump the control file:

alter session set events 'immediate trace name CONTROLF level 10'

* To dump the file headers:

alter session set events 'immediate trace name FILE_HDRS level 10'

* To dump redo log headers:

alter session set events 'immediate trace name REDOHDR level 10'

* To dump the system state:

alter session set events 'immediate trace name SYSTEMSTATE level 10'

* To dump the optimizer statistics whenever a SQL statement is parsed:

alter session set events '10053 trace name context forever'

* To prevent db block corruptions:

event = "10210 trace name context forever, level 10"

event = "10211 trace name context forever, level 10"

event = "10231 trace name context forever, level 10"

* To enable the maximum level of SQL performance monitoring:

event = "10046 trace name context forever, level 12"

* To enable a memory-protect cursor:

event = "10049 trace name context forever, level 2"

* To perform data-block checks:

event = "10210 trace name context forever, level 2"

* To perform index-block checks:

event = "10211 trace name context forever, level 2"

* To perform memory-heap checks:

event = "10235 trace name context forever, level 1"

* To allow 300 bytes memory leak for each connection:

event = "10262 trace name context forever, level 300"

You should be noticing a pattern here for tracing events related to error codes: the first argument in the EVENT is the error code followed by the action you want to take upon receiving the code.

Events at the Session Level

Events are also used as the SESSION level using the ALTER SESSION command or calls to the DBMS_SYSTEM.SET_EV() procedure. The general format for the ALTER SESSION command is:

ALTER SESSION SET EVENTS 'ev_number ev_text level x';

where:

Ev_number is the event number.

Ev_text is any required text (usually "trace name context forever").

x is the required level setting corresponding to the desire action, file, or
other required data.

For example, to provide more detailed SQL trace information:

ALTER SESSION SET EVENTS '10046 trace name context forever level NN'

where NN:

1 is same as a regular trace.

4 means also dump bind variables

8 means also dump wait information

12 means dump both bind and wait information

Example Uses of the ALTER SESSION Command to Set EVENT Codes

To coalesce free space in a tablespace pre-version 7.3:

ALTER SESSION SET EVENTS 'immediate trace name coalesce level XX'

where:

XX is the value of ts# from ts$ table for the tablespace.

To coalesce free space in a tablespace defined as temporary:

ALTER SESSION SET EVENTS 'immediate trace name drop_segments level &x';

where:

x is the value for file# from ts$ plus 1.

To get the information out of the db block buffers regarding order of LRU chains:

ALTER SESSION SET EVENTS 'immediate trace name buffers level x';

where:

x is 1-3 for buffer header order or 4-6 for LRU chain order.

To correct transportable tablespace export hanging (reported on 8.1.6, 8.1.7 on HPUX, a known bug):

ALTER SESSION SET EVENT '10297 trace name context forever, level 1';

To cause "QKA Disable GBY sort elimination". This affects how Oracle will process sorts:

ALTER SESSION SET EVENTS '10119 trace name context forever';

* You can disable the Index FFS using the event 10156. In this case, CBO will lean toward FTS or Index scan.

* You can set the event 10092 if you want to disable the hash joins completely.

It is very easy to see how SMON cleans up rollback entries by using the event 10015. You can use event 10235 to check how the memory manager works internally.

CBO is definitely not a mystery. Use event 10053 to give the detail of the various plans considered, depending on the statistics available; be careful using this for large multi-table joins, as the report can be quite lengthy! The data density, sparse characteristics, index availability, and index depth all lead the optimizer to make its decisions. You can see the running commentary in trace files generated by the 10053 event.

Wikipedia

We will discuss the following to make clear about the issue and troubleshoot mechanism:

Setting an Oracle trace event

Oracle trace Event Classes

The "set events" trace command settings

Example Uses of the EVENT Initialization Parameter

Example Uses of the ALTER SESSION Command to Set EVENT Codes