Tuesday, 7 December 2010
Key points to consider for Oracle RAC/Cluster in Linux
Reference
Oracle Bug # 8429716
Oracle Metalink Note id # 858279.1
You may see timeout errors like below in the crsd.log file.
Important : Oracle Metalink note id # 732086.1 & Bug 6196746 may misguide you.
2010-12-07 03:14:54.776: [ CRSEVT][1494178112] CAAMonitorHandler :: 0:Action Script /u01/crs/oracle/product/crs/bin/racgwrap(check) timed out for
ora.dlp34db.vip! (timeout=60)
2010-12-07 03:14:54.776: [ CRSAPP][1494178112] CheckResource error for ora.dlp34db.vip error code = -2
2010-12-07 03:14:54.846: [ CRSEVT][1492076864] CAAMonitorHandler :: 0:Could not join /u01/crs/oracle/product/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
2. Increase DIAGWAIT. This avoids failure in dumping log information prior to Node Eviction by giving sufficient time.
Eventually it helps to troubleshoot the root cause of Node Eviction
Reference
Oracle Metalink Note id # 559365.1
crsctl set css diagwait 13 -force
3. Ensure HUGEPAGE is set appropriately.
Oracle Metalink Note Reference
Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration [ID 401749.1]
HugePages on 64-bit Linux [ID 361468.1]
HugePages on Linux: What It Is... and What It Is Not... [ID 361323.1]
4. Additional settings to fetch server level core information to diagnose the Server crash due to Oracle Cluster
a. Enable CSSD core by following Oracle Metalink Note 605449.1
b. Configure kdump by following Oracle Metalink Note 452067.1
SQLID level Tracing Feature in 11g
set of SQL IDs
For example
alter session/system set events 'sql_trace [sql:g3yc1js3g2689|7ujay4u33g337] wait=true | false, bind=true | false, planstat=never | first_execution | all_executionss|level 12';
alter session/system set events 'sql_trace [sql:g3yc1js3g2689|7ujay4u33g337] off';
Trace will be generated upon running that particular SQL ID
Keywords : SQL ID, sqlid trace
Monday, 22 February 2010
Solution for long running query against DBA_EXTENTS to find the corrupted segment
You are running a query against DBA_EXTENTS to find the segment information (most probably you would be using
this to find a CORRUPTED segment/object) for a particular FILE_ID and BLOCK_ID.
Query might look something like this.
select owner,segment_name,segment_type
from dba_extents
where file_id =
and
This query might run longer and sometimes you may lose your patience waiting for it.
Solution
- Dump the block and take the object id using the below command
alter system dump datafile
Eg.alter system dump datafile 1811 block 20000;
FYI, this command generate a trace file in the user_dump_dest directory
- Grep the object number from the trace file.
Locate the trace file and grep the object id which will be presented in HEX value.
grep "seg/obj" op01csdb01_ora_13143.trc
seg/obj: 0xe024a2 csc: 0x878.a5583a53 itc: 50 flg: E typ: 1 - DATA
- Find the object information from DBA_OBJECTS
select owner,object_id,data_object_id,object_name,subobject_name
from dba_objects
where (object_id=to_number('E024A2', 'xxxxxx')
or
data_object_id=to_number('E024A2', 'xxxxxx')
);
Note : the count of 'x' should be equal to the number of characters in the source.
Eg. E024A2 - is 6 character string, hence i used 'xxxxxx'
Keywords : query; dba_extents; long; slow; corrupt ; segment; object
Monday, 25 January 2010
How to create duplicate database from a primary when the FULL backup happens out of standby
1. You want to create a DUPLICATE database to perform PITR out of Primary PRODUCTION database. Duplicate/auxiliary instance is created in a different host.
2. FULL/DYNAMIC backup of primary production database goes out of STANDBY
Assumption
1. Backup of Production databae controlfile exists
Facts
In this scenario, I used the following command to perform PITR recovery in RETEK database.
1. Production database
a. op01rtdb01/2
b. primnode1/primnode2
2. Standby database
a. Op99rtdb01
b. standbynode1
c. Policy : rdc_oradb_qa
d. Netbackup master server : nbmstrnode1
Note : FULL database backup is performed out of op99rtdb01
3. Duplicate/Auxiliary database
a. op01drdb01
b. auxnode1
Commands
PFA the general commands for this sort PITR. This document would also cover the directory creation and other steps.
Otherwise, the key steps are
1. Login to auxnode1 as “oracle” unix user
2. . oraenv (type op01drdb01)
3. Startup nomount
rman
connect target SYS/
connect catalog rman10g/
CONNECT AUXILIARY /
run
{
allocate channel t1 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=standbynode1,NB_ORA_CLASS=rdc_oradb_qa,NB_ORA_SERV=nbmstrnode1)';
allocate auxiliary channel t2 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=standbynode1,NB_ORA_CLASS=rdc_oradb_qa,NB_ORA_SERV=nbmstrnode1)';
allocate auxiliary channel t3 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=standbynode1,NB_ORA_CLASS=rdc_oradb_qa,NB_ORA_SERV=nbmstrnode1)';
allocate auxiliary channel t4 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=standbynode1,NB_ORA_CLASS=rdc_oradb_qa,NB_ORA_SERV=nbmstrnode1)';
set until time "to_date('2009-10-09 02:00:00','YYYY-MM-DD hh24:mi:ss')";
DUPLICATE TARGET DATABASE TO OP01DRDB
db_file_name_convert=('/u01/app/oracle/admin/oralink/op01rtdb/','/u01/app/oracle/admin/oralink/op01drdb/')
skip tablespace ARIDAT03,RTKIDX06,RTKIDXEX01,RTKIDXEX03,OUTLN01,RTKIDX02,RTKDAT02,ARIDAT02,RTKIDX08,RTKDATEX03,RTKDAT03,RTKIDX04,RTKIDX01,RTKDAT04,PMIDAT01,TOOLS01,ARIIDX02,
ARIIDX03,AUDDAT01,ARIDAT01,RTKDAT01,RTKDAT06,RTKIDX03,RTKIDXEX02,RTKIDX07,ARIIDX01,RTKIDX05,RTKDAT07,USERS01,RTKDAT05,RTKDATEX02,RTKDAT08
LOGFILE
GROUP 1 ('/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0101a.rdo','/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0101b.rdo') SIZE 511M,
GROUP 2 ('/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0201a.rdo','/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0201b.rdo') SIZE 511M,
GROUP 3 ('/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0301a.rdo','/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0301b.rdo') SIZE 511M REUSE;
}
Note : To get the value of the parameter marked in RED, you need to go to op99rtdb01/standbynode1:/usr/openv/netbackup/logs/user_ops/dbext/logs
Grep for rdc_oradb_qa and CAT any of the file.
standbynode1:[op99rtdb01]:/usr/openv/netbackup/logs/user_ops/dbext/logs>more 709.0.1264176793
Backup started Fri Jan 22 10:13:13 2010
10:14:11 INF - Backup id = standbynode1_1264176842
10:14:11 INF - Backup time = 1264176842
10:14:11 INF - Policy name = rdc_oradb_qa
10:14:11 INF - Snapshot = 0
10:14:12 INF - Frozen image = 0
10:14:12 INF - Backup copy = 0
10:14:12 INF - Master server = nbmstrnode1
Key words : create ; auxiliary ; duplicate ; standby ; dataguard ; full ; backup ;