Tuesday 7 December 2010

Key points to consider for Oracle RAC/Cluster in Linux

1. Ensure Max. Open Files is set to 65k for Oracle Cluster.
Reference
Oracle Bug # 8429716
Oracle Metalink Note id # 858279.1

You may see timeout errors like below in the crsd.log file.
Important : Oracle Metalink note id # 732086.1 & Bug 6196746 may misguide you.

2010-12-07 03:14:54.776: [ CRSEVT][1494178112] CAAMonitorHandler :: 0:Action Script /u01/crs/oracle/product/crs/bin/racgwrap(check) timed out for

ora.dlp34db.vip! (timeout=60)
2010-12-07 03:14:54.776: [ CRSAPP][1494178112] CheckResource error for ora.dlp34db.vip error code = -2
2010-12-07 03:14:54.846: [ CRSEVT][1492076864] CAAMonitorHandler :: 0:Could not join /u01/crs/oracle/product/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child


2. Increase DIAGWAIT. This avoids failure in dumping log information prior to Node Eviction by giving sufficient time.
Eventually it helps to troubleshoot the root cause of Node Eviction
Reference
Oracle Metalink Note id # 559365.1

crsctl set css diagwait 13 -force

3. Ensure HUGEPAGE is set appropriately.
Oracle Metalink Note Reference
Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration [ID 401749.1]
HugePages on 64-bit Linux [ID 361468.1]
HugePages on Linux: What It Is... and What It Is Not... [ID 361323.1]

4. Additional settings to fetch server level core information to diagnose the Server crash due to Oracle Cluster
a. Enable CSSD core by following Oracle Metalink Note 605449.1
b. Configure kdump by following Oracle Metalink Note 452067.1


SQLID level Tracing Feature in 11g

Now since Oracle 11g onwards you can trace all sessions by filtering a particular SQL ID or
set of SQL IDs


For example

alter session/system set events 'sql_trace [sql:g3yc1js3g2689|7ujay4u33g337] wait=true | false, bind=true | false, planstat=never | first_execution | all_executionss|level 12';

alter session/system set events 'sql_trace [sql:g3yc1js3g2689|7ujay4u33g337] off';

Trace will be generated upon running that particular SQL ID

Keywords : SQL ID, sqlid trace

Monday 22 February 2010

Solution for long running query against DBA_EXTENTS to find the corrupted segment

Problem

You are running a query against DBA_EXTENTS to find the segment information (most probably you would be using
this to find a CORRUPTED segment/object) for a particular FILE_ID and BLOCK_ID.

Query might look something like this.

select owner,segment_name,segment_type
from dba_extents
where file_id =
and between block_id and (block_id+(blocks-1));

This query might run longer and sometimes you may lose your patience waiting for it.


Solution

  • Dump the block and take the object id using the below command

alter system dump datafile block ;
Eg.alter system dump datafile 1811 block 20000;

FYI, this command generate a trace file in the user_dump_dest directory

  • Grep the object number from the trace file.

Locate the trace file and grep the object id which will be presented in HEX value.

grep "seg/obj" op01csdb01_ora_13143.trc

seg/obj: 0xe024a2 csc: 0x878.a5583a53 itc: 50 flg: E typ: 1 - DATA

  • Find the object information from DBA_OBJECTS

select owner,object_id,data_object_id,object_name,subobject_name

from dba_objects

where (object_id=to_number('E024A2', 'xxxxxx')

or

data_object_id=to_number('E024A2', 'xxxxxx')

);

Note : the count of 'x' should be equal to the number of characters in the source.

Eg. E024A2 - is 6 character string, hence i used 'xxxxxx'

Keywords : query; dba_extents; long; slow; corrupt ; segment; object

Monday 25 January 2010

How to create duplicate database from a primary when the FULL backup happens out of standby

Scenario

1. You want to create a DUPLICATE database to perform PITR out of Primary PRODUCTION database. Duplicate/auxiliary instance is created in a different host.

2. FULL/DYNAMIC backup of primary production database goes out of STANDBY

Assumption

1. Backup of Production databae controlfile exists

Facts

In this scenario, I used the following command to perform PITR recovery in RETEK database.
1. Production database
a. op01rtdb01/2
b. primnode1/primnode2
2. Standby database
a. Op99rtdb01
b. standbynode1
c. Policy : rdc_oradb_qa
d. Netbackup master server : nbmstrnode1

Note : FULL database backup is performed out of op99rtdb01

3. Duplicate/Auxiliary database
a. op01drdb01
b. auxnode1

Commands

PFA the general commands for this sort PITR. This document would also cover the directory creation and other steps.
Otherwise, the key steps are

1. Login to auxnode1 as “oracle” unix user

2. . oraenv (type op01drdb01)

3. Startup nomount

rman

connect target SYS/@op01rtdb01 Note : I am connecting only to primary production database, though the FULL backup happens out of OP99RTDB01.

connect catalog rman10g/@op01emdb02

CONNECT AUXILIARY /

run
{
allocate channel t1 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=standbynode1,NB_ORA_CLASS=rdc_oradb_qa,NB_ORA_SERV=nbmstrnode1)';
allocate auxiliary channel t2 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=standbynode1,NB_ORA_CLASS=rdc_oradb_qa,NB_ORA_SERV=nbmstrnode1)';
allocate auxiliary channel t3 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=standbynode1,NB_ORA_CLASS=rdc_oradb_qa,NB_ORA_SERV=nbmstrnode1)';
allocate auxiliary channel t4 type 'SBT_TAPE' parms 'ENV=(NB_ORA_CLIENT=standbynode1,NB_ORA_CLASS=rdc_oradb_qa,NB_ORA_SERV=nbmstrnode1)';
set until time "to_date('2009-10-09 02:00:00','YYYY-MM-DD hh24:mi:ss')";
DUPLICATE TARGET DATABASE TO OP01DRDB
db_file_name_convert=('/u01/app/oracle/admin/oralink/op01rtdb/','/u01/app/oracle/admin/oralink/op01drdb/')
skip tablespace ARIDAT03,RTKIDX06,RTKIDXEX01,RTKIDXEX03,OUTLN01,RTKIDX02,RTKDAT02,ARIDAT02,RTKIDX08,RTKDATEX03,RTKDAT03,RTKIDX04,RTKIDX01,RTKDAT04,PMIDAT01,TOOLS01,ARIIDX02,
ARIIDX03,AUDDAT01,ARIDAT01,RTKDAT01,RTKDAT06,RTKIDX03,RTKIDXEX02,RTKIDX07,ARIIDX01,RTKIDX05,RTKDAT07,USERS01,RTKDAT05,RTKDATEX02,RTKDAT08
LOGFILE
GROUP 1 ('/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0101a.rdo','/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0101b.rdo') SIZE 511M,
GROUP 2 ('/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0201a.rdo','/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0201b.rdo') SIZE 511M,
GROUP 3 ('/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0301a.rdo','/u01/app/oracle/admin/oralink/op01drdb/op01drdb01_0301b.rdo') SIZE 511M REUSE;
}

Note : To get the value of the parameter marked in RED, you need to go to op99rtdb01/standbynode1:/usr/openv/netbackup/logs/user_ops/dbext/logs
Grep for rdc_oradb_qa and CAT any of the file.

standbynode1:[op99rtdb01]:/usr/openv/netbackup/logs/user_ops/dbext/logs>more 709.0.1264176793
Backup started Fri Jan 22 10:13:13 2010


10:14:11 INF - Backup id = standbynode1_1264176842
10:14:11 INF - Backup time = 1264176842
10:14:11 INF - Policy name = rdc_oradb_qa
10:14:11 INF - Snapshot = 0
10:14:12 INF - Frozen image = 0
10:14:12 INF - Backup copy = 0
10:14:12 INF - Master server = nbmstrnode1

Key words : create ; auxiliary ; duplicate ; standby ; dataguard ; full ; backup ;