1. Ensure the maximum number of open files is set to 65k for Oracle Clusterware.
Reference
Oracle Bug # 8429716
Oracle Metalink Note id # 858279.1
Important: Oracle Metalink Note id # 732086.1 and Bug 6196746 may mislead you.
You may see timeout errors like the ones below in the crsd.log file.
2010-12-07 03:14:54.776: [ CRSEVT][1494178112] CAAMonitorHandler :: 0:Action Script /u01/crs/oracle/product/crs/bin/racgwrap(check) timed out for ora.dlp34db.vip! (timeout=60)
2010-12-07 03:14:54.776: [ CRSAPP][1494178112] CheckResource error for ora.dlp34db.vip error code = -2
2010-12-07 03:14:54.846: [ CRSEVT][1492076864] CAAMonitorHandler :: 0:Could not join /u01/crs/oracle/product/crs/bin/racgwrap(check)
category: 1234, operation: scls_process_join, loc: childcrash, OS error: 0, other: Abnormal termination of the child
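A quick way to see whether a shell meets the limit from step 1 is sketched below. It is only a rough check: the value 65536 is an assumed reading of "65k", `nofile_ok` is a hypothetical helper name, and the limit that actually matters is the one inherited by the CRS daemons, which can differ from an interactive shell. Refer to the note above for where the setting should be made.

```shell
#!/bin/sh
# Sketch: compare an open-files limit against the target from step 1.
# REQUIRED_NOFILE=65536 is an assumed reading of "65k";
# nofile_ok is a hypothetical helper, not part of any Oracle tooling.
REQUIRED_NOFILE=65536

nofile_ok() {
    # $1 = limit as printed by `ulimit -n` (a number or "unlimited")
    [ "$1" = "unlimited" ] && return 0
    # Non-numeric input makes the test fail quietly instead of printing an error.
    [ "$1" -ge "$REQUIRED_NOFILE" ] 2>/dev/null
}

current=$(ulimit -n)
if nofile_ok "$current"; then
    echo "open files limit OK: $current"
else
    echo "open files limit too low: $current (want >= $REQUIRED_NOFILE)"
fi
```

Run it as the user that starts the cluster stack so that the reported limit is as close as possible to what the daemons see.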
2. Increase DIAGWAIT. This gives the node enough time to dump its diagnostic log information before a node eviction,
which in turn helps in troubleshooting the root cause of the eviction.
Reference
Oracle Metalink Note id # 559365.1
crsctl set css diagwait 13 -force
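After the change, the value can be read back with `crsctl get css diagwait`. The sketch below wraps that check. `parse_diagwait` is a hypothetical helper, and it assumes the value shows up as the first integer in the command output; adjust it if your version prints something different.

```shell
#!/bin/sh
# Sketch: confirm the diagwait value after the change above.
# parse_diagwait is a hypothetical helper; it assumes the value is the first
# integer in the output of `crsctl get css diagwait`.
EXPECTED_DIAGWAIT=13

parse_diagwait() {
    # Reads command output on stdin and prints the first integer found, if any.
    grep -o '[0-9][0-9]*' | head -n 1
}

if command -v crsctl >/dev/null 2>&1; then
    value=$(crsctl get css diagwait 2>&1 | parse_diagwait)
    if [ "$value" = "$EXPECTED_DIAGWAIT" ]; then
        echo "diagwait is $value"
    else
        echo "diagwait is '${value:-unset}', expected $EXPECTED_DIAGWAIT"
    fi
else
    echo "crsctl not found in PATH; run this as root on a cluster node"
fi
```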
3. Ensure HugePages is configured appropriately.
Oracle Metalink Note Reference
Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration [ID 401749.1]
HugePages on 64-bit Linux [ID 361468.1]
HugePages on Linux: What It Is... and What It Is Not... [ID 361323.1]
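As a rough illustration of the arithmetic behind the HugePages setting, the sketch below divides an SGA size by the huge page size and rounds up. It is not the script from Note 401749.1: `hugepages_needed` is a hypothetical helper, and the 8192 MB SGA is an example value only.

```shell
#!/bin/sh
# Sketch: estimate vm.nr_hugepages for a given SGA size.
# hugepages_needed is a hypothetical helper; use the script from
# Note 401749.1 for a real recommendation.
hugepages_needed() {
    # $1 = total SGA size in MB
    # $2 = huge page size in kB (Hugepagesize in /proc/meminfo)
    sga_kb=$(( $1 * 1024 ))
    # Round up so the whole SGA fits in huge pages.
    echo $(( (sga_kb + $2 - 1) / $2 ))
}

# Huge page size on this host; falls back to 2048 kB if the field is absent.
page_kb=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo 2>/dev/null)
page_kb=${page_kb:-2048}

# 8192 MB is an example SGA size, not a recommendation.
echo "vm.nr_hugepages >= $(hugepages_needed 8192 "$page_kb")"

# Show what the kernel currently has configured and in use.
grep -i '^HugePages_' /proc/meminfo 2>/dev/null || true
```

Comparing the estimate with the `HugePages_Total` and `HugePages_Free` lines gives a first indication of whether the instance is actually using the pages.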
4. Additional settings to capture server-level core information for diagnosing a server crash caused by Oracle Clusterware
a. Enable CSSD core by following Oracle Metalink Note 605449.1
b. Configure kdump by following Oracle Metalink Note 452067.1
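Before relying on either setting, it can help to confirm two generic Linux prerequisites: that memory is reserved for the crash kernel and that core files are not disabled by a zero size limit. The sketch below checks both. `has_crashkernel` is a hypothetical helper, and the checks are general Linux ones rather than steps taken from the notes above.

```shell
#!/bin/sh
# Sketch: generic pre-checks before relying on core files and kdump output.
# has_crashkernel is a hypothetical helper name.
has_crashkernel() {
    # $1 = kernel command line text
    # Succeeds if a crashkernel= memory reservation is present.
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

cmdline=$(cat /proc/cmdline 2>/dev/null)
if has_crashkernel "$cmdline"; then
    echo "crashkernel memory is reserved"
else
    echo "no crashkernel= on the kernel command line; kdump cannot capture a vmcore"
fi

# A limit of 0 means no core file will be written by processes started from this shell.
echo "core file size limit for this shell: $(ulimit -c)"
```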