Archive for ‘Grid Infrastructure’

December 15, 2011

Creating an 11.2.0.2 database on ASM using dbca crashes existing instances

While creating a new instance on ASM with Oracle 11.2.0.2, some existing instances running from the same ORACLE_HOME suddenly crashed! We got the following error message in the alert.log.

*** 2011-11-22 17:36:00.962
Unexpected error 27140 in job slave process
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: Operation not permitted
ORA-27302: failure occurred at: skgpwinit6
ORA-27303: additional information: startup egid = 1000 (oinstall), current egid = 1200 (dba)

On the same machine we have installed Grid Infrastructure 11.2.0.2. The clusterware (owned by grid) manages all 11.2.0.2 instances on the server as cluster resources in active/passive mode. It is worth pointing out that this problem never happens when we use a local file system or a clustered file system as storage instead of ASM.

Analysis

We realized that the ownership of the 'oracle' binary in $ORACLE_HOME/bin had been changed. It went from "oracle:oinstall" (the ownership when the current instances were started) to "oracle:dba" during creation of the new instance, which crashed the running instances.
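
A quick way to spot this (a minimal check, assuming a standard 11.2 home layout) is to look at the binary before and after dbca runs:

$ ls -l $ORACLE_HOME/bin/oracle
# the group should be oinstall; if it shows dba here, the binary's ownership has been flipped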

Cause

This is identified as Bug 9786198 [http://bug.oraclecorp.com/pls/bug/webbug_edit.edit_info_top?rptno=9786198] – SRVCTL START DATABASE ORA-01078 FAILURE IN PROCESSING SYSTEM PARAMETERS.

This happens when you choose "dba" for the "ASM Database Administrator", "ASM Instance Administration Operator" and "ASM Instance Administrator"
groups when installing Grid Infrastructure.

If all other database instances are starting up on all nodes, there is no need to make any change to setasmgid.
The change below is only needed if you hit the same issue again.

Solution

Check the setasmgidwrap script under the Grid home. It should contain an entry like this:

SETASMGID_LOC=<directory>

Check whether setasmgid exists under SETASMGID_LOC; if it does, rename $SETASMGID_LOC/setasmgid to $SETASMGID_LOC/setasmgid.orig.

Then restore the group ownership of the oracle binary to oinstall.
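
A minimal sketch of those steps as shell commands, run as root (SETASMGID_LOC comes from the setasmgidwrap script, and 6751 is the usual mode of the oracle binary; adjust to your environment):

# disable setasmgid so the next database creation does not flip the group again
mv $SETASMGID_LOC/setasmgid $SETASMGID_LOC/setasmgid.orig

# restore group ownership and the setuid/setgid bits on the oracle binary
chown oracle:oinstall $ORACLE_HOME/bin/oracle
chmod 6751 $ORACLE_HOME/bin/oracle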

July 1, 2011

Debugging CRS

Enabling Additional Debugging Information for Cluster Ready Services

The crsd daemon can produce additional debugging information if you set the variable CRS_DEBUG to the value 1 by performing the following procedure:

In the file /etc/init.d/init.crsd add the entry:

CRS_DEBUG=1
export CRS_DEBUG

Then kill the crsd daemon with the command:

$ kill -9 <crsd process id>
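
One way to find that process id (assuming the daemon shows up as crsd.bin in the process list):

$ ps -ef | grep '[c]rsd.bin' | awk '{print $2}'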

Allow the init process to restart crsd. Oracle will then write additional debugging information to the standard log files.

June 24, 2011

oracleasm deletedisk [FAILED]

Trying to delete an oracleasm disk failed.

oracleasm deletedisk DISK1 [FAILED]

Workaround

oracleasm version 2.1.3
ASM Oracle Database 11g Enterprise Edition Release 11.2.0.2.0

$ fuser /dev/mapper/asm01
$ dd if=/dev/zero of=/dev/mapper/asm01 bs=1024 count=100
100+0 records in
100+0 records out
102400 bytes (102 kB) copied, 0.000324 seconds, 300 MB/s

$ /etc/init.d/oracleasm deletedisk DISK1
Removing ASM disk "DISK1": [ OK ]

And then on all nodes:
$ oracleasm scandisks
DISK1
DISK2
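
To double-check afterwards, 'oracleasm listdisks' (part of the same toolset) can be run on each node; something like:

$ /etc/init.d/oracleasm listdisks
# DISK1 should no longer appear in the output once the deletion has propagated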

June 24, 2011

Installing Grid Infrastructure 11.2.0.2 on Red Hat 5: bugs PRVF-5150 and PRVF-5449

If you install Grid Infrastructure 11.2.0.2 on RHEL 5.x and face this error during the environment checks:

PRVF-5150: Path ORCL: is not a valid path
It is related to bug 10026970, which is fixed in 11.2.0.3. Refer to the note:

Device Checks for ASM Fails with PRVF-5150: Path ORCL: is not a valid path [ID 1210863.1]

Follow the instructions to perform a manual check; if everything is OK, you can safely ignore the error and go ahead with the rest of the installation.
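
As a rough sketch of what such a manual check can look like, assuming ASMLib-managed disks (the authoritative steps are in the MOS note above):

# confirm the disks are stamped for ASMLib and resolvable to a device
$ /etc/init.d/oracleasm listdisks
$ /etc/init.d/oracleasm querydisk DISK1

# confirm the device nodes have the expected owner/group for the grid user
$ ls -l /dev/oracleasm/disks/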

Just to remember.

May 24, 2011

Oracle Active-Passive Cluster 11gR2

This note explains how to use Oracle Grid Infrastructure to protect a standalone 11.2 database.

Environment

  • Active-passive cluster with 2 nodes running CentOS 5.5 64-bit
  • 1 node running Openfiler, serving as iSCSI target server and DNS server
  • 3 ASM disk groups created on iSCSI targets as storage for CRS, DATA and FRA

Steps:

  • Install CentOS 5.5 with the required packages on both nodes
  • Set kernel parameters for Oracle Clusterware 11.2 and Database 11.2 on both nodes
  • Create role separation: user ‘grid’ for Grid Infrastructure and user ‘oracle’ for the database, on both nodes
  • Set up DNS on both nodes
  • Set up ASM diskgroups CRS, DATA and FRA on iSCSI targets on both nodes
  • Install Oracle Clusterware 11.2 on the cluster, owned by user ‘grid’, choosing DNS for the SCAN and using the CRS diskgroup as storage for the OCR and voting disk
  • Install Oracle Database 11.2 software (standalone!) on both nodes
  • As user ‘grid’, create and mount diskgroup DATA and FRA by ‘asmca’
  • As user ‘oracle’, call ‘dbca’ to create database on primary node only.
  • Remove the database from the OCR and add a cluster resource using a customized action script (see the sketch after this list)
  • Copy the action script to the other node.
  • Test start, stop, check, relocate and restart the database by clusterware.

More details will be added to each of the steps.
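
For the step that removes the database from the OCR and re-registers it as a generic cluster resource, a minimal sketch (the database name DB11GR2 and the profile path are simply the values used later in this blog, not a prescription):

# as oracle: drop the dbca-created ora.*.db registration so clusterware no longer manages it directly
$ srvctl remove database -d DB11GR2

# as oracle: register it again as a plain cluster_resource driven by a customized action script
$ crsctl add resource DB11GR2.db -type cluster_resource -file /u01/app/11.2.0/grid/HA_scripts/ora.db11gr2.db.profile.cluster_vertion.owner_oracle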

May 23, 2011

CRS-2518: Invalid directory path when calling “crsctl add res”

I got this error when adding a resource using:


[oracle@guang HA_scripts]$ crsctl add resource DB11GR2.db -type cluster_resource -file /u01/app/11.2.0/grid/HA_scripts/ora.db11gr2.db.profile.cluster_vertion.owner_oracle

CRS-2518: Invalid directory path '/u01/app/11.2.0/grid/HA_scripts/active_passive_cluster_run_by_oracle.sh'
CRS-4000: Command Add failed, or completed with errors.

Where the profile /u01/app/11.2.0/grid/HA_scripts/ora.db11gr2.db.profile.cluster_vertion.owner_oracle has the following contents:


NAME=DB11GR2.db
TYPE=cluster_resource
DESCRIPTION=Oracle Database resource
ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--
ACTION_SCRIPT=/u01/app/11.2.0/grid/HA_scripts/oracle_restart_db11gr2.sh
PLACEMENT=restricted
ACTIVE_PLACEMENT=0
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=10
DEGREE=1
ENABLED=1
HOSTING_MEMBERS=guang yangmei
LOGGING_LEVEL=1
RESTART_ATTEMPTS=1
START_TIMEOUT=600
START_DEPENDENCIES=hard(ora.DATA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DATA.dg,ora.FRA.dg)
STOP_TIMEOUT=600
STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DATA.dg,shutdown:ora.FRA.dg)
UPTIME_THRESHOLD=1h

At the beginning I thought it was an issue with user permissions, since I ran the command as the ‘oracle’ user (the owner of the RDBMS) while the resource profile is located under ‘GRID_HOME’.

But it finally turned out that the script ‘/u01/app/11.2.0/grid/HA_scripts/active_passive_cluster_run_by_oracle.sh’ had NOT been copied to the other node. The error message is somewhat misleading. After scp’ing the missing files to the other node ‘yangmei’, the command completed successfully.


[grid@guang HA_scripts]$ scp active_passive_cluster_run_by_oracle.sh ora.db11gr2.db.profile.cluster_vertion.owner_oracle grid@yangmei:/u01/app/11.2.0/grid/HA_scripts

[grid@guang HA_scripts]$ scp oracle_restart_db11gr2.sh  grid@yangmei:/u01/app/11.2.0/grid/HA_scripts

oracle_restart_db11gr2.sh                                                                                         100%  717     0.7KB/s   00:00

ora.db11gr2.db.profile.cluster_vertion.owner_oracle                                    100%  654     0.6KB/s   00:00

[oracle@guang HA_scripts]$ crsctl add resource DB11GR2.db -type cluster_resource -file /u01/app/11.2.0/grid/HA_scripts/ora.db11gr2.db.profile.cluster_vertion.owner_oracle

[oracle@guang HA_scripts]$ crsctl start res DB11GR2.db
CRS-2672: Attempting to start 'DB11GR2.db' on 'guang'
CRS-2676: Start of 'DB11GR2.db' on 'guang' succeeded

It succeeded.
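
To see where the resource ended up running, ‘crsctl stat res’ with the tabular flag is a quick check in 11.2:

$ crsctl stat res DB11GR2.db -t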

Take a note to remember the fix.