My Oracle: How to Recover the Voting Disk in a RAC Environment

In this post I demonstrate how to recover a voting disk. The exercise was tested in an Oracle 12c test cluster: a two-node RAC whose nodes are named RACTEST1 and RACTEST2.

My voting disk has external redundancy, so only one copy of the voting disk is present on the ASM disk.
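For reference, Clusterware creates one voting file in an external-redundancy ASM disk group, three in a normal-redundancy group and five in a high-redundancy group, which is why there is only a single copy here. The small shell function below simply encodes that mapping; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number of voting files Oracle Clusterware keeps in an
# ASM disk group of the given redundancy type.
voting_files_for_redundancy() {
  # Upper-case the argument so "external" and "EXTERNAL" both work.
  r=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$r" in
    EXTERNAL) echo 1 ;;   # no ASM mirroring: a single voting file
    NORMAL)   echo 3 ;;   # requires at least three failure groups
    HIGH)     echo 5 ;;   # requires at least five failure groups
    *)        echo "unknown redundancy: $1" >&2; return 1 ;;
  esac
}

# The VOTE1 disk group in this post uses external redundancy:
voting_files_for_redundancy EXTERNAL
```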

My goal is to corrupt the voting disk and restore it to a new disk. Here are the high-level steps:

Corrupt the voting disk
Shut down the database
Stop CRS on all nodes
Start CRS in exclusive mode on ONE node
Create a new disk group or use an existing one
Restore the voting disk to the newly created disk group
Stop CRS, which was started in exclusive mode
Start CRS on both nodes
Start the database and make sure all instances are up
Verify the cluster services and make sure everything is good!
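The list above can be written down as a dry-run checklist. This is a minimal sketch that only prints each command together with the user and node it runs on; the function name is hypothetical, and the defaults are the database, disk group and node names used in this post.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the recovery sequence for a two-node
# cluster without executing anything.
print_votedisk_recovery_plan() {
  local db="${1:-govinddb}" dg="${2:-VOTE2}" n1="${3:-RACTEST1}" n2="${4:-RACTEST2}"
  cat <<EOF
[oracle@$n1] srvctl stop database -db $db
[root@$n1] crsctl stop crs -f
[root@$n2] crsctl stop crs -f
[root@$n1] crsctl start crs -excl
[ASM@$n1] SQL> CREATE DISKGROUP $dg ... (or reuse an existing disk group)
[oracle@$n1] crsctl replace votedisk +$dg
[oracle@$n1] crsctl query css votedisk
[root@$n1] crsctl stop crs
[root@$n1] crsctl start crs
[root@$n2] crsctl start crs
[root@$n1] crsctl check cluster -all
[oracle@$n1] srvctl start database -db $db
EOF
}

print_votedisk_recovery_plan
```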

Step 1

Let us check where the voting disk is located in the cluster.

crsctl query css votedisk

The voting disk is stored on the VOTE1 disk in the VOTE1 disk group.
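The query output can also be checked from a script. The sketch below assumes the output format shown later in this post (one row per voting file, ending with the disk group name in square brackets); the function name and the GRID_HOME variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical parser for `crsctl query css votedisk` output: prints the disk
# group name (the bracketed last field) of every ONLINE voting file.
votedisk_groups() {
  awk '$2 == "ONLINE" { dg = $NF; sub(/^\[/, "", dg); sub(/]$/, "", dg); print dg }'
}

# On a live node (GRID_HOME is an assumed variable pointing at the Grid home):
#   "$GRID_HOME"/bin/crsctl query css votedisk | votedisk_groups
# Here it is run on the sample row shown later in this post:
printf '%s\n' ' 1. ONLINE   5d17422445e54f1abf131f15b967c07f (ORCL:VOTE2) [VOTE2]' \
  'Located 1 voting disk(s).' | votedisk_groups
```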

Step 2

Next, corrupt the voting disk by overwriting the start of the disk with zeros:

dd if=/dev/zero of=/dev/oracleasm/disks/VOTE1 bs=4096 count=100

At this stage, Clusterware should stop working.

I rebooted the node and checked the cluster, and the cluster services were down. A reboot is not necessary: this is a test environment and I rebooted only to check the cluster services. In a production environment, bouncing the CRS service is enough.
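Since the dd command above is destructive, in a test cluster it can be worth saving the blocks it overwrites first. The sketch below is an optional helper, not part of the original exercise: the function names and the backup path are hypothetical, and restoring the saved blocks only makes sense if nothing else has written to the disk in between. Never run it against a production device.

```shell
#!/bin/sh
# Hypothetical test-cluster helper: save the blocks the dd command overwrites,
# then zero them, so the disk can be put back later. NOT for production.
BS=4096      # block size used in this post
COUNT=100    # number of blocks zeroed (4096 * 100 = 409600 bytes)

corrupt_with_backup() {
  local dev="$1" backup="$2"
  # 1. Copy the first COUNT blocks of the device to a backup file.
  dd if="$dev" of="$backup" bs="$BS" count="$COUNT" 2>/dev/null || return 1
  # 2. Overwrite the same blocks with zeros; conv=notrunc keeps the rest of the
  #    target intact when it is a regular file rather than a block device.
  dd if=/dev/zero of="$dev" bs="$BS" count="$COUNT" conv=notrunc 2>/dev/null
}

restore_from_backup() {
  local backup="$1" dev="$2"
  # Write the saved blocks back to the start of the device.
  dd if="$backup" of="$dev" bs="$BS" conv=notrunc 2>/dev/null
}

# Usage (hypothetical backup path):
#   corrupt_with_backup /dev/oracleasm/disks/VOTE1 /root/VOTE1_first_400k.bak
#   restore_from_backup /root/VOTE1_first_400k.bak /dev/oracleasm/disks/VOTE1
```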

Step 3 Check the clusterware status

crsctl check crs
crsctl check cluster
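The output of these two commands is easier to act on in a script if it is reduced to a yes/no answer. A minimal sketch, assuming the CRS-nnnn message format shown near the end of this post; the function name and the GRID_HOME variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the input contains at least one
# CRS-nnnn status line and every such line reports "is online".
all_crs_online() {
  awk '
    /^CRS-[0-9]+:/ { seen++; if ($0 !~ /is online/) bad++ }
    END { if (seen > 0 && bad == 0) exit 0; exit 1 }
  '
}

# On a live node (GRID_HOME is an assumed variable):
#   "$GRID_HOME"/bin/crsctl check crs | all_crs_online && echo "stack is up"
```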

Step 4 Shutdown the database

[oracle@RACTEST1 ~]$ srvctl status database -db govinddb
Instance govinddb1 is running on node ractest1
Instance govinddb2 is running on node ractest2
[oracle@RACTEST1 ~]$ srvctl stop database -db govinddb
[oracle@RACTEST1 ~]$ srvctl status database -db govinddb
Instance govinddb1 is not running on node ractest1
Instance govinddb2 is not running on node ractest2

[oracle@RACTEST1 ~]$
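Before stopping the clusterware, it is worth confirming that no instance is still running. A minimal sketch that counts running instances from the srvctl output above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: counts instances reported as running by
# `srvctl status database`. The pattern "is running on node" does not match
# the "is not running on node" lines.
count_running_instances() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the count anyway.
  grep -c 'is running on node' || true
}

# On a live node:  srvctl status database -db govinddb | count_running_instances
```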

Step 5 Stop the CRS on all nodes

[root@RACTEST1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ractest1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.evmd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'ractest1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'ractest1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'ractest1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.cssdmonitor' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'ractest1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ractest1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@RACTEST1 bin]#

[root@RACTEST2 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ractest2'
CRS-2673: Attempting to stop 'ora.gipcd' on 'ractest2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ractest2'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'ractest2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ractest2'
CRS-2673: Attempting to stop 'ora.evmd' on 'ractest2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'ractest2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'ractest2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ractest2' has completed
CRS-4133: Oracle High Availability Services has been stopped.

[root@RACTEST2 bin]#
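The two stop commands above were run separately on each node. From a single terminal, the same thing could be scripted roughly as below. This sketch assumes passwordless root ssh between the nodes and uses a Grid home path that is only a placeholder; the function name and the path are hypothetical. By default (DRY_RUN=1) it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: run the same crsctl command as root on every node.
# Assumes passwordless root ssh; GRID_HOME below is a placeholder path.
GRID_HOME="${GRID_HOME:-/u01/app/12.1.0/grid}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run_on_nodes() {
  local cmd="$1" node
  shift
  for node in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "ssh root@${node} ${GRID_HOME}/bin/crsctl ${cmd}"
    else
      # Stop at the first node where the command fails.
      ssh "root@${node}" "${GRID_HOME}/bin/crsctl ${cmd}" || return 1
    fi
  done
}

# Step 5 from this post, printed rather than executed:
run_on_nodes "stop crs -f" ractest1 ractest2
```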

Step 6 Start CRS in exclusive mode on one node

[root@RACTEST1 bin]# ./crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'ractest1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'ractest1'
CRS-2676: Start of 'ora.mdnsd' on 'ractest1' succeeded
CRS-2676: Start of 'ora.evmd' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'ractest1'
CRS-2676: Start of 'ora.gpnpd' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'ractest1'
CRS-2672: Attempting to start 'ora.gipcd' on 'ractest1'
CRS-2676: Start of 'ora.cssdmonitor' on 'ractest1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'ractest1'
CRS-2672: Attempting to start 'ora.diskmon' on 'ractest1'
CRS-2676: Start of 'ora.diskmon' on 'ractest1' succeeded
CRS-2676: Start of 'ora.cssd' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'ractest1'
CRS-2672: Attempting to start 'ora.ctssd' on 'ractest1'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'ractest1'
CRS-2676: Start of 'ora.crf' on 'ractest1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'ractest1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'ractest1'
CRS-2676: Start of 'ora.asm' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'ractest1'
CRS-2676: Start of 'ora.storage' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'ractest1'
CRS-2676: Start of 'ora.crsd' on 'ractest1' succeeded
[root@RACTEST1 bin]#
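Before going further it is worth confirming that every resource started in exclusive mode, especially ora.cssd and ora.asm. A minimal sketch, assuming the crsctl output was saved to a file (for example with tee): it reports any resource that has an "Attempting to start/stop" line (CRS-2672/CRS-2673) but no matching "succeeded" line (CRS-2676/CRS-2677), so it also works on the stop output in Step 5. The function name and the log path are hypothetical.

```shell
#!/bin/sh
# Hypothetical checker for captured crsctl start/stop output. Prints each
# resource that was attempted but never reported as succeeded; returns 0
# (and prints nothing) when every attempted resource succeeded.
unfinished_resources() {
  # Split on single quotes so $2 is the resource name, e.g. ora.asm.
  awk -F"'" '
    /^CRS-267[23]:/ { pending[$2] = 1 }
    /^CRS-267[67]:/ && /succeeded/ { delete pending[$2] }
    END { n = 0; for (r in pending) { print r; n++ }; exit (n > 0) }
  '
}

# Usage (hypothetical log path):
#   ./crsctl start crs -excl 2>&1 | tee /tmp/crs_excl_start.log
#   unfinished_resources < /tmp/crs_excl_start.log || echo "some resources did not start"
```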

Next, verify that the ASM instance is up and running, because it is needed to restore the voting disk into an ASM disk group. In my case, the ASM instance was started as part of starting CRS. If it is down, start it manually with a minimal pfile, connecting with sqlplus / as sysasm:

echo INSTANCE_TYPE=ASM >> /u01/app/oracle/init+ASM1.ora

startup pfile='/u01/app/oracle/init+ASM1.ora';
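Note that the echo ... >> command appends another INSTANCE_TYPE line every time it is re-run. A minimal sketch of an idempotent variant; the function name is hypothetical and the pfile path is the one used above.

```shell
#!/bin/sh
# Hypothetical idempotent variant of the echo above: add INSTANCE_TYPE=ASM
# to the pfile only if that exact line is not already present.
ensure_asm_pfile() {
  local pfile="$1"
  touch "$pfile" || return 1
  grep -qx 'INSTANCE_TYPE=ASM' "$pfile" || echo 'INSTANCE_TYPE=ASM' >> "$pfile"
}

# Usage (path from this post):
#   ensure_asm_pfile /u01/app/oracle/init+ASM1.ora
#   sqlplus / as sysasm
#   SQL> startup pfile='/u01/app/oracle/init+ASM1.ora';
```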

Step 7 Create a new disk group, VOTE2, for restoring the voting disk.

SQL> CREATE DISKGROUP VOTE2 EXTERNAL REDUNDANCY
DISK 'ORCL:VOTE2'
ATTRIBUTE 'au_size'='4M',
'compatible.asm' = '11.2.0.2.0',
'compatible.rdbms' = '11.2.0.2.0',
'compatible.advm' = '11.2.0.2.0';

Diskgroup created.

SQL>
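If the disk group name, disk or compatibility level differ in your environment, the statement can be generated rather than retyped. A minimal sketch; the function name is hypothetical. Keep in mind that compatible.asm must be 11.2 or higher for a disk group to hold voting files.

```shell
#!/bin/sh
# Hypothetical generator for the CREATE DISKGROUP statement above.
# Arguments: disk group name, ASM disk path, compatibility level (optional).
make_votedg_sql() {
  local dg="$1" disk="$2" compat="${3:-11.2.0.2.0}"
  cat <<EOF
CREATE DISKGROUP ${dg} EXTERNAL REDUNDANCY
DISK '${disk}'
ATTRIBUTE 'au_size'='4M',
'compatible.asm' = '${compat}',
'compatible.rdbms' = '${compat}',
'compatible.advm' = '${compat}';
EOF
}

# Print the statement used in this post. On the ASM node it could be piped
# into:  sqlplus -s "/ as sysasm"
make_votedg_sql VOTE2 'ORCL:VOTE2'
```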

Step 8 Restore the voting disk to the newly created disk group VOTE2.

First, check the current voting disk location. No voting disk is listed, because the only copy has been corrupted.

[oracle@RACTEST1 bin]$ ./crsctl query css votedisk
Located 0 voting disk(s).

Now recover the voting disk as shown below. Voting disk data is backed up automatically in the OCR (since 11.2), so the new voting file is restored from that backup.

[oracle@RACTEST1 bin]$ ./crsctl replace votedisk +VOTE2
Successful addition of voting disk 5d17422445e54f1abf131f15b967c07f.
Successfully replaced voting disk group with +VOTE2.
CRS-4266: Voting file(s) successfully replaced

Let us check the voting disk again.

[oracle@RACTEST1 bin]$ ./crsctl query css votedisk
##  STATE    File Universal Id                File Name    Disk group
--  -----    -----------------                ---------    ---------
 1. ONLINE   5d17422445e54f1abf131f15b967c07f (ORCL:VOTE2) [VOTE2]
Located 1 voting disk(s).
[oracle@RACTEST1 bin]$

Step 9 Stop the CRS on RACTEST1 which was started in exclusive mode.

[root@RACTEST1 bin]# ./crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ractest1'
CRS-2673: Attempting to stop 'ora.crsd' on 'ractest1'
CRS-2677: Stop of 'ora.crsd' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'ractest1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'ractest1'
CRS-2677: Stop of 'ora.storage' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'ractest1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.evmd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.asm' on 'ractest1'
CRS-2677: Stop of 'ora.gpnpd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.crf' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.asm' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'ractest1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'ractest1'
CRS-2677: Stop of 'ora.cssd' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'ractest1'
CRS-2677: Stop of 'ora.gipcd' on 'ractest1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ractest1' has completed
CRS-4133: Oracle High Availability Services has been stopped.

Step 10 Start CRS on RACTEST1 and RACTEST2

[root@RACTEST1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@RACTEST1 bin]#

[root@RACTEST2 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@RACTEST2 bin]#

Step 11 Make sure the cluster is up and running, and restart it if it is down. In my case, the cluster came up cleanly.

[root@RACTEST1 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@RACTEST1 bin]# ./crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@RACTEST1 bin]#

Step 12 Start the database and monitor the alert log

[oracle@RACTEST1 ~]$ srvctl status database -db govinddb
Instance govinddb1 is not running on node ractest1
Instance govinddb2 is not running on node ractest2
[oracle@RACTEST1 ~]$ srvctl start database -db govinddb
[oracle@RACTEST1 ~]$ srvctl status database -db govinddb
Instance govinddb1 is running on node ractest1
Instance govinddb2 is running on node ractest2

[oracle@RACTEST1 ~]$

Additional note

CRS does not need to be brought down as long as at least one working copy is intact.

CRS should be up on all nodes for the following operations (per Doc ID 428681.1):

1. Adding an additional voting disk to the disk group
2. Moving the voting disk
3. Deleting one of the voting disks in the disk group
4. Adding another copy of the OCR file on a different disk
5. Moving the OCR file to a different disk
6. Removing one copy of the OCR file

CRS should be up on ONLY one node, in exclusive mode, for the following operations (per Doc ID 1062983.1):

1. We have ONLY one copy of the OCR file and it is corrupted.
2. We have ONLY one copy of the voting disk and it is corrupted.
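The rule in this note can be summarised as a tiny decision helper. The function name and its output strings are hypothetical; it only restates the two cases above.

```shell
#!/bin/sh
# Hypothetical summary of the note: given how many intact copies of a voting
# disk (or OCR) remain, report how CRS has to be running to repair it.
crs_mode_for_repair() {
  local intact_copies="$1"
  if [ "$intact_copies" -ge 1 ]; then
    echo "all-nodes"            # at least one good copy: CRS up on all nodes
  else
    echo "exclusive-one-node"   # only copy corrupted: crsctl start crs -excl on one node
  fi
}

crs_mode_for_repair 0
```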


Sunday, November 1, 2015


Oracle10g RAC Administrator

Oracle12c Certified DBA
