In this post I demonstrate voting disk recovery. I tested the exercise on an Oracle 12c test cluster, a two-node RAC whose nodes are named RACTEST1 and RACTEST2.
My voting disk group uses external redundancy, so only one copy of the voting disk exists on the ASM disk.
My goal is to corrupt the voting disk and restore it to a new disk. Here are the high-level steps.
- Corrupt the voting disk
- Shut down the database
- Stop the CRS service on all nodes
- Start the CRS service in exclusive mode on ONE node
- Create a new disk group or use an existing one
- Restore the voting disk to the newly created disk group
- Stop the CRS that was started in exclusive mode
- Start the CRS on both nodes
- Start the database and make sure all instances are up
- Verify the cluster services and make sure all is good!
Step 1
Let us check where the voting disk resides in the cluster.
crsctl query css votedisk
The voting disk is stored on the VOTE1 disk in the VOTE1 disk group.
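When scripting this check, it can help to pull the disk group names out of the command output. Here is a minimal sketch, assuming the output uses the column layout shown later in this post (state in the second column, disk group in square brackets at the end of the line). The function name votedisk_groups is my own and is not part of crsctl.

```shell
# Hypothetical helper: print the disk group of every ONLINE voting file.
# Reads `crsctl query css votedisk` output on stdin.
votedisk_groups() {
    # Keep only rows whose second column is ONLINE, then strip the brackets
    # around the disk group name in the last column.
    awk '$2 == "ONLINE" { print $NF }' | tr -d '[]'
}

# Usage on a cluster node (run from the Grid home bin directory):
#   ./crsctl query css votedisk | votedisk_groups
```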
Step 2
Let me corrupt the voting disk. This command overwrites the first 100 blocks of 4096 bytes on the disk with zeros.
dd if=/dev/zero of=/dev/oracleasm/disks/VOTE1 bs=4096 count=100
At this stage, the clusterware should stop working.
I rebooted the node and confirmed that the cluster services were down. The reboot is not required; this is a test environment and I rebooted only to check the cluster services. In a production environment, it is enough to bounce the CRS service.
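To see what this dd invocation does without touching a real disk, the same command can be pointed at a scratch file. This is only a sketch: the temporary file stands in for the ASM disk, and conv=notrunc is added solely because dd would otherwise truncate a regular file (a block device is not truncated).

```shell
# Scratch file standing in for the ASM disk (assumption: never a real device).
scratch=$(mktemp)

# Fill 200 blocks of 4096 bytes with the letter A so overwritten blocks are visible.
head -c $((4096 * 200)) /dev/zero | tr '\0' 'A' > "$scratch"

# Same block size and count as in the post: zero out the first 100 blocks.
dd if=/dev/zero of="$scratch" bs=4096 count=100 conv=notrunc 2>/dev/null

# Count bytes that are NOT zero in the first 100 blocks,
# and bytes that are NOT 'A' in the last 100 blocks.
nonzero_head=$(head -c $((4096 * 100)) "$scratch" | tr -d '\0' | wc -c | tr -d ' ')
changed_tail=$(tail -c $((4096 * 100)) "$scratch" | tr -d 'A' | wc -c | tr -d ' ')
size=$(wc -c < "$scratch" | tr -d ' ')
rm -f "$scratch"

echo "non-zero bytes left in first 100 blocks: $nonzero_head"
echo "changed bytes in remaining blocks: $changed_tail"
echo "file size in bytes: $size"
```

Only the first 400 KB are wiped, which is where the disk header lives; the rest of the disk is left as it was.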
Step 3 Check the cluster status
crsctl check crs
crsctl check cluster
Step 4 Shut down the database
[oracle@RACTEST1 ~]$ srvctl status database -db govinddb
Instance govinddb1 is running on node ractest1
Instance govinddb2 is running on node ractest2
[oracle@RACTEST1 ~]$ srvctl stop database -db govinddb
[oracle@RACTEST1 ~]$ srvctl status database -db govinddb
Instance govinddb1 is not running on node ractest1
Instance govinddb2 is not running on node ractest2
[oracle@RACTEST1 ~]$
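Before moving on, it is worth confirming that no instance is still running. A small sketch, assuming the srvctl status lines look exactly like the ones above; the function name running_instances is hypothetical.

```shell
# Hypothetical helper: count instances reported as running.
# Reads `srvctl status database -db <name>` output on stdin.
running_instances() {
    # "is not running" lines do not match the pattern below, so they are not counted.
    awk '/ is running on node / { n++ } END { print n + 0 }'
}

# Usage (database name taken from this post):
#   srvctl status database -db govinddb | running_instances
```

A result of 0 means every instance is down and the CRS can be stopped.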
Step 5 Stop the CRS on all nodes
[root@RACTEST1 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ractest1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.evmd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'ractest1'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'ractest1'
CRS-2673: Attempting to stop 'ora.gipcd' on 'ractest1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.cssdmonitor' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'ractest1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ractest1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@RACTEST1 bin]#
[root@RACTEST2 bin]# ./crsctl stop crs -f
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ractest2'
CRS-2673: Attempting to stop 'ora.gipcd' on 'ractest2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ractest2'
CRS-2673: Attempting to stop 'ora.cssdmonitor' on 'ractest2'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ractest2'
CRS-2673: Attempting to stop 'ora.evmd' on 'ractest2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'ractest2'
CRS-2677: Stop of 'ora.cssdmonitor' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'ractest2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'ractest2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ractest2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
[root@RACTEST2 bin]#
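I ran the stop command by hand on each node. On a larger cluster the same step could be looped, roughly as sketched below. The assumptions are mine and not from my test setup: root ssh access between nodes, and a GRID_HOME path that is purely hypothetical. By default the script only prints the commands it would run.

```shell
# Assumptions: node names from this post, root ssh access between nodes,
# and GRID_HOME pointing at the Grid Infrastructure home (hypothetical path).
GRID_HOME=${GRID_HOME:-/u01/app/12.1.0/grid}
NODES="ractest1 ractest2"
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = actually run them

stop_crs_all_nodes() {
    for node in $NODES; do
        cmd="$GRID_HOME/bin/crsctl stop crs -f"
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh root@$node $cmd"
        else
            # Stop at the first node where the command fails.
            ssh "root@$node" "$cmd" || return 1
        fi
    done
}

stop_crs_all_nodes
```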
Step 6 Start the CRS on one node in exclusive mode
[root@RACTEST1 bin]# ./crsctl start crs -excl
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.evmd' on 'ractest1'
CRS-2672: Attempting to start 'ora.mdnsd' on 'ractest1'
CRS-2676: Start of 'ora.mdnsd' on 'ractest1' succeeded
CRS-2676: Start of 'ora.evmd' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'ractest1'
CRS-2676: Start of 'ora.gpnpd' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'ractest1'
CRS-2672: Attempting to start 'ora.gipcd' on 'ractest1'
CRS-2676: Start of 'ora.cssdmonitor' on 'ractest1' succeeded
CRS-2676: Start of 'ora.gipcd' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'ractest1'
CRS-2672: Attempting to start 'ora.diskmon' on 'ractest1'
CRS-2676: Start of 'ora.diskmon' on 'ractest1' succeeded
CRS-2676: Start of 'ora.cssd' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.crf' on 'ractest1'
CRS-2672: Attempting to start 'ora.ctssd' on 'ractest1'
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'ractest1'
CRS-2676: Start of 'ora.crf' on 'ractest1' succeeded
CRS-2676: Start of 'ora.ctssd' on 'ractest1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'ractest1'
CRS-2676: Start of 'ora.asm' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.storage' on 'ractest1'
CRS-2676: Start of 'ora.storage' on 'ractest1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'ractest1'
CRS-2676: Start of 'ora.crsd' on 'ractest1' succeeded
[root@RACTEST1 bin]#
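The startup output is long, and it is easy to miss a resource that was attempted but never confirmed. The sketch below pairs each CRS-2672 line with its CRS-2676 line, assuming the message format shown above. The function name unconfirmed_starts is hypothetical.

```shell
# Hypothetical helper: list resources that were started but never confirmed.
# Reads `crsctl start crs -excl` output on stdin and prints nothing when every
# CRS-2672 "Attempting to start" line has a matching CRS-2676 "succeeded" line.
unconfirmed_starts() {
    # Splitting on single quotes puts the resource name in field 2.
    awk -F"'" '
        /CRS-2672: Attempting to start/ { pending[$2] = 1 }
        /CRS-2676: Start of/            { delete pending[$2] }
        END { for (r in pending) print r }
    ' | sort
}

# Usage:
#   ./crsctl start crs -excl | unconfirmed_starts
```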
Step 7 Verify that the ASM instance is up and running, and start it if it is down. In my case, the ASM instance came up as part of starting the CRS service. We need the ASM instance to restore the voting disk.
If it is down, add the instance type to a pfile, log in to the ASM instance, and start it with that pfile:
echo INSTANCE_TYPE=ASM >> /u01/app/oracle/init+ASM1.ora
startup pfile='/u01/app/oracle/init+ASM1.ora';
Then create the new disk group VOTE2 for restoring the voting disk.
SQL> CREATE DISKGROUP VOTE2 EXTERNAL REDUNDANCY
     DISK 'ORCL:VOTE2'
     ATTRIBUTE 'au_size'='4M',
     'compatible.asm' = '11.2.0.2.0',
     'compatible.rdbms' = '11.2.0.2.0',
     'compatible.advm' = '11.2.0.2.0';
Diskgroup created.
SQL>
Step 8 Restore the voting disk to the newly created disk group VOTE2.
Let us check the current voting disk location. No voting disk is listed, because the voting disk is already corrupted.
[oracle@RACTEST1 bin]$ ./crsctl query css votedisk
Located 0 voting disk(s).
Now recover the voting disk as below. The voting disk is recovered automatically from the latest available copy of the OCR.
[oracle@RACTEST1 bin]$ ./crsctl replace votedisk +VOTE2
Successful addition of voting disk 5d17422445e54f1abf131f15b967c07f.
Successfully replaced voting disk group with +VOTE2.
CRS-4266: Voting file(s) successfully replaced
Let us check the voting disk again.
[oracle@RACTEST1 bin]$ ./crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 5d17422445e54f1abf131f15b967c07f (ORCL:VOTE2) [VOTE2]
Located 1 voting disk(s).
[oracle@RACTEST1 bin]$
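The before and after of this step comes down to the "Located N voting disk(s)." line, so a script can check just that number. A minimal sketch, assuming the summary line is worded exactly as above; the function name votedisk_count is hypothetical.

```shell
# Hypothetical helper: print N from the "Located N voting disk(s)." line.
# Reads `crsctl query css votedisk` output on stdin.
votedisk_count() {
    sed -n 's/^Located \([0-9][0-9]*\) voting disk(s)\.$/\1/p'
}

# Usage:
#   count=$(./crsctl query css votedisk | votedisk_count)
#   [ "$count" -ge 1 ] || echo "voting disk still missing"
```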
Step 9 Stop the CRS on RACTEST1 which was started in exclusive mode.
[root@RACTEST1 bin]# ./crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'ractest1'
CRS-2673: Attempting to stop 'ora.crsd' on 'ractest1'
CRS-2677: Stop of 'ora.crsd' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.storage' on 'ractest1'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.gpnpd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'ractest1'
CRS-2677: Stop of 'ora.storage' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.drivers.acfs' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'ractest1'
CRS-2673: Attempting to stop 'ora.ctssd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.evmd' on 'ractest1'
CRS-2673: Attempting to stop 'ora.asm' on 'ractest1'
CRS-2677: Stop of 'ora.gpnpd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.crf' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.evmd' on 'ractest1' succeeded
CRS-2677: Stop of 'ora.asm' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'ractest1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'ractest1'
CRS-2677: Stop of 'ora.cssd' on 'ractest1' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'ractest1'
CRS-2677: Stop of 'ora.gipcd' on 'ractest1' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'ractest1' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Step 10 Start the CRS on RACTEST1 and RACTEST2
[root@RACTEST1 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@RACTEST1 bin]#
[root@RACTEST2 bin]# ./crsctl start crs
CRS-4123: Oracle High Availability Services has been started.
[root@RACTEST2 bin]#
Step 11 Make sure the cluster is up and running. Restart the cluster if it is down. In my case, the cluster is up and running.
[root@RACTEST1 bin]# ./crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@RACTEST1 bin]# ./crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[root@RACTEST1 bin]#
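For a scripted health check, the status lines can be reduced to a single pass or fail result. The sketch below assumes that every healthy line ends with "is online", as in the output above. The function name crs_all_online is hypothetical.

```shell
# Hypothetical helper: succeed only if every CRS- status line reports "is online".
# Reads `crsctl check crs` or `crsctl check cluster` output on stdin.
crs_all_online() {
    awk '
        /^CRS-[0-9]+:/ { seen++; if ($0 !~ /is online$/) bad++ }
        END {
            # Fail when no status line was seen at all, or when any line is not online.
            if (seen > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Usage:
#   ./crsctl check crs | crs_all_online && echo "cluster stack is online"
```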
Step 12 Start the database and monitor the alert log
[oracle@RACTEST1 ~]$ srvctl status database -db govinddb
Instance govinddb1 is not running on node ractest1
Instance govinddb2 is not running on node ractest2
[oracle@RACTEST1 ~]$ srvctl start database -db govinddb
[oracle@RACTEST1 ~]$ srvctl status database -db govinddb
Instance govinddb1 is running on node ractest1
Instance govinddb2 is running on node ractest2
[oracle@RACTEST1 ~]$
Additional notes
We do not need to bring the CRS service down when at least one working copy is intact.
CRS should be up on all nodes for the following operations (per Doc ID 428681.1):
1. Adding an additional voting disk to the disk group
2. Moving the voting disk
3. Deleting one of the voting disks from the disk group
4. Adding another copy of the OCR file on a different disk
5. Moving the OCR file to a different disk
6. Removing one copy of the OCR file
CRS should be up on ONLY one node, in exclusive mode, in the following situations (per Doc ID 1062983.1):
1. We have ONLY one copy of the OCR file and it is corrupted.
2. We have ONLY one copy of the voting disk and it is corrupted.
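The rule in these notes boils down to one question: is at least one copy still intact? A tiny sketch of that decision, with a hypothetical function name and wording of my own; it only restates the notes above and is not an Oracle tool.

```shell
# Hypothetical helper encoding the rule from the notes above.
# Argument: number of intact copies (of the voting disk or of the OCR).
crs_mode_for_repair() {
    if [ "$1" -ge 1 ]; then
        echo "online: keep CRS up on all nodes"
    else
        echo "exclusive: start CRS on one node with 'crsctl start crs -excl'"
    fi
}

# Usage:
#   crs_mode_for_repair 0
#   crs_mode_for_repair 2
```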