The Sys Admin Ate My Database:


January 10, 2019

I once had a situation where a Unix admin took 4 LUNs on a linux system and added them to a filesystem, as it was running out of space.

Unfortunately, those LUNs were actively used by ASM – and 4 databases were destroyed due to the data being wiped out. That was a VERY long weekend for me as you can imagine. I’d rather not repeat that.

 

There are now 3 ways to manage your ASM disks, ASMLIB, UDEV, and now the ASM Filter Driver. ASMLIB will most likely be deprecated at some point, and while UDEV is solid – Oracle has built in some great features in the ASM Filter Driver, so I want to focus on using that to manage the storage devices. 

 

One of the key benefits of the ASM Filter Driver is that you cannot accidentally overwrite the data. Oracle installs a kernel module that only allows writes from oracle processes. Other writes are rejected by the kernel. This would have prevented that disaster weekend where we had to restore several large databases.

 

Here’s a demonstration on setting up the ASM Filter Driver, and how a non oracle process write is rejected by the kernel.

 

Once you unzip the 12.2 installation, we need to set the storage devices to be managed by the ASM Filter Driver. For this demo, I will be using /dev/sdb for the ASM disk, on Oracle Enterprise Linux 7.5 with unbreakable kernel 4.1.12-124.20.7.

 

[root@localhost bin]# ./asmcmd afd_label DATA1 /dev/sdb –init

 

[root@localhost bin]# ./asmcmd afd_lslbl /dev/sdb

——————————————————————————–

Label                     Duplicate  Path

================================================================================

DATA1                                 /dev/sdb

 

From this point forward, setup the GRID installation as you would normally, and make sure the “Use ASM Filter Driver” box is checked. It should be by default.

 

Once you’re running, you can verify that filtering is enabled:

 

[oracle@localhost disks]$ $ORACLE_HOME/bin/asmcmd afd_state

ASMCMD-9526: The AFD state is ‘LOADED’ and filtering is ‘ENABLED’ on host ‘localhost.localdomain’

 

SQL> SELECT SYS_CONTEXT(‘SYS_ASMFD_PROPERTIES’, ‘AFD_STATE’) FROM DUAL;

 

SYS_CONTEXT(‘SYS_ASMFD_PROPERTIES’,’AFD_STATE’)

——————————————————————————–

CONFIGURED

 

[oracle@localhost disks]$ cd /dev/oracleafd/disks

[oracle@localhost disks]$ ls -al

 

total 4

drwxrwxr-x 2 oracle oracle 60 Nov  3 02:58 .

drwxrwxr-x 3 oracle oracle 80 Nov  3 02:58 ..

-rwxrwx— 1 oracle oracle  9 Nov  3 02:58 DATA1

 

[oracle@localhost disks]$ cat DATA1

/dev/sdb

 

 

Now what I’m going to do is overwrite & destroy the contents of the /dev/sdb disk, which is used by ASM.

 

[oracle@localhost disks]$ sudo su – root

 

[root@localhost bin]# dd if=./afdboot of=/dev/sdb

159418+1 records in

159418+1 records out

81622419 bytes (82 MB) copied, 4.00743 s, 20.4 MB/s

 

Well, it looks the writes went through. Or DID they ? Let’s check the kernel messages:

 

[root@localhost log]# dmesg

[ 1375.685657] F 4296038.287/181103071201 systemd-udevd[547] oracleafd:18:0894:Write IO to ASM managed device: [8] [16]

[ 1375.685661] Buffer I/O error on dev sdb, logical block 0, lost async page write

[ 1864.988174] Buffer I/O error on dev sdb, logical block 0, lost async page write

[ 1864.988180] Buffer I/O error on dev sdb, logical block 1, lost async page write

[ 1864.988183] Buffer I/O error on dev sdb, logical block 2, lost async page write

[ 1864.988186] Buffer I/O error on dev sdb, logical block 3, lost async page write

[ 1864.988189] Buffer I/O error on dev sdb, logical block 4, lost async page write

[ 1864.988191] Buffer I/O error on dev sdb, logical block 5, lost async page write

[ 1864.988194] Buffer I/O error on dev sdb, logical block 6, lost async page write

[ 1864.988196] Buffer I/O error on dev sdb, logical block 7, lost async page write

[ 1864.988199] Buffer I/O error on dev sdb, logical block 8, lost async page write

[ 1864.988201] Buffer I/O error on dev sdb, logical block 9, lost async page write

 

We see here that the kernel rejected the IO’s to the /dev/sdb disk, as it did not come from an Oracle process. Let’s verify the disk is unharmed by restarting ASM and seeing if it can mount DATA1:

 

SQL> shutdown immediate;

ASM diskgroups volume disabled

ASM diskgroups dismounted

ASM instance shutdown

 

SQL> startup;

ASM instance started

 

Total System Global Area 1140850688 bytes

Fixed Size            8629704 bytes

Variable Size           1107055160 bytes

ASM Cache            25165824 bytes

ASM diskgroups mounted

ASM diskgroups volume enabled

SQL>

 

It came up clean, let’s also check the files inside the diskgroup using ASMCMD:

 

[oracle@localhost ~]$ asmcmd

ASMCMD> ls

DATA/

 

ASMCMD> cd DATA

ASMCMD> ls

ASM/

orapwasm

 

ASMCMD> cd ASM

ASMCMD> ls

ASMPARAMETERFILE/

PASSWORD/

 

ASMCMD> cd ASMPARAMETERFILE

ASMCMD> ls

REGISTRY.253.991191939

 

 

All looks good ! the writes were completely rejected and ASM is still intact. NOW, let’s disable the filtering and

repeat this exercise to see what happens:

 

ASMCMD> afd_filter -d –all

ASMCMD> afd_state

ASMCMD-9526: The AFD state is ‘LOADED’ and filtering is ‘DISABLED’ on host ‘localhost.localdomain’

 

ASMCMD> afd_lsdsk

——————————————————————————–

Label                     Filtering   Path

================================================================================

DATA1                      DISABLED   /dev/sdb

 

[root@localhost bin]# dd if=./afdboot of=/dev/sdb

159418+1 records in

159418+1 records out

81622419 bytes (82 MB) copied, 3.78481 s, 21.6 MB/s

 

“dmesg” shows no rejects on the writes from the above “dd” command. Let’s see how ASM responds to a restart:

 

SQL> shutdown immediate;

ASM diskgroups volume disabled

ASM diskgroups dismounted

ASM instance shutdown

 

SQL> startup;

ORA-00099: warning: no parameter file specified for ASM instance

ASM instance started

 

Total System Global Area 1140850688 bytes

Fixed Size            8629704 bytes

Variable Size           1107055160 bytes

ASM Cache            25165824 bytes

ORA-15110: no diskgroups mounted

 

SQL>

 

The disk is destroyed, and ASM cannot start as there is no parameter it can use anymore. Let’s look at the contents using ASMCMD:

 

[oracle@localhost ~]$ asmcmd

ASMCMD> ls

ASMCMD>

 

Nothing shows. The diskgroup cannot be mounted as we destroyed it using “dd”, and thus there is no info to show In ASMCMD.

 

I hope this gives a good overview of the ASM Filter Driver and the benefits it brings. Post a comment below or get in contact with us if you would like to learn more.