Even when you think you know something well enough, you discover corner cases you have never encountered. This is the main reason I like IT system administration: you never get bored.
This post is about one of these corner cases, in the DRBD setup described in the post #DRBD based disk replication of a production cluster to a remote site cluster on RHEL 6.
ISSUE: Suddenly, the DRBD setup that I had deployed started to act weird.
Checking the status of the DRBD endpoints, I got:
On primary site:
# cat /proc/drbd
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/Diskless A r-----
    ns:29921840 nr:0 dw:29920340 dr:182558889 al:5931 bm:806 lo:2 pe:0 ua:0 ap:2 ep:1 wo oos:29508
This is weird: the Primary seems to be OK and up to date, but it sees the secondary site in a Diskless state.
On secondary site:
# cat /proc/drbd
 0: cs:Connected ro:Secondary/Primary ds:Diskless/UpToDate A r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
By looking at the above outputs one can conclude that:
– primary and secondary sites have no problem seeing each other. Both are in “Connected” status.
– Primary site is up to date
– Secondary site is in “Diskless” status.
First, we can conclude that there is nothing wrong with the connection between the sites and that we are not in a "split-brain" situation, which is the usual culprit when DRBD misbehaves for me. So I focused on the Diskless status.
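This check can also be scripted. A minimal sketch that pulls the connection state (cs:) and the disk states (ds:) out of a /proc/drbd status line, using the Secondary's output above as sample data:

```shell
# Sample status line from the Secondary site; in real use this would be
# read from /proc/drbd. "Connected" in cs: rules out a split-brain,
# which would show StandAlone or WFConnection instead.
line='0: cs:Connected ro:Secondary/Primary ds:Diskless/UpToDate A r-----'
cs=$(echo "$line" | grep -o 'cs:[^ ]*' | cut -d: -f2)   # connection state
ds=$(echo "$line" | grep -o 'ds:[^ ]*' | cut -d: -f2-)  # local/peer disk states
echo "connection=$cs disks=$ds"
```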
According to the definition in the DRBD documentation:
Diskless. No local block device has been assigned to the DRBD driver. This may mean that the resource has never attached to its backing device, that it has been manually detached using drbdadm detach, or that it automatically detached after a lower-level I/O error.
Looking at the definition I concluded that the only possible case is the last one:
resource … automatically detached after a lower-level I/O error
So now that we know it is a low-level error, I tried to investigate the lower layers.
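A quick way to confirm an automatic detach is to look for DRBD's state transitions in the kernel log (on RHEL 6, /var/log/messages). A sketch grepping sample log lines; the exact messages shown are illustrative, modeled on DRBD's disk-state change logging:

```shell
# Illustrative kernel log lines for a lower-level I/O error followed by
# DRBD detaching its backing device; on a real box you would run
# something like: grep 'drbd0' /var/log/messages
log="kernel: block drbd0: local disk flush failed with status -5
kernel: block drbd0: disk( UpToDate -> Failed )
kernel: block drbd0: disk( Failed -> Diskless )"
detached=$(echo "$log" | grep -c 'Diskless')  # non-zero if a detach was logged
echo "detach events: $detached"
```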
On the Secondary site, the /dev/drbd0 device is defined over an LVM stack: the lv_data logical volume inside the vg_data volume group.
# vgdisplay
  --- Volume group ---
  VG Name               vg_data
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  12126
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                2
  Act PV                2
  VG Size               610.05 GiB
  PE Size               4.00 MiB
  Total PE              156172
  Alloc PE / Size       111778 / 436.63 GiB
  Free  PE / Size       44394 / 173.41 GiB
  VG UUID               5lIPqR-SLPT-rzTD-Kjmx-wmPB-GRLF-E1SePr
This looks OK.
# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vg_data/lv_data
  LV Name                lv_data
  VG Name                vg_data
  LV UUID                sZmt11-8A0r-2cbv-Bqp9-jm33-jbdZ-g3Su71
  LV Write Access        read/write
  LV Creation host, time RTGS-CONTA, 2014-03-08 14:37:52 +0200
  LV Status              NOT available
  LV Size                436.63 GiB
  Current LE             111778
  Segments               2
  Allocation             inherit
  Read ahead sectors     auto
This is not OK. Obviously, if lv_data is not active, then /dev/drbd0, which is defined over the /dev/vg_data/lv_data device, cannot be initialized. This explains the "Diskless" status on the Secondary site.
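A quicker check than the full lvdisplay listing is the lv_attr column of lvs, whose fifth character encodes the activation state ('a' means active). A sketch decoding a sample attr string; the value shown is what an inactive LV would report:

```shell
# The fifth character of lv_attr is the state flag: 'a' means active.
# In real use: attr=$(lvs --noheadings -o lv_attr vg_data/lv_data)
attr='-wi------'            # sample attr of an inactive LV
if [ "${attr:4:1}" = "a" ]; then
    state=active
else
    state=inactive
fi
echo "lv_data is $state"
```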
Let's try to force-activate lv_data:
# lvchange -a y /dev/vg_data/lv_data
This did not work; the result was:
# lvchange -a y vg_data/lv_data
  Couldn't find device with uuid 6v6PVy-z53P-0I0w-4gB9-lbCH-F1pi-hWtFIc.
  Refusing activation of partial LV lv_data. Use --partial to override.
At this point we know that lv_data cannot be activated because some lower level device over which lv_data is defined is not found.
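pvs can point at the lost device directly: a missing PV typically shows up as "unknown device" with the 'm' (missing) attribute. A sketch filtering sample output; the device names and sizes here are illustrative:

```shell
# Sample `pvs` output for a VG with one physical volume gone missing;
# in real use: pvs -o pv_name,vg_name,pv_attr
pvs_out='/dev/mapper/mpathlp1  vg_data  lvm2  a--  305.02g  86.70g
unknown device         vg_data  lvm2  a-m  305.02g  86.70g'
missing=$(echo "$pvs_out" | grep -c 'unknown device')
echo "missing PVs: $missing"
```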
The suspect at this point is one of the partitions over which vg_data is defined. I forgot to mention that vg_data extends over two partitions created on two multipath iSCSI targets.
Checking the existing devices on the server I got:
# blkid
/dev/sda1: UUID="71212636-f936-447f-b53e-d4747e0898d2" TYPE="ext4"
/dev/sda2: UUID="BQ4k55-5xS9-XSKu-suNM-eqtF-HWdP-KFmTnm" TYPE="LVM2_member"
/dev/sdb1: LABEL="VID" UUID="5201-7CA7" TYPE="vfat"
/dev/mapper/mpathlp1: UUID="DsZslQ-xcm5-JNtJ-GSBs-ex0a-Fcye-01Qp1f" TYPE="LVM2_member"
This is not a good sign: I can see only one mpath-type partition. There should be two, as shown by an old saved output:
# blkid
/dev/sda1: UUID="71212636-f936-447f-b53e-d4747e0898d2" TYPE="ext4"
/dev/sda2: UUID="BQ4k55-5xS9-XSKu-suNM-eqtF-HWdP-KFmTnm" TYPE="LVM2_member"
/dev/sdb1: LABEL="VID" UUID="5201-7CA7" TYPE="vfat"
/dev/mapper/mpathlp1: UUID="DsZslQ-xcm5-JNtJ-GSBs-ex0a-Fcye-01Qp1f" TYPE="LVM2_member"
/dev/mapper/mpathop1: UUID="6v6PVy-z53P-0I0w-4gB9-lbCH-F1pi-hWtFIc" TYPE="LVM2_member"
At this point of the investigation we know that DRBD is Diskless on the Secondary site because the underlying lv_data logical volume, defined in the vg_data volume group, is not active, and that in turn is because one of the partitions over which vg_data extends is not visible.
The only explanation at this point is that the kernel does not have the latest partition table. So let's try a partprobe to refresh it:
# partprobe
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy). As a result, it may not reflect all of your changes until after reboot.
Warning: Unable to open /dev/sdb read-write (Read-only file system). /dev/sdb has been opened read-only.
Warning: Unable to open /dev/sdb read-write (Read-only file system). /dev/sdb has been opened read-only.
Warning: Unable to open /dev/sdb read-write (Read-only file system). /dev/sdb has been opened read-only.
device-mapper: remove ioctl on mpathop1 failed: Device or resource busy
Warning: parted was unable to re-read the partition table on /dev/mapper/mpatho (Device or resource busy). This means Linux won't know anything about the modifications you made.
device-mapper: create ioctl on mpathop1 failed: Device or resource busy
device-mapper: remove ioctl on mpathop1 failed: Device or resource busy
OK, so the above confirms that the partition table cannot be refreshed to include the missing partition.
The simple, obvious solution is a reboot of the server. What caused the issue? There are lots of possible causes, but I never actually discovered which one it was.
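For completeness, before rebooting one could try to bring the device back by hand: rescan the iSCSI sessions, reload the multipath maps, and re-activate the VG. A hedged sketch, dry-run by default so it only prints the commands; whether it actually helps depends on why the device disappeared in the first place:

```shell
# Set DRY_RUN= (empty) to actually execute; with DRY_RUN=1 the helper
# only prints each command instead of running it.
DRY_RUN=1
run() { if [ -n "$DRY_RUN" ]; then echo "$@"; else "$@"; fi; }

run iscsiadm -m session --rescan    # re-scan LUNs on all iSCSI sessions
run multipath -r                    # rebuild the multipath device maps
run partprobe /dev/mapper/mpatho    # re-read the partition table
run pvscan                          # let LVM rediscover the returned PV
run vgchange -ay vg_data            # re-activate the volume group
```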
After a reboot of the server /dev/mapper/mpathop1 appeared and lv_data was in active state.
Suddenly the status of the DRBD endpoints changed, and on the DRBD Secondary node I can see that the replication has started.
# service drbd status
drbd driver loaded OK; device status:
version: 8.4.4 (api:1/proto:86-101)
GIT-hash: 599f286440bd633d15d5ff985204aff4bccffadd build by phil@Build64R6, 2013-10-14 15:33:06
m:res      cs          ro                 ds                     p  mounted  fstype
0:repdata  SyncTarget  Secondary/Primary  Inconsistent/UpToDate  A
...        sync'ed:    0.1%               (344784/344792)M
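To follow the catch-up, the resync percentage can be pulled out of the status line, e.g. inside a watch loop. A minimal sketch using the output above as sample data:

```shell
# Sample status line from `service drbd status` on the Secondary;
# in real use it would come from /proc/drbd or the service status output.
status="0:repdata SyncTarget Secondary/Primary Inconsistent/UpToDate A sync'ed: 0.1% (344784/344792)M"
pct=$(echo "$status" | grep -o '[0-9.]*%')   # first percentage in the line
echo "resynced so far: $pct"
```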
Conclusion: DRBD is tricky and has lots of personality, but it is not always its fault. It is very important that all the underlying layers of the storage cake are also OK.