I have a RHEL 6 Linux cluster setup where the main database storage is replicated to a remote site using DRBD.
If the primary site fails, the decision is taken to move operations to the remote site.
When the production site comes back online, the administrator is sometimes too eager to restart operations on the main site.
If the reconnected sites have not had time to resynchronize (the remote site automatically sends data to the production site once it has recovered), or the resynchronization was interrupted, an attempt to start the Database service on the production site fails. The service in turn brings up the DRBD endpoint resource DBDataLVM, and the following errors are logged:
PRODB rgmanager: [lvm] HA LVM requires Only one logical volume per volume group.
PRODB rgmanager: [lvm] There are currently 2 logical volumes in vg_data
PRODB rgmanager: [lvm] Failing HA LVM start of vg_data/lv_data
PRODB rgmanager: start on lvm "DBDataLVM" returned 1 (generic error)
PRODB rgmanager: #68: Failed to start service:Database_Service; return value: 1
HA LVM complains that there are two logical volumes in vg_data.
This is true, because the DRBD synchronization process creates a temporary lv_data_… in the same vg_data volume group that holds our HA LVM lv_data. Remember that when configuring DRBD replication we always allocate at most 70% of the vg_data volume group to the lv_data logical volume that is replicated, which leaves room for this temporary volume.
The temporary LV is always created during the synchronization process and is removed once both DRBD sites are up to date.
If the synchronization between the sites is still in progress or was interrupted, the temporary LV is still there and the above issue occurs.
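To confirm that a leftover temporary LV is what HA LVM is tripping over, the logical volumes in the volume group can be listed before starting the service. The following is only a rough sketch: the helper names (parse_lv_names, unexpected_lvs, list_lvs) are made up for illustration, and it assumes the standard lvs command is available and is run as root. It is written to also run on the older Python that ships with RHEL 6.

```python
import subprocess


def parse_lv_names(lvs_output):
    """Turn the output of `lvs --noheadings -o lv_name <vg>` into a list of names."""
    return [line.strip() for line in lvs_output.splitlines() if line.strip()]


def unexpected_lvs(lv_names, expected="lv_data"):
    """Return every LV other than the one HA LVM is supposed to manage."""
    return [name for name in lv_names if name != expected]


def list_lvs(vg="vg_data"):
    """Run lvs against the volume group (hypothetical helper, needs root)."""
    proc = subprocess.Popen(
        ["lvs", "--noheadings", "-o", "lv_name", vg],
        stdout=subprocess.PIPE,
    )
    out = proc.communicate()[0]
    return parse_lv_names(out.decode())
```

Calling unexpected_lvs(list_lvs("vg_data")) on the production node should return an empty list when it is safe to start the service. Any name it returns is an extra LV that will make the HA LVM start fail.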
The solution is to start both DRBD endpoints and let the resynchronization process finish before starting the Database service on the production site.
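One way to tell whether the resynchronization has finished is to look at /proc/drbd on the production node. The sketch below assumes the DRBD 8.x status format, where each resource has a line such as "0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----". The function names are illustrative and not part of DRBD or rgmanager.

```python
import re

# Per-resource status line of /proc/drbd (DRBD 8.x format assumed).
# The ro:/ds: part is optional so that "cs:Unconfigured" lines are still seen.
STATUS_RE = re.compile(r"^\s*(\d+):\s+cs:(\S+)(?:\s+ro:(\S+)\s+ds:(\S+))?")


def parse_drbd_status(text):
    """Extract minor number, connection state, roles and disk states."""
    resources = []
    for line in text.splitlines():
        match = STATUS_RE.match(line)
        if match:
            resources.append({
                "minor": int(match.group(1)),
                "cs": match.group(2),
                "ro": match.group(3),
                "ds": match.group(4),
            })
    return resources


def fully_synced(resources):
    """True only if at least one resource exists and all are connected and up to date."""
    if not resources:
        return False
    return all(
        r["cs"] == "Connected" and r["ds"] == "UpToDate/UpToDate"
        for r in resources
    )


def read_proc_drbd(path="/proc/drbd"):
    """Read the live status file (only present when the drbd module is loaded)."""
    with open(path) as handle:
        return handle.read()
```

On the production node, fully_synced(parse_drbd_status(read_proc_drbd())) should be True before the Database service is started. A connection state such as SyncTarget, or a disk state such as Inconsistent/UpToDate, means the resynchronization is still running or was interrupted.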