Configuring iSCSI SAN resources as shared disks in a cluster environment, replicated with DRBD

December 8, 2015

The following configuration uses iSCSI resources exported by a NetApp storage device as shared disk resources in a Linux cluster environment. The same resources are also replicated to a remote cluster using DRBD disk replication.

The production environment contains four exported SAN resources that will be used as shared resources in the cluster setup. Administrators must first perform some initial settings so that the nodes of the production cluster can see these resources.

Because we are working in a cluster environment, the shared disk resources must be created with this in mind.

STEP 1: Discover iSCSI resources

We have to discover the iSCSI resources exported by the NetApp storage device.

Make sure the iscsid service is running:


# service iscsid start
# chkconfig iscsid on

Discover the resources exported by the NetApp using the iscsiadm tool. The iscsiadm utility is a command-line tool that allows discovery of and login to iSCSI targets:

# iscsiadm -m discovery -t st -p 10.5.0.2

To delete a discovered portal:

# iscsiadm -m discovery --portal "10.5.0.2:3260" --op=delete

To list the discovered node records:

# iscsiadm -m node

To log in to all discovered targets:

# iscsiadm -m node -l

IMPORTANT NOTES:

- Change node.session.timeo.replacement_timeout to 5 in /etc/iscsi/iscsid.conf according to the NetApp documentation (and in /var/lib/iscsi/nodes/iqn.test.com.netapp\:sn.84191318/10.5.0.2\,3260\,1002/iface[0,1]).

- Add
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.all.rp_filter = 2
to /etc/sysctl.conf to allow traffic on two network interfaces (eth0, eth1) on the same subnet (rp_filter = 2 enables "loose" reverse-path filtering).

vi /etc/sysctl.conf

DEFAULT SETTINGS:

# grep '.rp_filter' /etc/sysctl.conf
net.ipv4.conf.default.rp_filter = 1

REQUIRED SETTINGS:

# grep '.rp_filter' /etc/sysctl.conf
net.ipv4.conf.default.rp_filter = 2
net.ipv4.conf.all.rp_filter = 2

To apply the new settings, execute:

# sysctl -p
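To confirm that the kernel actually picked up the loose-mode setting, the values can be read back from /proc/sys. A small sketch (the values printed depend on the running system):

```shell
#!/bin/sh
# Read back the reverse-path filter settings from /proc/sys to confirm
# that `sysctl -p` applied them (2 = loose-mode filtering).
for key in net.ipv4.conf.default.rp_filter net.ipv4.conf.all.rp_filter; do
    # turn the sysctl key into its /proc/sys path, e.g.
    # net.ipv4.conf.all.rp_filter -> /proc/sys/net/ipv4/conf/all/rp_filter
    path=/proc/sys/$(echo "$key" | tr . /)
    printf '%s = %s\n' "$key" "$(cat "$path")"
done
```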

To rediscover the iSCSI paths, issue again:

# iscsiadm -m discovery -t st -p 10.5.0.2

As a result, two iSCSI sessions are configured on each node; they can be listed with the following command:

# iscsiadm -m session
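With two interfaces on the same subnet you should see one session per path. A small sketch that tallies sessions per target portal from `iscsiadm -m session` output; the `count_sessions` helper and the sample lines are illustrative additions, not captured from this setup:

```shell
#!/bin/sh
# Tally iSCSI sessions per portal. Real usage would be:
#   iscsiadm -m session | count_sessions
count_sessions() {
    # session lines look like: "tcp: [1] 10.5.0.2:3260,1002 iqn...."
    # field 3 is portal,tpgt; strip the tpgt and count per portal
    awk '{ split($3, a, ","); n[a[1]]++ } END { for (p in n) print p, n[p] }'
}

# Illustrative sample input (the IQN is the one used in this article):
printf '%s\n' \
  'tcp: [1] 10.5.0.2:3260,1002 iqn.test.com.netapp:sn.84191318' \
  'tcp: [2] 10.5.0.2:3260,1002 iqn.test.com.netapp:sn.84191318' |
  count_sessions
```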

STEP 2: Set up the multipath devices

Install multipath support:

# yum install device-mapper-multipath

To generate the initial multipath configuration and enable the service, use:

# mpathconf --enable --with_multipathd y

As a result, the following entries should be present in /etc/multipath.conf:

devices {
    device {
        vendor                "NETAPP"
        product               "LUN"
        path_grouping_policy  group_by_prio
        features              "1 queue_if_no_path"
        prio                  "ontap"
        path_checker          directio
        path_selector         "round-robin 0"
        failback              immediate
        hardware_handler      "0"
        rr_weight             uniform
        rr_min_io             128
        getuid_callout        "/lib/udev/scsi_id -g -u -d /dev/%n"
    }
}

blacklist {
    devnode "sd[a]$"
}

IMPORTANT NOTE:

- Comment out the "devices" section in /etc/multipath.conf according to the NetApp documentation for RHEL 6.3+.

Enable multipathd as a service and start it:

# chkconfig --add multipathd
# chkconfig multipathd on
# service multipathd start

The multipath service must be installed and configured so that the same multipath devices are visible on both cluster nodes.

Execute the following command to list all configured multipath devices:

# multipath -l

STEP 3: Partition the multipath devices

On each of the multipath devices create a primary partition of Linux type. These operations must be done on only one of the cluster nodes.

1. In a server command-line console, execute the following commands as root:

fdisk /dev/mapper/mpathb
fdisk /dev/mapper/mpathc
fdisk /dev/mapper/mpathd
fdisk /dev/mapper/mpathe

2. At the fdisk prompt, press 'n' to create a new partition.
3. Select 1 as the partition number.
4. Select primary as the partition type.
5. Create the partition to span the entire device.
6. At the fdisk prompt, press 't' to set the partition type.
7. Set the newly created partition to type 83 (Linux).
8. At the fdisk prompt, press 'w' to write the changes.

Do the same steps for each of the following devices:
/dev/mapper/mpathb
/dev/mapper/mpathc
/dev/mapper/mpathd
/dev/mapper/mpathe

As a result, the following devices, corresponding to the newly created partitions, will be available:
/dev/mapper/mpathbp1
/dev/mapper/mpathcp1
/dev/mapper/mpathdp1
/dev/mapper/mpathep1

STEP 4: Create the logical volumes of the environments

To ensure that the logical volumes are created as HA-LVM (high-availability LVM) volumes, before creating them we must make sure that the clustered logical volume manager (clvmd) is running on both nodes of the cluster.
On both nodes, start the service and enable it at boot:

service clvmd start
chkconfig clvmd on

The following settings will be executed on only one of the cluster nodes; they will be automatically replicated to the other node by clvmd.

On each of the multipath device partitions created previously, a logical volume structure will be created and assigned to one of the application or database environments. These operations must be done on only one of the cluster nodes, with the exception of step 5.

Create the application environment data share:
1. Initialize /dev/mapper/mpathdp1 partition for use by LVM

pvcreate /dev/mapper/mpathdp1

2. Create the application data volume group

vgcreate vg_app_data /dev/mapper/mpathdp1

3. Create the application data logical volume. Allocate all the available space but leave some space for metadata.

lvcreate -n lv_app_data --size 29G vg_app_data

4. Format the application data logical volume

mkfs.ext4 /dev/vg_app_data/lv_app_data

5. Create the mount point of application data resource. Do this on both nodes.

mkdir /appdata

Create the application environment log share:
1. Initialize /dev/mapper/mpathep1 partition for use by LVM

pvcreate /dev/mapper/mpathep1

2. Create the application log volume group

vgcreate vg_app_log /dev/mapper/mpathep1

3. Create the application log logical volume. Allocate all the available space but leave some space for metadata.

lvcreate -n lv_app_log --size 99G vg_app_log

4. Format the application log logical volume

mkfs.ext4 /dev/vg_app_log/lv_app_log

5. Create the mount point of application log resource. Do this on both nodes.

mkdir /applog

Create the database environment data share:
1. Initialize /dev/mapper/mpathbp1 partition for use by LVM

pvcreate /dev/mapper/mpathbp1

2. Create the database data volume group

vgcreate vg_data /dev/mapper/mpathbp1

3. Create the database data logical volume. Allocate all the available space, but leave some space for metadata. We must also leave about 1/3 of the space for the future replication temporary drive.

lvcreate -n lv_data --size 100G vg_data

4. Format the database data logical volume

mkfs.ext4 /dev/vg_data/lv_data

5. Create the mount point of database data resource. Do this on both nodes.

mkdir /data

Create the database environment log share:
1. Initialize /dev/mapper/mpathcp1 partition for use by LVM

pvcreate /dev/mapper/mpathcp1

2. Create the database log volume group

vgcreate vg_log /dev/mapper/mpathcp1

3. Create the database log logical volume. Allocate all the available space but leave some space for metadata.

lvcreate -n lv_log --size 59G vg_log

4. Format the database log logical volume

mkfs.ext4 /dev/vg_log/lv_log

5. Create the mount point of the database log resource. Do this on both nodes.

mkdir /log

Start the cluster logical volume manager

service clvmd start
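The four share sequences above differ only in names and sizes, so they can be driven from one helper. A sketch using the devices, names, and sizes from this article; the `run`/`make_share`/`DRY_RUN` wrapper is an illustrative addition (with DRY_RUN=1 the commands are printed instead of executed, so the plan can be reviewed first):

```shell
#!/bin/sh
# Print (or, with DRY_RUN unset, execute) the full LVM sequence for
# each shared disk resource described above.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

make_share() {  # args: partition vg lv size mountpoint
    part=$1; vg=$2; lv=$3; size=$4; mnt=$5
    run pvcreate "$part"                        # init PV for LVM
    run vgcreate "$vg" "$part"                  # volume group
    run lvcreate -n "$lv" --size "$size" "$vg"  # logical volume
    run mkfs.ext4 "/dev/$vg/$lv"                # format
    run mkdir -p "$mnt"                         # mount point (also on node 2)
}

DRY_RUN=1   # review the commands first; unset to actually run them
make_share /dev/mapper/mpathdp1 vg_app_data lv_app_data  29G /appdata
make_share /dev/mapper/mpathep1 vg_app_log  lv_app_log   99G /applog
make_share /dev/mapper/mpathbp1 vg_data     lv_data     100G /data
make_share /dev/mapper/mpathcp1 vg_log      lv_log       59G /log
```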

STEP 5: Change HA-LVM from clustered LVM to LVM fail-over with tagging to accommodate DRBD

After making sure that all the logical volumes were created identically using clvmd, we must change the type of the high-availability configuration to ensure the correct behaviour of the DRBD replication: DRBD forces us to change HA-LVM from clustered LVM to LVM fail-over with tagging.

The following steps must be executed on both nodes of the cluster.

1. Disable clvmd:

service clvmd stop
chkconfig clvmd off

2. In /etc/lvm/lvm.conf change the locking type to 1 (from 3) and set volume_list to

volume_list = [ "vg_node1" , "@PRODA" ]

on the first node and

volume_list = [ "vg_node2" , "@PRODB" ]

on the second node.

3. Run mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r) on each node before rebooting.

4. Change the volume groups from clustered to local:

vgchange -cn vgname --config 'global {locking_type = 0}'

Where PRODA and PRODB are the names of the cluster nodes.
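For reference, after the change the relevant sections of /etc/lvm/lvm.conf would look like the fragment below on the first node (the second node uses "vg_node2" and "@PRODB"); here vg_node1 is taken to be the volume group holding that node's local root volumes:

```
# /etc/lvm/lvm.conf on PRODA
global {
    locking_type = 1                        # was 3 (clustered locking)
}
activation {
    volume_list = [ "vg_node1", "@PRODA" ]  # local VG plus volumes tagged with the node name
}
```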


STEP 6: Update initrd images with the new lvm.conf

Note that we have changed the lvm.conf file, so it no longer matches the copy embedded in the initrd.
If a node where the change was made (both nodes in this case) is rebooted, starting any cluster service will fail with the "HA LVM: Improper setup detected" error.
This issue is described in Red Hat Solution 21622.
The solution is to regenerate the initrd image for the current kernel on the node, which adds the new lvm.conf to the image.

Make a backup of the image:

# cp /boot/initrd-$(uname -r).img /boot/initrd-$(uname -r).img.bak

Now rebuild the initrd for the current kernel version (as in step 5):

# mkinitrd -f /boot/initrd-$(uname -r).img $(uname -r)
