Fedora Linux: Recover a corrupted system with a RAID1 boot partition and an LVM RAID1 root partition

By | February 18, 2016

There are times when even experienced sysadmins do stupid things. One of those moments happened to me when I tried some speed tests on my SSD RAID1.
The result was not satisfactory: a corrupted /boot partition with missing files and orphan inodes.
To make things more complicated, my setup has the following constraints that make recovery even harder:
– my boot partition is an ext4 filesystem on top of an mdraid 1.2 Linux software RAID1.
– my root partition is an ext4 filesystem in an LVM logical volume on top of an mdraid 1.2 Linux software RAID1.

The following is the result of two days of trial and error to find a correct procedure for recovering the system without a reinstall.

Step 1: Boot rescue system
Boot from a Fedora 23 Live disk. Luckily, for exactly this kind of situation I keep a microSD card in the slot on the HP Microserver Gen8 motherboard, holding a Fedora 23 Live image created with Fedora Live USB Creator.
It has rescued me from stupid mistakes so many times that I recommend it to everybody.

Step 2: Mount the boot and root partition
This step consists of mounting the root partition, the special directories and the boot partition under the rescue system.

First create a directory under /mnt:


mkdir /mnt/root

Mount the server root partition under /mnt/root. Note that my server root is an LVM logical volume.


mount /dev/mapper/fedora_localhost-root /mnt/root

Find out which RAID1 arrays hold the boot and root partitions. I found that across different boots my boot and root arrays switch between /dev/md125 and /dev/md126.


cat /proc/mdstat

I know that my boot partition is a 500MB RAID1, so with that in mind I know that /dev/md125 holds the boot partition.
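Picking the boot array by size can be made a little less error-prone with a small helper. This is only a sketch: the function name list_md_sizes is made up for illustration, and it assumes the usual /proc/mdstat layout in which the line after each array name reports its size in 1 KiB blocks.

```shell
# Print every md array with its size in MiB.
# $1: an mdstat-format file (defaults to /proc/mdstat).
list_md_sizes() {
    awk '
        # Remember the array name from lines such as "md125 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # The size line looks like "511936 blocks super 1.2 [2/2] [UU]"
        name != "" && $2 == "blocks" {
            printf "/dev/%s %d MiB\n", name, $1 / 1024
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Calling list_md_sizes with no argument reads the live /proc/mdstat; on my layout the array of roughly 500 MiB is the boot one.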

Mount the boot partition:


mount /dev/md125 /mnt/root/boot/

Bind the special directories:


mount -o bind /dev /mnt/root/dev
mount -o bind /proc /mnt/root/proc
mount -o bind /sys /mnt/root/sys
mount -o bind /run /mnt/root/run

Finally change the root of the system to the server root.


chroot /mnt/root
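Since the md device names can change between boots, it may help to generate the whole mount sequence of this step from the device names and review it before running anything. The sketch below only prints commands; print_rescue_mounts is a hypothetical name, and the default target /mnt/root is the directory used in this article.

```shell
# Print (not run) the mount sequence of Step 2.
# $1: root device, $2: boot device, $3: target directory (default /mnt/root).
print_rescue_mounts() {
    root_dev=$1
    boot_dev=$2
    target=${3:-/mnt/root}
    echo "mkdir -p $target"
    echo "mount $root_dev $target"
    echo "mount $boot_dev $target/boot"
    # Bind the special directories in the same order as above
    for d in dev proc sys run; do
        echo "mount -o bind /$d $target/$d"
    done
    echo "chroot $target"
}
```

For example, print_rescue_mounts /dev/mapper/fedora_localhost-root /dev/md125 prints the eight commands shown in this step, ready to be checked and then pasted into the rescue shell.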

Step 3: Recreate the boot partition
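Because my boot partition was corrupted, the best way to start over was to recreate the filesystem. Note that mkfs.ext4 refuses to run on a mounted filesystem, so unmount /boot first (umount /boot) and mount it again afterwards (mount /dev/md125 /boot) before reinstalling anything.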


mkfs.ext4 /dev/md125
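One thing worth checking after this step: a freshly created filesystem gets a new UUID. If /etc/fstab identifies /boot by UUID (an assumption, I did not note which form my fstab used), that entry has to be updated. The sketch below prints a corrected fstab to stdout; set_boot_uuid is a hypothetical helper name.

```shell
# Print an fstab with the UUID of the /boot entry replaced.
# $1: fstab-format file, $2: UUID of the new /boot filesystem.
# Only lines that mount /boot by UUID= are changed; awk rejoins the
# fields of a changed line with single spaces.
set_boot_uuid() {
    awk -v uuid="$2" '
        $1 ~ /^UUID=/ && $2 == "/boot" { $1 = "UUID=" uuid }
        { print }
    ' "$1"
}
```

The new UUID can be read with blkid -s UUID -o value /dev/md125. Review the output before copying it over /etc/fstab.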

To repopulate the boot partition with the necessary files, reinstall the kernel packages, the grub2 packages, fedora-logos and shim.


dnf reinstall 'kernel*' grub2 fedora-logos grub2-efi grub2-efi-modules shim

Step 4: Recreate the ram disk
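Just in case, I recreated the ram disk using dracut. Use -f to force regeneration of the ram disk, and pass the version of the kernel that was just reinstalled. Inside the chroot, uname -r reports the kernel of the live system, so take the version from the directory names under /lib/modules instead; the version string below is only an example.


dracut -f /boot/initramfs-2.6.32-358.el6.x86_64.img 2.6.32-358.el6.x86_64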
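To avoid typing the kernel version by hand, the dracut command line can be derived from the module directories of the installed kernels. This is a sketch under the assumption that every directory under /lib/modules corresponds to an installed kernel; print_dracut_commands is a made-up name and the function only prints the commands.

```shell
# Print one dracut command per kernel version found under a modules directory.
# $1: modules directory (defaults to /lib/modules).
print_dracut_commands() {
    moddir=${1:-/lib/modules}
    for dir in "$moddir"/*/; do
        # Skip the unexpanded pattern when the directory is empty
        [ -d "$dir" ] || continue
        kver=$(basename "$dir")
        echo "dracut -f /boot/initramfs-$kver.img $kver"
    done
}
```

Run print_dracut_commands inside the chroot, check the versions it found, then execute the lines you want.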

Step 5: Configure grub2

Edit the file /etc/default/grub by hand in case it is no longer the same as you left it.
My configuration is the following:


GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_PRELOAD_MODULES="raid mdraid1x lvm lvm2"
GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora_localhost/root $([ -x /usr/sbin/rhcrashkernel-param ] && /usr/sbin/rhcrashkernel-param || :) console=ttyS1,115200n8 rd.auto rd.md.waitclean=1"
GRUB_DISABLE_RECOVERY="true"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
GRUB_CMDLINE_LINUX_DEFAULT="video=1024x768"

Note the line GRUB_PRELOAD_MODULES="raid mdraid1x lvm lvm2". It instructs grub to preload the "raid mdraid1x lvm lvm2" modules.

Note also the parameters under GRUB_CMDLINE_LINUX

rd.lvm.lv=fedora_localhost/root

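grub only passes this parameter on the kernel command line; it is dracut, running in the initramfs, that reads it and activates the LVM logical volume holding the root filesystem.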


rd.auto

This tells dracut to automatically assemble special devices such as mdraid arrays and LVM volumes.


rd.md.waitclean=1

This tells dracut to wait until any resync, recovery or reshape of the mdraid arrays has finished before continuing. This is very important because mdraid initialization can sometimes take longer than the rest of early boot, and we end up trying to mount a root partition that does not exist yet.
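A quick way to confirm that the three parameters really made it onto a kernel command line is to test the string for them. The helper below is an illustrative sketch, and missing_boot_params is a made-up name; it prints each expected parameter that is absent and nothing when all are present.

```shell
# Print the rd.* parameters from this article that are missing
# from a kernel command line.
# $1: a kernel command line string, e.g. the contents of /proc/cmdline.
missing_boot_params() {
    for want in rd.lvm.lv= rd.auto rd.md.waitclean=1; do
        case " $1 " in
            *" $want"*) ;;          # present (prefix match for rd.lvm.lv=)
            *) echo "$want" ;;      # absent
        esac
    done
}
```

After the next successful boot, missing_boot_params "$(cat /proc/cmdline)" should print nothing.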

Step 6: Install grub on the MBR of the RAID disks

We have to install the grub2 loader on the MBR of both disks that host the boot RAID partition.


grub2-install -v --recheck /dev/sda --no-floppy
grub2-install -v --recheck /dev/sdb --no-floppy
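If you are not sure which disks back the boot array, the member list in /proc/mdstat can be turned into the matching grub2-install lines. This is a sketch that assumes sdX-style disk names (it strips the trailing partition number, which would be wrong for names such as nvme0n1p1); print_grub_installs is a hypothetical name and nothing is executed.

```shell
# Print a grub2-install command for each disk behind an md array.
# $1: md array name such as md125, $2: mdstat-format file
# (defaults to /proc/mdstat).
print_grub_installs() {
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                # Member fields look like "sda1[0]" or "sdb1[1](F)"
                if ($i ~ /\[[0-9]+\]/) {
                    disk = $i
                    sub(/[0-9]*\[[0-9]+\].*$/, "", disk)   # sda1[0] -> sda
                    print "grub2-install -v --recheck --no-floppy /dev/" disk
                }
            }
        }
    ' "${2:-/proc/mdstat}" | sort -u
}
```

With my layout, print_grub_installs md125 prints one line for /dev/sda and one for /dev/sdb.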

Note the --recheck parameter. It is needed to ensure that all the correct resources are detected. Sometimes detection takes too long and we end up with a boot loader that does not see some LVM or RAID resources.

Note the --no-floppy parameter. On some systems that do not have a floppy drive, the loader is sometimes not installed correctly without this parameter.

Step 7: Generate the new grub.cfg

The following will generate the new grub.cfg file, which is the configuration file grub2 reads at boot:


grub2-mkconfig -o /boot/grub2/grub.cfg

Make sure to also add the following line by hand to each kernel menu entry:


insmod lvm

For some very strange reason I ended up with grub failing to preload the lvm module, so grub was not able to read the root LVM partition. This explicit insmod forces the lvm module to load.
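With several kernels installed, the manual edit can be scripted. The sketch below is one possible way to do it, not what I ran at the time: it assumes every entry starts with a line beginning with menuentry and ending in an opening brace, and add_insmod_lvm is a made-up name. The result goes to stdout so it can be inspected first.

```shell
# Print a grub configuration with "insmod lvm" added as the first
# line of every menuentry block.
# $1: grub configuration file.
add_insmod_lvm() {
    awk '
        { print }
        # Opening line of an entry: "menuentry ... {"
        /^[ \t]*menuentry / && /[{][ \t]*$/ { print "\tinsmod lvm" }
    ' "$1"
}
```

Keep in mind that grub2-mkconfig rewrites the whole file, so the line has to be added again every time the configuration is regenerated.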

Step 8: Reboot

Just reboot and hope for the best.
