There are times when even experienced sysadmins do stupid things. One of those moments happened to me when I ran some speed tests on my SSD RAID1.
The result was not so satisfactory: a corrupted /boot partition with missing files and orphaned inodes.
To make things more complicated, my setup has the following constraints:
– my boot partition is in fact an ext4 partition over a mdraid 1.2 Linux software RAID1.
– my root partition is an ext4 partition in a logical volume group (lvm) over a mdraid 1.2 Linux software RAID1.
The following is a result of two days of trial and error to get a correct procedure to recover the system without having to do any reinstall.
Step 1: Boot rescue system
Boot with a Fedora 23 Live disk. Luckily, for exactly this kind of situation I keep a microSD card in the slot on the HP Microserver Gen8 motherboard, with a Fedora 23 Live image written by Fedora Live USB Creator.
It has rescued me from stupid mistakes so many times that I recommend it to everybody.
Step 2: Mount the boot and root partition
This step mounts the root partition, the special directories and the boot partition under the rescue system.
First create a directory under /mnt:
mkdir /mnt/root
Mount the server root partition under /mnt/root. Note that my server root is an LVM logical volume.
mount /dev/mapper/fedora_localhost-root /mnt/root
Find out which RAID1 arrays hold the boot and root partitions. I found that across different boots my boot and root arrays switch between /dev/md125 and /dev/md126.
I know that my boot partition is a 500MB RAID1, so with that in mind I know that /dev/md125 holds the boot partition.
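One way to tell the arrays apart by size is to read /proc/mdstat. The following is a minimal sketch, not part of the procedure I actually ran: the function name md_sizes is made up for illustration, and it assumes the usual mdstat layout where the line naming the array is followed by a line with the block count in 1 KiB blocks.

```shell
#!/bin/sh
# md_sizes: hypothetical helper, prints "<array> <size> MiB" for each md array.
# Reads the file given as $1, or /proc/mdstat when no argument is given.
md_sizes() {
    awk '
        # Array header line, e.g. "md125 : active raid1 sda1[0] sdb1[1]"
        /^md[0-9]+ :/  { name = $1 }
        # Size line, e.g. "511680 blocks super 1.2 [2/2] [UU]" (assumed 1 KiB blocks)
        $2 == "blocks" { printf "%s %d MiB\n", name, $1 / 1024 }
    ' "${1:-/proc/mdstat}"
}
```

Calling md_sizes without arguments in the rescue system prints one line per array; the one close to 500 MiB should be the boot array.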
Mount the boot partition:
mount /dev/md125 /mnt/root/boot/
Bind the special directories:
mount -o bind /dev /mnt/root/dev
mount -o bind /proc /mnt/root/proc
mount -o bind /sys /mnt/root/sys
mount -o bind /run /mnt/root/run
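The four bind mounts can also be written as a loop. This is only a sketch under my own assumptions: bind_special and the DRY_RUN variable are names I made up, and the dry-run mode exists just so the commands can be reviewed before anything is mounted.

```shell
#!/bin/sh
# bind_special: hypothetical helper that bind-mounts the special directories
# into the target root given as $1. With DRY_RUN=1 it only prints the commands.
bind_special() {
    target=$1
    for dir in dev proc sys run; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "mount -o bind /$dir $target/$dir"
        else
            # Stop at the first failure so a half-prepared chroot is noticed
            mount -o bind "/$dir" "$target/$dir" || return 1
        fi
    done
}
```

With DRY_RUN=1 set, bind_special /mnt/root prints the same four mount commands listed above; without it, the function runs them and needs root.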
Finally change the root of the system to the server root:
chroot /mnt/root
Step 3: Recreate the boot partition
Because my boot partition was corrupted, the best way to start over was to recreate its ext4 filesystem (unmount /boot, run mkfs.ext4 on the boot array, then mount it again).
To repopulate the boot partition with the necessary files, reinstall the kernel packages, the grub2 packages, fedora-logos and shim.
dnf reinstall kernel* grub2 fedora-logos grub2-efi grub2-efi-modules shim
Step 4: Recreate the ram disk
Just in case, I recreated the RAM disk using dracut. The -f option forces regeneration even if the image file already exists. Replace the kernel version in the example below with the version of the kernel you just reinstalled.
dracut -f /boot/initramfs-2.6.32-358.el6.x86_64.img 2.6.32-358.el6.x86_64
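Inside the chroot, uname -r reports the kernel of the live disk, not the one installed on the server, so the version has to come from somewhere else. A minimal sketch, assuming the installed kernels each have a directory under /lib/modules; dracut_cmds is a made-up name, and the function only prints the commands so they can be checked before running them.

```shell
#!/bin/sh
# dracut_cmds: hypothetical helper. Prints one dracut command per kernel
# version found under a modules directory (default /lib/modules).
dracut_cmds() {
    moddir=${1:-/lib/modules}
    for path in "$moddir"/*/; do
        [ -d "$path" ] || continue      # skip when the glob matched nothing
        ver=$(basename "$path")         # directory name is the kernel version
        echo "dracut -f /boot/initramfs-$ver.img $ver"
    done
}
```

Running dracut_cmds in the chroot lists a command for every installed kernel; pick the line for the kernel that should boot.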
Step 5: Configure grub2
Edit the file /etc/default/grub by hand in case it is no longer the same as you left it.
My configuration is the following:
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_PRELOAD_MODULES="raid mdraid1x lvm lvm2"
GRUB_CMDLINE_LINUX="rd.lvm.lv=fedora_localhost/root $([ -x /usr/sbin/rhcrashkernel-param ] && /usr/sbin/rhcrashkernel-param || :) console=ttyS1,115200n8 rd.auto rd.md.waitclean=1"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=0 --word=8 --parity=no --stop=1"
Note the line GRUB_PRELOAD_MODULES="raid mdraid1x lvm lvm2". It instructs grub to preload the raid, mdraid1x, lvm and lvm2 modules.
Note also the following parameters in GRUB_CMDLINE_LINUX. Grub passes them on the kernel command line, where the initramfs built by dracut reads them:
rd.lvm.lv=fedora_localhost/root
This activates the LVM logical volume that holds the root partition.
rd.auto
This enables automatic assembly of the RAID arrays.
rd.md.waitclean=1
This makes the boot wait until all the mdraid arrays are initialized and resynchronized. This is very important because mdraid initialization can sometimes take longer than the rest of the early boot, so the system ends up trying to mount a root partition that does not exist yet.
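Once the system boots again, it is worth checking that the rd.lvm.lv, rd.auto and rd.md.waitclean parameters from GRUB_CMDLINE_LINUX actually reached the kernel. A minimal sketch; check_cmdline is a name I made up, and the parameter list is hard-coded to the values from my configuration above.

```shell
#!/bin/sh
# check_cmdline: hypothetical helper. Prints every expected parameter that is
# missing from the kernel command line string passed as $1.
# Returns 0 when all are present, 1 otherwise.
check_cmdline() {
    cmdline=$1
    missing=0
    for param in rd.lvm.lv=fedora_localhost/root rd.auto rd.md.waitclean=1; do
        case " $cmdline " in
            *" $param "*) ;;                          # present as a whole word
            *) echo "missing: $param"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

On the running system it can be called as check_cmdline "$(cat /proc/cmdline)"; no output means all three parameters are there.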
Step 6: Install grub on the MBR of the raid disks
We have to install the grub2 loader on the MBR of both disks that host the boot RAID partition.
grub2-install -v --recheck /dev/sda --no-floppy
grub2-install -v --recheck /dev/sdb --no-floppy
Note the --recheck parameter. This is needed to ensure that all the correct resources are detected. Sometimes detection takes too long and we end up with a boot loader that does not see some LVM or RAID resources.
Note the --no-floppy parameter. On some systems that do not have a floppy drive, the loader is sometimes not installed correctly without this parameter.
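Which disks back the boot array can be read from /proc/mdstat instead of being assumed. The sketch below is my own illustration: md_members is a made-up name, and stripping the trailing partition number only works for sdX style device names (it would mangle names such as nvme0n1p1).

```shell
#!/bin/sh
# md_members: hypothetical helper. Prints the whole-disk device of every member
# of the md array named in $1, reading $2 or /proc/mdstat.
md_members() {
    awk -v array="$1" '
        $1 == array && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[[0-9]+\]/) continue   # members look like "sda1[0]"
                dev = $i
                sub(/\[.*$/, "", dev)              # drop the "[0]" role suffix
                sub(/[0-9]+$/, "", dev)            # drop the partition number (sdX only)
                print "/dev/" dev
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

For my layout, md_members md125 should list the two disks that grub2-install has to be run against.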
Step 7: Generate the new grub.cfg
The following generates the new grub.cfg file:
grub2-mkconfig -o /boot/grub2/grub.cfg
Make sure to also add the following line by hand to each kernel menu entry:
insmod lvm
For some very strange reason grub kept failing to preload the lvm module, so it was unable to mount the root LVM partition. This explicit insmod forces the lvm module to load.
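Since the generated file has one entry per kernel, it is easy to miss one when editing by hand. A minimal sketch for checking the result, assuming the explicit line is written as insmod lvm and that every entry starts with a menuentry line; entries_without_lvm is a made-up name.

```shell
#!/bin/sh
# entries_without_lvm: hypothetical helper. Prints the title line of every
# menuentry block that contains no "insmod lvm" line.
# Reads the file given as $1, or /boot/grub2/grub.cfg by default.
entries_without_lvm() {
    awk '
        # A new entry starts: report the previous one if it lacked the line
        /^[ \t]*menuentry / {
            if (title != "" && !found) print title
            title = $0; found = 0
        }
        /^[ \t]*insmod[ \t]+lvm[ \t]*$/ { found = 1 }
        # Report the last entry in the file as well
        END { if (title != "" && !found) print title }
    ' "${1:-/boot/grub2/grub.cfg}"
}
```

No output means every entry has the line. Keep in mind that running grub2-mkconfig again regenerates the file and drops hand edits.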
Step 8: Reboot
Just reboot and hope for the best.