Recover Software Raid1 after HDD failure on Debian Squeeze

Starting configuration: HP Proliant ML150 G3 server.

  • 2 x 500 GB SATA drives (Seagate ...): /dev/sda and /dev/sdb.
  • 2 RAID1 devices: /dev/md0 (made of /dev/sda1 and /dev/sdb1) for the system and /dev/md1 for swap.

One of the drives (/dev/sda) failed physically. I wanted to upgrade to bigger and better drives, so I got 2 x 1 TB Samsung disks.

I replaced the faulty disk, booted the system from /dev/sdb (this needed a change of the boot device in the BIOS) and then went through these steps:

  1. Clean the new HDD of all existing partitions (mine was already clean because it was new, but do not skip this step when reusing a disk).
  2. Clean the MBR with: # dd if=/dev/zero of=/dev/sda bs=512 count=1
  3. Save the partition scheme of the good disk with: # sfdisk -d /dev/sdb > mbr_sdb.txt
  4. Apply the scheme to the new disk: # sfdisk /dev/sda < mbr_sdb.txt
  5. Add the partitions of the new disk to the raid: # mdadm /dev/md0 -a /dev/sda1 (repeat for /dev/md1 with the matching partition)
  6. Raid recovery starts; check its status with: # mdadm --detail /dev/md0 (the same command works for md1)
  7. Install grub on the new disk with: # grub-install /dev/sda
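
The steps above can be collected into a small script. This is only a sketch under assumptions: the variable names, the dump file name and the run helper are made up for illustration, and it assumes that the second partition of the new disk belongs to /dev/md1 (check with mdadm --detail /dev/md1 first). By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: NEW, GOOD, DUMP and the md1 member partition are assumptions.
NEW=${NEW:-/dev/sda}          # replacement disk (empty)
GOOD=${GOOD:-/dev/sdb}        # surviving raid member
DUMP=${DUMP:-mbr_good.txt}    # file holding the partition table dump
DRY_RUN=${DRY_RUN:-1}         # 1 = print commands only, 0 = execute them

# Print a command line, and execute it unless DRY_RUN is 1.
run() {
    echo "+ $1"
    [ "$DRY_RUN" = 1 ] || eval "$1"
}

# Same order as the numbered steps; stops at the first failing command.
replace_disk() {
    run "dd if=/dev/zero of=$NEW bs=512 count=1" &&
    run "sfdisk -d $GOOD > $DUMP" &&
    run "sfdisk $NEW < $DUMP" &&
    run "mdadm /dev/md0 -a ${NEW}1" &&
    run "mdadm /dev/md1 -a ${NEW}2" &&
    run "grub-install $NEW"
}
```

Source the file and call replace_disk to see the command list; set DRY_RUN=0 only after checking that NEW really points at the empty disk, because the dd and sfdisk lines overwrite its partition table.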

Then I replaced /dev/sdb (still functional, but I wanted two identical, better drives) with the second new disk and repeated the process.
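
Before pulling the last original disk, every array has to be fully synced, otherwise the only complete copy of the data leaves with it. A minimal check, assuming the usual /proc/mdstat layout where a degraded array shows an underscore in its member status such as [_U]; all_synced is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 when no array in the mdstat file is degraded, 1 when at least
# one is, and 2 when the file cannot be read.
all_synced() {
    f=${1:-/proc/mdstat}
    [ -r "$f" ] || return 2
    # [UU] means all members are up; [_U] or [U_] means one is missing
    # or still being rebuilt.
    if grep -Eq '\[[U_]*_[U_]*\]' "$f"; then
        return 1
    fi
    return 0
}
```

For example: all_synced && echo "safe to swap the second disk".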

Finally I tried to install grub on the new disk (now /dev/sdb) and it failed with this message:

# grub-install /dev/sdb
/usr/sbin/grub-probe: error: no such disk.
Auto-detection of a file system of /dev/md0 failed.
Please report this together with the output of "/usr/sbin/grub-probe --device-map=/boot/grub/device.map --target=fs -v /boot/grub" to <bug-grub@gnu.org>

After some digging on the net I found the cause and fixed the problem: the content of /boot/grub/device.map no longer matched the disks actually installed.

The file still listed the old disks:

# cat /boot/grub/device.map
(hd0)    /dev/disk/by-id/ata-ST3500320NS_9QM263BS
(hd1)    /dev/disk/by-id/ata-ST3500630NS_9QG8J1M5
(hd2)    /dev/disk/by-id/ata-ST3500320NS_9QM1BN3X

So the file needs to be updated. grub-install can do that when given the --recheck parameter, and it installs grub at the same time:

root@wdev:/boot/grub# grub-install --recheck /dev/sdb

Installation finished. No error reported.

# cat /boot/grub/device.map
(hd0)    /dev/disk/by-id/ata-SAMSUNG_HD103SJ_S246J9KB416143
(hd1)    /dev/disk/by-id/ata-SAMSUNG_HD103SJ_S246J9KB416142
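
Stale entries of this kind can be spotted before running grub-install by checking that every path listed in device.map still exists. A small sketch; stale_entries is a hypothetical helper, not part of grub, and it assumes the two-column layout shown in the listings here.

```shell
#!/bin/sh
# Print every device.map entry whose device path no longer exists.
stale_entries() {
    map=${1:-/boot/grub/device.map}
    while read -r name path; do
        # Skip blank lines and comments.
        case $name in ''|'#'*) continue ;; esac
        [ -e "$path" ] || echo "$name $path"
    done < "$map"
}
```

Any output means the map refers to a disk that is gone, which is the situation that grub-install --recheck repairs.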

Watch the raid sync progress with:

# watch -n 2 cat /proc/mdstat
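
For scripts or log files, the percentage can be pulled out of /proc/mdstat instead of being watched on screen. A sketch that assumes the usual "recovery = NN.N%" or "resync = NN.N%" progress line; resync_progress is a hypothetical helper name.

```shell
#!/bin/sh
# Print "<array> <percent>" for every array that is currently rebuilding.
resync_progress() {
    awk '
        # Remember the array name from lines such as "md0 : active raid1 ...".
        /^md[0-9]+ :/ { dev = $1 }
        # On a progress line, print the field that ends with a percent sign.
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print dev, $i
        }
    ' "${1:-/proc/mdstat}"
}
```

Arrays that are idle, or whose resync is only queued, produce no output.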

The 1 TB disks leave about 500 GB unused on each drive, so I created a new raid1 device, md2, from that space. First I created the new partitions /dev/sda3 and /dev/sdb3 to serve as its members.

Create the array with:

# mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3

Format the new array as ext4:

# mkfs.ext4 /dev/md2

Update /etc/mdadm/mdadm.conf by adding the new array, or regenerate the whole file with:

# /usr/share/mdadm/mkconf > /etc/mdadm/mdadm.conf
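
Because the shell empties the target of a redirection before the command runs, a failing mkconf would leave an empty mdadm.conf behind. A more careful variant, as a sketch: regen_conf is a hypothetical helper, and CONF and MKCONF simply default to the Debian paths used above.

```shell
#!/bin/sh
# Regenerate mdadm.conf while keeping a backup of the previous file.
CONF=${CONF:-/etc/mdadm/mdadm.conf}
MKCONF=${MKCONF:-/usr/share/mdadm/mkconf}

regen_conf() {
    tmp=$(mktemp) || return 1
    # Write to a temporary file first so a failed run cannot truncate CONF.
    if "$MKCONF" > "$tmp"; then
        [ -f "$CONF" ] && cp -p "$CONF" "$CONF.bak"
        # Overwrite in place to keep the owner and mode of the existing file.
        cat "$tmp" > "$CONF"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

On Debian it is usually worth running update-initramfs -u afterwards, so that the initramfs picks up the new configuration.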
