May 12, 2009

LVM to the rescue

Scenario: we have a system with a dying RAID (R1), two drives in RAID 1. One drive has a device error, the other is listed as degraded. The rebuild halted at about 80% and won't continue because of the device error on drive 1. We can't rebuild from drive 2 because it contains a degraded RAID image, and 3ware is picky about stuff like that.

We can still get the data off the RAID; it's working well enough for that, for now.

Now, the RAID device is used in an LVM volume group (vg00), and there's another RAID (R2) on the box, using 8 drives, that holds about 1TB of data. All slots are full. What to do?



Remove one of the drives from the second RAID (thus putting R2 into a degraded state) and put the spare drive for R1 in that slot. Remove the *degraded* drive from the first array, and create a new array with the new drive and the degraded drive.

Now add that new RAID (R3) to vg00 by creating a physical volume on it and adding that to the volume group. Next is the cool bit: using pvmove, we push the data off R1 onto R3. Then we remove R1 from the volume group, delete that RAID, and remove the drive. Finally, we replace the drive we pulled out of R2 above.
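
Roughly, the LVM side of that looks like the sketch below. Only vg00 comes from our setup; the device names (/dev/sdb for R1, /dev/sdd for R3) are made up for illustration and yours will differ. The function only prints the commands, so you can eyeball them before running anything as root.

```shell
#!/bin/sh
# Print the LVM commands for moving a volume group off a failing PV.
# Sketch only: the device names passed in are hypothetical placeholders.
plan_pv_migration() {
    vg="$1"      # volume group, e.g. vg00
    old_pv="$2"  # failing RAID device (R1), hypothetical /dev/sdb
    new_pv="$3"  # replacement RAID device (R3), hypothetical /dev/sdd

    echo "pvcreate $new_pv"        # initialise R3 as a physical volume
    echo "vgextend $vg $new_pv"    # add R3 to the volume group
    echo "pvmove $old_pv $new_pv"  # move all extents off R1 onto R3, online
    echo "vgreduce $vg $old_pv"    # drop the now-empty R1 from the group
    echo "pvremove $old_pv"        # wipe the LVM label from R1
}

plan_pv_migration vg00 /dev/sdb /dev/sdd
```

The pvmove step is the slow one, since it copies every allocated extent while the filesystems stay mounted.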

All data migrated, no data loss, no service interruption. I love LVM.

A couple of caveats: R3 has to be able to hold all the data that was on R1; if something happens to R1 during this time, you are in trouble (but you would be anyway, since the RAID is degraded); and you have deliberately degraded R2, which could bite you in the nether regions if Murphy hates you. You could do this with an external USB drive or any other datastore (a SAN, etc.), but we didn't have that option in this case.
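
The first caveat is easy to check up front. Here is a rough sketch, assuming you compare the space in use on the old PV against the raw size of the new device, both in bytes. The device names in the comments and the byte counts in the example call are made up, and the check is deliberately strict (target must be larger, not equal) because the new PV needs a little room for LVM metadata.

```shell
#!/bin/sh
# Succeeds (exit 0) when the target device is larger than the data in use
# on the source PV. Both arguments are plain byte counts.
pv_fits() {
    used_bytes="$1"    # space allocated on the old PV (R1)
    target_bytes="$2"  # raw size of the new device (R3)
    [ "$target_bytes" -gt "$used_bytes" ]
}

# On a live system the numbers could come from (hypothetical device names):
#   used=$(pvs --noheadings --units b --nosuffix -o pv_used /dev/sdb | tr -d ' ')
#   size=$(blockdev --getsize64 /dev/sdd)
# The byte counts below are illustrative only.
if pv_fits 500000000000 750000000000; then
    echo "R3 is big enough"
else
    echo "R3 is too small"
fi
```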


There are probably other ways we could have handled this, but this worked well.