Adventures in LVM2
I’ve been an old-school die-hard gimme-those-old-time-primary-partitions kind of geek for a long time now. Even extended partitions bother me, especially when a partial drive failure makes otherwise unaffected partitions disappear. So while the flexibility that LVM provides was enticing, I was frankly a little uncomfortable putting LVM on my system, all the more so after a hard drive crash.
Some months ago (I really don’t remember exactly when), I experienced a third drive failure on my old PATA-backed RAID1 array on my home system. I then discovered that Linux software RAID can grow an array, and so I switched to larger SATA devices merely by adding them to the RAID1, waiting for the resync, removing the remaining functional PATA device, and then growing the new RAID1. Impressive and convenient, except that installing GRUB by hand on Linux software RAID1 is still not an unadulterated pleasure.
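For anyone wanting to try the same RAID1 migration, here is a minimal sketch of one plausible mdadm sequence, not a record of exactly what I typed. It assumes a two-member RAID1 with one dead and one working PATA member, and SATA partitions larger than the old ones. `migrate_raid1` and `run` are hypothetical helpers, all device names are placeholders, and commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical sketch: move a two-member md RAID1 from PATA to larger SATA
# partitions one mirror half at a time, then grow the array.
# ASSUMPTION: device names are placeholders; nothing runs unless DRY_RUN=0.

run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# migrate_raid1 MD FAILED_PATA SURVIVING_PATA SATA1 SATA2
migrate_raid1() {
    md=$1 failed=$2 survivor=$3 sata1=$4 sata2=$5
    run mdadm "$md" --remove "$failed"      # drop the dead PATA member
    run mdadm "$md" --add "$sata1"          # first SATA member rebuilds into the empty slot
    run mdadm --wait "$md"                  # block until that resync finishes
    run mdadm "$md" --add "$sata2"          # second SATA member joins as a spare
    # Retire the last PATA member; the spare then rebuilds in its place.
    run mdadm "$md" --fail "$survivor" --remove "$survivor"
    run mdadm --wait "$md"                  # wait for the second resync
    run mdadm --grow "$md" --size=max       # use the full size of the larger members
}
```

Growing the array does not grow what sits on top of it: a filesystem or LVM physical volume on the array still has to be enlarged separately (with resize2fs or pvresize, for example), and the boot loader still has to be installed on each new disk.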
While recovering from my most recent hard drive crash onto a temporary drive, I set up some filesystems on LVM, I believe because at least one of them was likely to grow by some unknown amount. I then ordered a larger drive to move my data onto. After the new drive arrived, I discovered the pvmove program and was very impressed that I could add another physical volume to a volume group and then have all the logical volumes in that group migrated from the temporary drive to the new physical volume. Google helped me interpret the error message pvmove gave me at first, and modprobe dm-mirror got pvmove working. As I watched the output of pvmove -v /dev/hdc4 scroll past, reporting the percentage complete every 15 seconds and occasionally noting that it had checkpointed its progress in case my system went down before the move finished, I wandered around the office telling everyone how cool LVM was.
Not merely pride, but also enthusiasm goeth before a fall. I was in a hurry to remove /dev/hdc from my system and use only my new /dev/hda4 physical volume. Lost in a twisty maze of LVM utilities, all slightly different, I tried to remove the /dev/hdc4 physical volume. I ran pvremove /dev/hdc4, and it told me that it would not proceed unless I ran it with the -ff argument. By then I was conditioned to expect LVM to ask for confirmation of remotely-possibly-dangerous actions, so I just hit up-arrow, added -ff, and hit y to accept. Of course, removing a physical volume is a little dangerous, but I knew I wanted to do it, and I had run pvdisplay first to ensure that the volume was empty. I removed the temporary drive and rebooted.
The LVM-savvy are shaking their heads now. Of course, I should have guessed that I would have to run vgreduce first. And, indeed, all the data that I had copied over (my home directory, my development environment, everything except the root partition) was now inaccessible, and /dev/hdc4 lived on as a phantom. The LVM commands complained about a missing UUID, and the solution wasn’t immediately obvious. “I lost my /home!” I cried, prompting anxious questions among my colleagues as to whether I had started to forward some of my more salacious spam to my wife. After about five minutes of careful experimentation (pvdisplay made it clear that space on my /dev/hda4 LVM partition was still reserved by something) and no despair (I had completed a fresh backup just before starting the whole process), I resorted to reading the LVM man pages, one by one, until I understood the problem and the solution.
First of all, the fact that I couldn’t get at my data was (perversely enough) a good thing. It happened because LVM was being conservative. It didn’t have all the physical volumes (as identified by UUIDs) required to complete the volume group, so the logical volumes might have been incomplete. Trying to mount or otherwise manipulate a potentially incomplete logical volume could be a disaster, so I’m glad LVM tried to protect me from my own stupidity. It was preserving my data.
The solution to the problem I had created was to run vgreduce (the same program that I should have run in the first place, had I only known) with the --removemissing option. Because I knew that all data had been migrated off the physical volume I had removed from the system, that was safe: vgreduce --removemissing vg0 rendered my data accessible again.
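In hindsight, the order of operations LVM expects fits in a short shell sketch. It is only a sketch under stated assumptions: `retire_pv`, `recover_missing_pv`, and `run` are hypothetical helper names rather than LVM tools, the volume group and device names are placeholders, and the pvcreate step assumes the new partition had not yet been initialized as a physical volume. By default the helpers only print the commands; setting DRY_RUN=0 actually runs them.

```shell
#!/bin/sh
# Sketch of the order LVM expects when retiring a physical volume.
# ASSUMPTION: retire_pv, recover_missing_pv and run are hypothetical helpers;
# VG and device names are placeholders passed as arguments.
# Commands are only printed unless DRY_RUN=0 is set in the environment.

run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# retire_pv VG OLD_PV NEW_PV
retire_pv() {
    vg=$1 old_pv=$2 new_pv=$3
    run modprobe dm-mirror          # pvmove needs the device-mapper mirror target
    run pvcreate "$new_pv"          # ASSUMPTION: new partition is not yet a PV
    run vgextend "$vg" "$new_pv"    # add the new PV to the volume group
    run pvmove -v "$old_pv"         # migrate all allocated extents off the old PV
    run vgreduce "$vg" "$old_pv"    # drop the now-empty PV from the VG metadata
    run pvremove "$old_pv"          # only now wipe its LVM label; no -ff needed
}

# recover_missing_pv VG
# Removes PVs that the VG metadata still lists but that are no longer present.
# Only appropriate when no logical volume still had extents on them.
recover_missing_pv() {
    run vgreduce --removemissing "$1"
}
```

With the default dry run, retire_pv vg0 /dev/hdc4 /dev/hda4 prints the six commands in order. The point is that vgreduce runs before pvremove, so by the time pvremove sees the old volume it no longer belongs to any volume group and has no reason to demand -ff.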