☑ Just like old

5 Jun 2013 at 6:02PM in Software
 |   | 

If you have the luxury of migrating your Linux installation to a new hard disk before the old one packs up entirely, it’s quite easily done with standard tools.

two penguins

Recently on my Linux machine at work I started getting some concerning emails from smartd which looked like this:

Device: /dev/sda [SAT], 7 Offline uncorrectable sectors

Any errors from smartd are a cause for concern, but particularly this one. To explain, a short digression — anybody familiar with SMART and hard drives can skip the next three paragraphs.

Hard disks are split into sectors which are the smallest units which can be addressed1. Each sector corresponds to a tiny portion of the physical disk surface and any damage to the surface, such as scratches or particles of dust, may render one or more of these sectors inaccessible — these are often called bad sectors. This has been a problem since the earliest days of hard drives, so operating systems have been designed to cope with sectors that the disk reports as bad, avoiding their use for files.

Modern hard disk manufacturers have more or less accepted that some proportion of drives will have minor defects, so they reserve a small area of the disk for reallocated sectors, in addition to the stated capacity of the drive. When bad sectors are found, the drive’s firmware quietly relocates them to some of this spare space. The amount of space “wasted” by this approach is a tiny proportion of the space of the drive and saves manufacturers from having to deal with a steady stream of customers RMAing drives with a tiny proportion of bad sectors. There is a limit to the scope of this relocation, however, and when the spare space is exhausted the drive has no choice but to report the failures directly to the system2.

The net result of all this is that by the time your system is reporting bad sectors to you, your hard disk has probably already had quite a few physical defects crop up. The way hard drives work, this often means the drive may be starting to degrade and may suffer a catastrophic failure soon — this was confirmed by a large-scale study by Google a few years ago. So, by the time your operating system starts reporting disk errors, it may be just about too late to practically do anything about it. This is where SMART comes in — it’s a method of querying information from hard disks, including such items as the number of sectors which the drive has quietly reallocated for you.

The smartd daemon uses SMART to monitor your disks and watch for changes in the counters which may indicate a problem. Increases in the reallocated sector count should be watched carefully — occasionally these might be isolated instances, but if you see this number change continuously over a few days or weeks then you should assume the worst and plan for your hard disk to fail at any moment. The counter I mentioned above, the offline uncorrectable sector count, is even worse — this means that the drive encountered an error it couldn’t solve when reading or writing part of the disk. This is also a strong indicator of failure.

So, I know my hard disk is about to fail — what can I do about it? The instructions below cover my experiences on an Ubuntu 12.04 system, but the process should be similar for other distributions. Note that this is quite a low-level process which assumes a fair degree of confidence with Linux and is designed to duplicate exactly the same environment. You may find it easier to simply back up your important files and reinstall on to a fresh disk.

Since I use Linux, it turns out to be comparatively easy to migrate over to a new drive. The first step is to obtain a replacement hard disk, then power the system off, connect it up to a spare SATA socket and boot up again. At this point, you should be able to partition it with fdisk, presumably in the same way as your current drive but the only requirement is that each partition is at least big enough to hold all the current files in that part of your system. Once you’ve partitioned it, format the partitions with, for example, mke2fs and mkswap as appropriate. At this point, mount the non-swap partitions in the way that they’ll be mounted in the final system but under some root — for example, if you had just / and /home partitions then you might do:

sudo mkdir /mnt/newhdd
sudo mount /dev/sdb1 /mnt/newhdd
sudo mkdir /mnt/newhdd/home
sudo mount /dev/sdb5 /mnt/newhdd/home

Important: make sure you replace /dev/sdb with the actual device of your new disk. You can find this out using:

sudo lshw -class disk

… and looking at the logical name fields of the devices which are listed.

At this point you’re ready to start copying files over to the new hard disk. You can do this simply with rsync, but you have to provide the appropriate options to copy special files across and avoid copying external media and pseudo-filesystems:

rsync -aHAXvP --exclude="/mnt" --exclude="/lost+found" --exclude="/sys" \
      --exclude="/proc" --exclude="/run/shm" --exclude="/run/lock" / /mnt/newhdd/

You may wish to exclude other directories too — I suggest running mount and excluding anything else which is mounted with tmpfs, for example.

You can leave this running in the background — it’s likely to take quite a long time for a system which has been running for awhile, and it also might impact your system’s performance somewhat. It’s quite safe to abort and re-run another time — the rsync with those parameters will carry on exactly where it left off.

While this is going on, make sure you have an up-to-date rescue disk available, which you’ll need as part of the process. I happened to use the Ubuntu Rescue Remix CD, but any reasonable rescue or live CD is likely to work. It needs to have rsync, grub, the blkid utility and a text editor available and be able to mount all your filesystem types.

Once that command has finished, you then need to wait for a time where you’re ready to do the switch. Make sure you won’t need to be interrupted or use the PC for anything for at least half an hour. If you had to abandon the process and come back to it later, make sure you re-run the above rsync command just prior to doing the following — the idea is to make sure the two systems are as closely synchronised as possible.

When you’re ready to proceed, shut down the system, open it up and swap the two disks over. Strictly speaking you probably don’t need to swap them, but I like to keep my system disk as /dev/sda so it’s easier to remember. Just make sure you remember that they’re swapped now!

Now boot the system into the rescue CD you created earlier. Get to a shell prompt and mount your drives — let’s say that the new disk is now /dev/sda and the old one is /dev/sdb, continuing the two partition example from earlier, then you’d do something like this:

mkdir /mnt/newhdd /mnt/oldhdd
mount /dev/sda1 /mnt/newhdd
mount /dev/sdb1 /mnt/oldhdd
mount /dev/sda5 /mnt/newhdd/home
mount /dev/sdb5 /mnt/oldhdd/home

I’m assuming you’re already logged in as root — if not, you’ll need to use sudo or su as appropriate. This varies between rescue systems.

As you can see, the principle is to mount both old and new systems in the same way as they will be used. As this point you can then invoke something similar to the rsync from earlier, except with the source changed slightly. Note that you don’t need all those --exclude options any more because the only things mounted should be the ones you’ve manually mounted yourself, which are all the partitions you actually want to copy:

rsync -aHAXvP /mnt/oldhdd/ /mnt/newhdd/

Once this final rsync has finished, you’ll need to tweak a few things on your target drive before you can boot into it. After this point do not run rsync again or you will undo the changes you’re about to make.

First, you need to update /mnt/newhdd/etc/fstab to reflect your new hard disk. If you take a look, you’ll probably find that the lines for the standard partitions start like this:


What you need to do is replace these UUIDs with the ones from your new drive. You can find this out by running blkid which should output something like this:

/dev/sda1: UUID="b7299d50-8918-459f-9168-2a743f462658" TYPE="swap" 
/dev/sda2: LABEL="/" UUID="43f6065c-d141-4a64-afda-3e0763bbbc9a" TYPE="ext4"
/dev/sdb1: UUID="8affb32a-bb25-8fa2-8473-2adc443d1900" TYPE="swap"
/dev/sdb2: LABEL="/" UUID="d3964aa8-f237-4b34-814b-7176719b2e42" TYPE="ext4"

What you want is to copy the UUID fields for your new disk to fstab over the top of the old ones. Be careful not to accidentally copy a stray quote or similar.

The other thing you need to do is change the grub.conf file to refer to the new IDs. Typically this file is auto-generated, but you can use a simple search and replace to update the IDs in the old file. First grep the file for the UUID of the old root partition to make sure you’re changing the right thing:

grep d3964aa8-f237-4b34-814b-7176719b2e42 /mnt/newhdd/boot/grub/grub.cfg

Then replace it with the new one, with something like this:

cp /mnt/newhdd/boot/grub/grub.cfg /mnt/newhdd/boot/grub.cfg.orig
sed 's/d3964aa8-f237-4b34-814b-7176719b2e42/43f6065c-d141-4a64-afda-3e0763bbbc9a/g' \
    /mnt/newhdd/boot/grub.cfg.orig > /mnt/newhdd/boot/grub.cfg

As an aside, there’s probably a more graceful way of using update-grub to re-write the new configuration, but I found it a lot easier just to do the search and replace like this.

At this point you should install the grub bootloader on to the new disk’s boot sector:

grub-install --recheck --no-floppy --root=/mnt/newhdd /dev/sda

Finally, you should be ready to reboot. Cross your fingers!

If your system doesn’t come back up then I suggest you use the rescue CD to fix things. Also, since you haven’t actually written anything to the old disk, you should always be able to swap the disks back and try to figure out what went wrong.

At this point you should shut your system down again, remove the old disk entirely and try booting up again. If your system came back up before and fails now then it was probably booting off the old disk, which probably indicates the boot sector didn’t install on the new disk properly.

Hopefully that’s been of some help to someone — by all means leave a comment if you have any issues with it or you think I’ve made a mistake. Good luck!

  1. Typically they’re 512 bytes, although recently drives with larger sectors have started to crop up. 

  2. Strictly speaking there’s also a small performance penalty when accessing a reallocated sector on a drive, so they’re also bad news in performance-critical servers — typically this isn’t relevant to most people, however. 

5 Jun 2013 at 6:02PM in Software
 |   | 
Photo by Cara Fuller