If it involves data loss, it’s happened to me:
- Corrupt file system / partition table / MBR
- Unrecoverable bad sectors
- Head crashes
- Bad drive electronics
- RAID rebuild failures
- RAID software/firmware bugs
- Bad power supplies in external enclosures
- Other uncategorized failures
In total, I’ve probably had two dozen drives fail on me. Desktop, laptop, external, SATA, PATA, every manufacturer you’ve heard of… apparently they don’t like me. Now the real problem with this is: I like my data. A lot. Which means every drive failure has resulted in a restore; some were more successful than others, and they took anywhere from hours to days. So, I’ve developed a ridiculous backup/restore policy, which I’m about to share so everyone else doesn’t need to learn from days of frustrating experience.
You, right now (yes, now!), need a backup. I don’t care how you create it. Stop reading this article and go back up. Right now. Seriously.
Back? OK, now you can implement this battle-hardened system!
- You need an off-site backup, for when a natural disaster strikes, or someone breaks in and runs off with your (encrypted) backups.
- You need continuous backups. A lot can happen in a week, and there’s no way you can remember all of it. Back up every day.
- On an average day, you need your data in three places. That way, when your computer dies, and you go to restore from backup A, but backup A is dead too, there’s still backup B.
- Backups probably need to be in a RAID or RAID-like set. Because unfortunately, hard drives aren’t as reliable as you’d like.
- You need a plan for when things go wrong. Otherwise, you won’t be thinking clearly (after all, your data is in danger), and you’ll try all sorts of stupid things. Trust me, I’ve done a lot of stupid stuff in the name of data recovery, and stupider stuff waiting for the restore to finish.
- It is worth spending less than $1,000 now so you don't have to spend several thousand later having someone professionally recover your data from the dead drive.
- Use at least two programs to back up your data. Time Machine + Super Duper. Windows Backup + Crashplan. Rsync + Dropbox. Backing up in two different ways lowers the chances of corruption, of accidentally failing to back up for the last week, and of a number of other possible software problems.
- It’s not a backup policy unless you understand the procedure to restore.
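One way to make the last point concrete is to restore a backup into a scratch directory and compare it against the live data. The sketch below is only an illustration, not part of the author's setup: the function names are hypothetical, and it assumes both trees are ordinary directories that fit a simple file-by-file SHA-256 comparison.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each regular file under root (by relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(source: Path, restored: Path) -> Dict[str, List[str]]:
    """Report files missing from, extra in, or different in the restored copy."""
    src = tree_digests(source)
    dst = tree_digests(restored)
    return {
        "missing": sorted(set(src) - set(dst)),
        "extra": sorted(set(dst) - set(src)),
        "changed": sorted(k for k in set(src) & set(dst) if src[k] != dst[k]),
    }
```

An empty report for all three keys suggests the restore procedure produced the same files you started with. Files that changed after the backup ran will legitimately show up as "changed", so this is best run against data that was not being edited in between.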
The Backup Policy
My Backup System
Source -> System -> Destination
Laptop -> Crashplan -> Crashplan Central (AKA "the cloud")
Laptop -> Crashplan -> Desktop internal HD
Laptop -> Crashplan -> Friend's NAS
Laptop -> Time Machine -> External RAID 1 Enclosure
Laptop -> Super Duper -> External RAID 1 Enclosure
5 devices, 6 disks, 3 methods. Most systems update hourly, although some are daily. The most recent restore was last August, before Super Duper was added to the system, and my Time Machine drive was being reformatted to make more free space. Crashplan to the rescue!
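The rules in the policy list can also be written down as a small check that runs against a description of your own setup. This is a rough sketch under stated assumptions: the `BackupJob` record, the `policy_violations` function, and the one-day staleness threshold are all hypothetical choices made for illustration, and "three places" is counted as the original plus at least two distinct backup destinations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class BackupJob:
    program: str            # backup software, e.g. "Crashplan"
    destination: str        # where the copy lives, e.g. "Friend's NAS"
    offsite: bool           # True if the destination is in another building
    last_success: datetime  # when this job last completed


def policy_violations(jobs: List[BackupJob], now: datetime,
                      max_age: timedelta = timedelta(days=1)) -> List[str]:
    """Return a human-readable list of policy rules the setup breaks."""
    problems = []
    # The original counts as one place; each distinct destination adds one.
    if 1 + len({j.destination for j in jobs}) < 3:
        problems.append("data is in fewer than three places")
    if not any(j.offsite for j in jobs):
        problems.append("no off-site backup")
    if len({j.program for j in jobs}) < 2:
        problems.append("fewer than two backup programs")
    for j in jobs:
        if now - j.last_success > max_age:
            problems.append(f"{j.program} -> {j.destination} is stale")
    return problems
```

Feeding it the five rows of the table above, with recent timestamps, should return an empty list; a job that silently stopped a week ago would be reported as stale.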
All the services I have used to back up and successfully restore data (every system I use has been tested, and not by choice):
- Crashplan (local, friend, and online. Free and cross-platform :D)
- Time Machine (Restoring directly from the backup never works; a fresh install + Migration Assistant does. FileVault 1 + Time Machine is a bad idea. If you want FileVault, upgrade to Lion.)
- Super Duper (Hands down the easiest system I’ve ever used for backups and restores.)
- Windows Backup (I know, it’s Microsoft, what do you expect. But it works!)
- rsync (In various configurations on cron jobs. It’s the most painful system I’ve used, but it gets the job done.)
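For the rsync option, one common pattern is a dated snapshot directory per run, with unchanged files hard-linked against the previous snapshot. The sketch below only builds the command line; it is not necessarily the configuration used here, and the function name and example paths are hypothetical. It relies on two real rsync options: `-a` (archive mode) and `--link-dest`.

```python
import subprocess
from datetime import date
from typing import List, Optional


def rsync_snapshot_command(source: str, backup_root: str, day: date,
                           previous: Optional[str] = None) -> List[str]:
    """Build an rsync command that copies source into a dated snapshot dir."""
    target = f"{backup_root.rstrip('/')}/{day.isoformat()}"
    cmd = ["rsync", "-a"]
    if previous is not None:
        # Files unchanged since the previous snapshot become hard links,
        # so each snapshot looks complete but only new data uses space.
        # Use an absolute path: rsync resolves a relative --link-dest
        # against the destination directory.
        cmd.append(f"--link-dest={previous}")
    # The trailing slash on the source copies its contents, not the dir itself.
    cmd += [f"{source.rstrip('/')}/", target]
    return cmd


def run_snapshot(cmd: List[str]) -> None:
    """Run the command and raise if rsync exits with a non-zero status."""
    subprocess.run(cmd, check=True)
```

A cron job could call a script like this hourly or daily. Checking the exit status (here via `check=True`) matters, because a cron job that fails quietly is exactly how you end up not backing up for a week.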
A note about RAID
I know I just got done telling you to use RAID. I know there’s a footnote telling you that you really REALLY ought to be using RAID, but I have to be upfront about this.
RAID has never solved any problem I’ve had.
There. I’m sorry I can’t be more helpful, but it’s true. Don’t ask me why it’s so hard to copy bits, but apparently it is. I’ve used external RAID enclosures from LaCie and Western Digital. The LaCie 2big Quadra went back 3 times in the course of 2 years before I gave up on it and got a Western Digital My Book Studio II. One year later, it’s been back once. We’ll see if the future is any brighter. And then there was the GIGABYTE GA-EP45T-UD3LR LGA 775 Intel P45 ATX Intel Motherboard, which ostensibly supports RAID 0/1/5/10 via the Intel P45 southbridge. The RAID 1 set failed monthly, and that system lasted about three months. Windows installs just aren’t that fun, although Windows’ backup was better than expected.
I work with someone who has a Drobo. Which rebuilds weekly. And the throughput when it does work is atrocious. The problem Drobo solves is a real problem; they (along with everyone else) just don’t have a solution. Find me a RAID controller/enclosure from a reputable source (Newegg, Amazon, etc.) with more than 4 stars. Got it? Now, do more than 75% of the people rate it 5 stars? No? Go take a moment to read the reviews on this product you found. Keep track of the number of sysadmins telling you it’s crap. Now take a look at the price again. Do you feel good about this purchase? Because I don’t. In fairness, I’d buy this tower from Apple, but it really needs to connect to my desktop computer… I’m hoping that having the cash coincides with the release of Thunderbolt PCI cards, because those towers do not come cheap.
If you find yourself in this unfortunate situation, might I recommend testdisk. It works with GUID partition tables, and it managed to recover a partition I accidentally deleted in parted (for the record, whoever designed a partition editor where all the changes are applied instantly should be taken out and shot. fdisk got this right, but it doesn’t support GUID tables yet). Sadly, I lost another partition in the process, but it was a backup partition and I ended up just recreating it. The moral being, pay lots of attention to what’s going on and know exactly what you’re doing when you start pressing keys.
Unfortunately, RAID isn’t as reliable as you’d like either, but:
“User data: I recommend RAID1 or higher for user data just because it is so inexpensive that not doing it is embarrassing. By the way… you do know that RAID6 is the minimum for 2T disks and larger, right? It is professionally negligent to use RAID5 on such disks. RAID6 or RAID10 is the minimum; at least for now, but I digress.” – http://everythingsysadmin.com/
Also, see here and here
CS Majors should never be without a working computer. If all their computers are down, they will come up with creative and rarely intelligent alternative activities. If you come across such a poor soul, feel bad for them very briefly, and take lots of pictures.
Since I started writing this article, he’s stopped using it. RAID for the fail again.