This won’t be one of my usual horror stories. Instead, it is a story of the horrors that can happen to your data unless you pay attention.
Studies on hard drive reliability are extremely rare; most information comes from manufacturers, and their data is synthetic – based on artificial tests and return rates. To fill the void, Google Labs produced an exceptionally valuable study of hard drive failure rates based on their server farms.
It is a known fact that hard drives are likely to fail in their first year of operation. The study narrows it down further to the first three months – that is the critical period with a high rate of failures.
Considering that the usual hard drive upgrade scenario is transferring all information from the old drive to the freshly purchased one – that may be a recipe for disaster.
Instead of enjoying the new pool of gigabytes right away, it is better to:
- put the new drive through an initial stress and surface test (I will cover tools for this in future posts);
- use the new drive as a secondary one through the first months;
- keep the old drive as a backup medium (unless it is being replaced because of failure symptoms).
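To show the idea behind a surface test (not as a replacement for the dedicated tools mentioned above), the core of it is writing a known pattern to the disk and reading it back to verify. Here is a minimal, hypothetical sketch in Python that exercises an ordinary file on a mounted filesystem rather than the raw device:

```python
import hashlib
import os

def write_and_verify(path, size_mb=16, block_size=1024 * 1024):
    """Write pseudo-random blocks to `path`, then read them back and
    compare checksums -- a toy version of a write/read surface test."""
    checksums = []
    with open(path, "wb") as f:
        for _ in range(size_mb):
            block = os.urandom(block_size)
            checksums.append(hashlib.sha256(block).hexdigest())
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches the disk

    bad_blocks = 0
    with open(path, "rb") as f:
        for expected in checksums:
            block = f.read(block_size)
            if hashlib.sha256(block).hexdigest() != expected:
                bad_blocks += 1
    return bad_blocks
```

A real surface test works against the raw device and covers the whole capacity; this sketch only touches whatever blocks the filesystem happens to hand out, so treat it as an illustration of the principle only.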
High temperature and heavy workload are usually considered negative factors for drive lifetime. Counter-intuitively, the study shows that high temperature has little correlation with failure rates (except for old drives), and that drives operating at (suspiciously) low temperatures below 35°C are more likely to fail.
It is similar with workload – as long as a drive gets through the first critical months, further usage has little effect on its failure probability.
So while frying a hard drive in a crappy case is never a good idea, other than that you can use it without worrying about “overworking” it.
Failure indicators (or lack of any)
SMART technology is incorporated into all modern hard drives and serves as a self-diagnostic module. It must be enabled in the BIOS, and its indicators can be read (and optionally interpreted) with various software such as SpeedFan or PC Wizard.
However, while bad SMART values certainly indicate a problem with the drive, the study concludes that a large percentage of failing drives show no SMART signs of failure at all.
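For those curious what reading those indicators looks like in practice: on systems with smartmontools installed, `smartctl -A /dev/sda` prints a table of attributes, each with a normalized value and a failure threshold. Below is a rough sketch of flagging suspicious entries from that output; the sample lines are illustrative, not taken from a real drive:

```python
def parse_smart_attributes(text):
    """Parse the attribute table printed by `smartctl -A`.

    Each attribute row has 10 columns:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    attrs = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 10 and parts[0].isdigit():
            attrs.append({
                "id": int(parts[0]),
                "name": parts[1],
                "value": int(parts[3]),
                "thresh": int(parts[5]),
                "raw": parts[9],
            })
    return attrs

def suspicious(attrs):
    """Flag attributes whose normalized value has dropped to the threshold,
    plus reallocated/pending sector counters with a nonzero raw count."""
    flagged = []
    for a in attrs:
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            flagged.append(a["name"])
        elif a["id"] in (5, 197, 198) and a["raw"].isdigit() and int(a["raw"]) > 0:
            flagged.append(a["name"])
    return flagged

# Illustrative sample of two attribute rows (made-up numbers)
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   095   095   036    Pre-fail  Always       -       12
194 Temperature_Celsius     0x0022   041   053   000    Old_age   Always       -       41
"""
```

Even a clean report from such a check proves little, which is exactly the study’s point: many drives die without ever tripping a threshold.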
One of the most interesting parameters the study was unable to reliably access and process was drive vibration. There is very little data on how vibration affects drives in the long term. It may be especially critical for drives that are soft-mounted in silent computers to reduce noise.
Manufacturers recommend mounting hard drives firmly, so sacrificing that for reduced noise is basically a gamble without any real data on the possible consequences.
There is also the interesting topic of power on/off stress. Server drives (on which the study is based) run 24/7 except for repairs, but that is rarely the case for home and office computers.
While I don’t have direct experience working with huge quantities of hard drives, I have certainly seen plenty of them over the years.
I can confirm the early death syndrome (it makes perfect sense, after all), and I was never a believer in the high-load theories (I have seen – and made – some drives run for years in unfavorable conditions).
There is one additional factor I want to share, which Google couldn’t test. Server hardware doesn’t move anywhere, but moving is clearly not uncommon for home desktops.
Physical factors such as blows, frequent transportation, and moving the computer while the hard drive is active can have a devastating effect on drive reliability. It is a good idea to place a computer in such a way that it is unlikely to be disturbed.
It is also why I have little faith in external hard drives as a backup medium – they are good for transportation, but that very same moving around makes them vulnerable to physical damage.
So the main points would be:
- do not trust newly purchased drives until they’ve been through testing and some usage;
- you can’t reliably predict hard drive failure;
- always have a backup strategy implemented, automated, and periodically checked (I use Cobian Backup, SyncExp and Dropbox for mine).
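The automation part does not have to be fancy. At its core, tools like the ones I use do roughly what this hypothetical sketch does: mirror new and modified files from one directory to another on a schedule (the function and paths here are placeholders, not any particular tool’s API):

```python
import os
import shutil

def mirror(src, dst):
    """One-way incremental backup pass: copy files from src to dst
    when the backup copy is missing or older than the source."""
    copied = 0
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            # copy2 preserves timestamps, so unchanged files are skipped next run
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)
                copied += 1
    return copied
```

Run such a pass from the system scheduler (cron or Windows Task Scheduler) and you have the “automated” part covered; the “periodically checked” part still means opening the backup and making sure the files are actually there and readable.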
PDF download: http://labs.google.com/papers/disk_failures.pdf
How reliable do you think hard drives are? Do you trust them as a storage medium, or do you think they are likely to fail?