---
date: 2020-04-22
title: How to store data forever
layout: post
---
As someone who has often been burned by the disappearance of my data for
various reasons — companies going under, hard drive failure, etc. —
and as someone who is responsible for the safekeeping of other people's data,
I've put a lot of thought into solutions for long-term data retention.

There are two kinds of long-term storage, with different concerns: cold storage
and hot storage. The former is like a hard drive in your safe — it stores
your data, but you're not actively using it or putting wear on the storage
medium. By contrast, hot storage is available immediately and undergoes
frequent reads and writes.
## What storage medium to use?

There are some bad ways to do it. The worst way I can think of is to store it
on a microSD card. These fail *a lot*. I couldn't find any hard data, but
anecdotally, 4 out of 5 microSD cards I've used have experienced failures
resulting in permanent data loss. Low-volume writes, such as from a digital
camera, are unlikely to cause failure. However, microSD cards have a tendency
to get hot during prolonged writes, and they'll quickly leave their safe
operating temperature and start to accumulate damage. Nearly all microSD cards
will let you perform writes fast enough to drive the temperature beyond the
operating limits — after all, writes per second is a marketable feature
— so if you want to safely move lots of data onto or off of a microSD
card, you need to monitor the temperature and throttle your read/write
operations.
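If you do find yourself bulk-copying data to or from a microSD card, the
throttling loop itself is simple enough to sketch. The temperature source below
is an assumption — whether your card reader exposes a sensor at all, and
where, varies by hardware — so treat this as an illustration rather than a
ready-made tool:

```python
import time

# Assumed temperature source: many Linux systems expose thermal sensors under
# /sys/class/thermal, but whether any of them reflects the card or reader is
# hardware-specific. Swap in whatever sensor you actually have.
def read_temperature_c() -> float:
    with open("/sys/class/thermal/thermal_zone0/temp") as f:
        return int(f.read().strip()) / 1000.0

MAX_SAFE_C = 70.0             # conservative ceiling; check your card's datasheet
CHUNK_SIZE = 8 * 1024 * 1024  # copy 8 MiB at a time

def throttled_copy(src_path: str, dst_path: str) -> None:
    """Copy a file in chunks, pausing whenever the card runs hot."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk)
            dst.flush()
            # Back off until the temperature drops below the limit again.
            while read_temperature_c() > MAX_SAFE_C:
                time.sleep(5)
```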
A more reliable solution is to store the data on a hard drive[^1]. However,
hard drives are rated for a limited number of read/write cycles, and can be
expected to fail eventually. Backblaze publishes some great articles on [hard
drive failure rates](https://www.backblaze.com/blog/hard-drive-stats-for-2019/)
across their fleet. According to them, the average annual failure rate of hard
drives is almost 2%. Of course, the exact rate will vary with the frequency of
use and storage conditions. Even in cold storage, the shelf life of a magnetic
platter is not indefinite.

[^1]: Or SSDs, which I will refer to interchangeably with HDDs in this article. They have their own considerations, but we'll get to that.
There are other solutions, like optical media, tape drives, or more novel
mediums, like the [Rosetta Disk](https://en.wikipedia.org/wiki/Rosetta_Project).
For most readers, a hard drive will strike the best balance of practicality and
reliability. For serious long-term storage, if expense isn't a concern, I would
also recommend hot storage over cold storage, because it introduces the
possibility of active monitoring.
## Redundancy with RAID

One solution to this is redundancy — storing the same data across multiple
hard drives. For cold storage, this is often as simple as copying the data onto
a second hard drive, like an external backup HDD. Other solutions exist for hot
storage. The most common standard is [RAID][RAID], which offers different
features with different numbers of hard drives. With two hard drives (RAID 1),
for example, it utilizes mirroring, which writes the same data to both disks.
RAID gets more creative with three or more hard drives, utilizing *parity*,
which allows it to reconstruct the contents of failed hard drives from the
still-online drives. The basic idea relies on the XOR operation. Let's say you
write the following byte to drive A: `0b11100111`, and to drive B:
`0b10101100`. By XORing these values together:

[RAID]: https://en.wikipedia.org/wiki/RAID
```
  11100111 A
^ 10101100 B
= 01001011 C
```
We obtain the value to write to drive C. If any of these three drives fail, we
can XOR the remaining two values again to obtain the third.
```
  11100111 A
^ 01001011 C
= 10101100 B

  10101100 B
^ 01001011 C
= 11100111 A
```
This allows any drive to fail while still being able to recover its contents,
and the recovery can be performed online. However, it's often not that simple.
Drive failure can dramatically reduce the performance of the array while it's
being rebuilt — the disks are going to be seeking constantly to find the
parity data to rebuild the failed disk, and any attempt to read from the disk
that's being rebuilt will require computing the recovered value on the fly.
This can be improved upon by using lots of drives and multiple levels of
redundancy, but it is still likely to have an impact on the availability of
your data if not carefully planned for.
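To make the parity arithmetic concrete, here's a toy sketch in Python —
not how a real RAID implementation is built, but the same XOR principle applied
to whole blocks:

```python
def parity(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    assert len({len(b) for b in blocks}) == 1, "blocks must be the same length"
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# "Drives" A and B hold data; C holds their parity.
a = bytes([0b11100111])
b = bytes([0b10101100])
c = parity(a, b)             # 0b01001011, as in the example above

# If drive B dies, XORing the survivors recovers its contents.
assert parity(a, c) == b
```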
You should also be monitoring your drives and preparing for their failure in
advance. Failing disks often show warning signs — degraded performance, or
errors in their S.M.A.R.T. reports. Learn the tools for monitoring your storage
medium, such as smartmontools, and set them up to report failures to you (and
*test* the mechanisms by which those failures are reported to you).
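smartd can do this reporting on its own once configured; if you also want a
simple check of your own on a cronjob, something like the sketch below works
(the device list is a placeholder, and cron will mail any output it produces to
`MAILTO`):

```python
import subprocess
import sys

# Placeholder: adjust to the drives actually present in your machine.
DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc"]

failures = []
for dev in DEVICES:
    # `smartctl -H` prints the drive's overall health self-assessment.
    result = subprocess.run(
        ["smartctl", "-H", dev], capture_output=True, text=True
    )
    if "PASSED" not in result.stdout:
        failures.append((dev, result.stdout))

# Any output here gets mailed by cron; stay silent when everything is healthy.
for dev, report in failures:
    print(f"SMART health check failed for {dev}:\n{report}")
if failures:
    sys.exit(1)
```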
### Other RAID failure modes

There are other common ways a RAID can fail that result in permanent data loss.
One example is using hardware RAID — there was an argument to be made for
it at one point, but these days hardware RAID is *almost always* a mistake.
Most operating systems have software RAID implementations which can achieve the
same results without a dedicated RAID card. With hardware RAID, if the RAID
card itself fails (and they often do), you might have to find the exact same
card to be able to read from your disks again. You'll be paying for new
hardware, which might be expensive or out of production, and waiting for it to
arrive before you can start recovering data. With software RAID, the hard
drives are portable between machines, and you can always interpret the data
with general-purpose software.
Another common failure is *cascading* drive failures. RAID can tolerate partial
drive failure thanks to parity and mirroring, but if the failures start to pile
up, you can suffer permanent data loss. Many a sad administrator has been in
panic mode, recovering a RAID from a disk failure, when at their lowest
moment... another disk fails. Then another. Suddenly they've lost their data,
and the challenge of recovering what remains has become ten times harder.
Because reads and writes have been distributed consistently across all of the
drives over the lifetime of the hardware, they've all received a similar level
of wear, and it's not uncommon for them to fail together.
Often, failures like this can be attributed to using many hard drives from the
same batch. One strategy I recommend to avoid this scenario is to use drives
from a mix of vendors, model numbers, and so on. Using a RAID improves
performance by distributing reads and writes across drives, using the time one
drive is busy to utilize an alternate. Accordingly, any differences in the
performance characteristics of different kinds of drives will be smoothed out
in the wash.
## ZFS

RAID is complicated, and getting it right is difficult. You don't want to wait
until your drives are failing to learn about a gap in your understanding of
RAID. For this reason, I recommend ZFS to most people. It automatically makes
good decisions for you with respect to mirroring and parity, and gracefully
handles rebuilds, sudden power loss, and other failures. It also has features
which are helpful for other failure modes, like snapshots.

Set up Zed to email you reports from ZFS. Zed has a debug mode, which will send
you emails even for working disks — I recommend leaving this on, so that
their conspicuous absence might alert you to a problem with the monitoring
mechanism. Set up a cronjob to do monthly scrubs and review the Zed reports
when they arrive. ZFS snapshots are cheap; set up a cronjob to take one every
five minutes, perhaps with [zfs-auto-snapshot][zfs-auto].

[zfs-auto]: https://github.com/zfsonlinux/zfs-auto-snapshot
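As an illustration of what such a scrub cronjob might look like (the pool name
is a placeholder, and Zed's email reporting is configured separately), this
kicks off a scrub and complains if the pool is already unhealthy; the scrub
itself finishes later, and Zed reports anything it turns up:

```python
import subprocess
import sys

POOL = "tank"  # placeholder: replace with your pool's name

# Start a scrub; `zpool scrub` returns immediately and the scrub runs in the
# background, so errors it finds will arrive later via Zed's reports.
subprocess.run(["zpool", "scrub", POOL], check=True)

# `zpool status -x` prints "all pools are healthy" when nothing is wrong.
# Anything else is worth reading, so print it and let cron mail it out.
status = subprocess.run(
    ["zpool", "status", "-x"], capture_output=True, text=True, check=True
)
if "all pools are healthy" not in status.stdout:
    print(status.stdout)
    sys.exit(1)
```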
## Human failures and existential threats

Even if you've addressed hardware failure, you're not done yet. There are other
ways still in which your storage may fail. Maybe your server fans fail and burn
out all of your hard drives at once. Or, your datacenter could suffer a total
existence failure — what if a fire burns down the building?

There's also the problem of human failure. What if you accidentally
`rm -rf / *` the server? Your RAID array will faithfully remove the data from
all of the hard drives for you. What if you send the sysop out to the
datacenter to decommission a machine, and no one notices that they
decommissioned the wrong one until it's too late?
This is where off-site backups come into play. For this purpose, I recommend
[Borg backup][borg]. It has sophisticated features for compression and
encryption, and it allows you to mount any version of your backups as a
filesystem to recover data from. Set this up on a cronjob as well, running as
frequently as you feel the need to make backups, and send the backups off-site
to another location, which should itself have storage facilities following the
rest of the recommendations in this article. Set up another cronjob to run
`borg check` and send you the results on a schedule, so that their conspicuous
absence may indicate that something fishy is going on. I also use
[Prometheus][prom] with [Pushgateway][pushgateway] to make a note every time a
backup runs, and I set up an alarm which goes off if the backup age exceeds 48
hours. I also have periodic test alarms, so that the alert manager's own
failures are noticed.

[borg]: https://www.borgbackup.org/
[prom]: https://prometheus.io/
[pushgateway]: https://github.com/prometheus/pushgateway
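For the Prometheus side, the official Python client can push a "last successful
backup" timestamp to the Pushgateway at the end of each backup run; the gateway
address and job name here are placeholders:

```python
import time

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
last_success = Gauge(
    "backup_last_success_timestamp_seconds",
    "Unix time of the last successful backup run",
    registry=registry,
)
last_success.set(time.time())

# Push to the Pushgateway after the backup finishes successfully.
push_to_gateway(
    "pushgateway.example.org:9091", job="borg_backup", registry=registry
)
```

The 48-hour alarm is then a rule over
`time() - backup_last_success_timestamp_seconds`.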
## Are you prepared for the failure?

When your disks are failing and everything is on fire and the sky is falling,
you don't want it to be your first rodeo. You should have *practiced* these
problems before they became real problems. Do training with anyone expected to
deal with failures. Yank out a hard drive and tell them to fix it. Have someone
from sales come yell at them partway through because the website is unbearably
slow while the RAID is rebuilding and the company is losing $100 per minute as
a result of the outage.

Periodically produce a working system from your backups. This proves that (1)
the backups are still working, (2) the backups cover everything which would
need to be restored, and (3) you know how to restore them. Bonus: if you're
confident in your backups, you should be able to replace the production system
with the restored one and allow service to continue as normal.
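Part of that drill can be automated. A rough sketch, where the repository path
and the final smoke test are placeholders for whatever "a working system" means
in your environment:

```python
import subprocess
import tempfile

REPO = "backup@offsite.example.org:/srv/borg/myserver"  # placeholder repository

# Find the name of the most recent archive in the repository.
latest = subprocess.run(
    ["borg", "list", "--last", "1", "--short", REPO],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Extract it into a scratch directory, then run whatever check proves the
# restored system actually works (the script below is a placeholder).
with tempfile.TemporaryDirectory() as workdir:
    subprocess.run(
        ["borg", "extract", f"{REPO}::{latest}"], cwd=workdir, check=True
    )
    subprocess.run(["./run-restore-smoke-test.sh", workdir], check=True)
```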
## Actually storing data *forever*

Let's say you've managed to keep your data around. But will you still know how
to interpret that data in the future? Is it in a file format which requires
specialized software to use? Will that software still be relevant in the
future? Is that software open source, so you can update it yourself? Will it
still compile and run correctly on newer operating systems and hardware? Will
the storage medium still be compatible with new computers?

Who is going to be around to watch the monitoring systems you've put in place?
Who's going to replace the failing hard drives after you're gone? How will they
be paid? Will the dataset still be comprehensible after 500 years of evolution
of written language? The dataset requires constant maintenance not only to
remain intact, but also to remain useful.

And ultimately, there is one factor in long-term data retention that you cannot
control: future generations will decide what data is worth keeping — not
us.
In summary: no matter what, definitely don't do this:

![Picture of a SATA card for RAIDing 10 microSD cards together](https://l.sr.ht/ig3R.jpg)