Production backup strategy for Linux servers

Key takeaways

01 3-2-1 means three copies, on two different media types, with one off-site. The dogma is less important than the underlying intent: no single failure mode can destroy all your copies.
02 restic is the right default for most operators in 2026. Incremental, deduplicated, encrypted by default, speaks S3 and most cloud storage directly.
03 Database backups need application-aware tooling. A copy of a running database's data files is sometimes restorable and sometimes corrupt, and you only find out during the restore. Use mysqldump, pg_dump, or the database-native backup mechanism instead.
04 Encryption keys are part of the backup. A backup encrypted with a key that exists only in your head is a backup waiting to be lost.
05 A backup you have never restored is a hypothesis. The restore drill is the part that turns it into a backup. Do one quarterly.
06 Object storage with versioning and object lock is the most cost-effective off-site destination for almost every small operator. Pair it with strict retention rules.

Most backup strategies are written down and then ignored until the day they matter. This guide is the reference an engineer wishes their predecessor had read: the 3-2-1 rule explained without dogma, the practical differences between restic, borg, rsync and rclone, how to handle databases and stateful applications, what to do about encryption keys, and the restore drill that distinguishes a backup from a hypothesis.

The 3-2-1 rule, explained without dogma

The 3-2-1 rule is the closest thing the backup world has to received wisdom. Three copies of the data, on two different media types, with one copy off-site. It has been repeated so often that it sounds like a slogan, but the intent underneath is real: no single failure can destroy all your copies.

The three copies are usually the production data, a local backup (on different storage from production), and an off-site backup (on different storage from the local backup, in a different physical location). If the production disk fails, the local backup covers you. If the building burns down, the off-site covers you. If your cloud provider has an outage, the local covers you. If production and the local backup fail in the same week, you have other problems, but the off-site copy still holds your data.

The "two different media types" requirement was more meaningful when the alternatives were "spinning disk" and "tape." In 2026 the spirit of the rule is "two technologies whose failure modes are uncorrelated." Local NVMe and a cloud bucket count. Two NVMe drives in the same chassis do not.
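Whether the local copy really sits on different storage is something a script can partly check. The block below is a minimal sketch, assuming GNU coreutils `stat`; `fs_device` and `same_filesystem` are hypothetical names. It only compares filesystems. Two partitions on one NVMe drive have different device numbers, so a pass does not prove the failure modes are uncorrelated, while a failure is a definite red flag.

```shell
#!/usr/bin/env bash
# Sketch: flag a "local backup" target that shares a filesystem with the
# data it protects. Uses GNU stat's %d (device number of the filesystem).
# Caveat: different device numbers do NOT prove different physical disks;
# two partitions on one drive, or two LVM volumes on one array, also differ.

# Print the device number of the filesystem holding PATH.
fs_device() {
  stat -c '%d' -- "$1"
}

# Return 0 (true) if both paths live on the same filesystem.
same_filesystem() {
  [ "$(fs_device "$1")" = "$(fs_device "$2")" ]
}

# Example (paths are hypothetical):
# if same_filesystem /srv /mnt/backup; then
#   echo "backup target shares a filesystem with /srv" >&2
#   exit 1
# fi
```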

The "off-site" copy is the one people get wrong most often. A USB drive in the drawer next to the server is not off-site. A backup on a separate VM at the same cloud provider in the same region is not off-site in any meaningful sense. The off-site copy has to be in a different building, ideally in a different jurisdiction.

What you are actually backing up

The first question is what to back up, which is less obvious than it sounds. The categories worth separating in your head:

User data. The thing you cannot recreate. Database contents, uploaded files, customer records. This is the irreplaceable layer.
Configuration. The files that make the server do what it does. nginx configs, systemd units, environment variables, user account state. Recreatable in principle, but the cost of doing so manually after a loss is real.
Application code. Almost always in a git repo somewhere. If your only copy of your application code is on the production server, that is the problem to fix first.
Operating system. Reinstall from media in twenty minutes. Almost never worth backing up unless you have heavily customised it.

The right granularity is usually:

Database: native dump tool (mysqldump, pg_dump), hourly or more often.
Uploaded files and user data: incremental file-level backup (restic, borg), daily.
Configuration: git-managed where possible, file-level backup where not, daily.
OS: skip, or a yearly full-disk image if you must.

restic, borg, rsync, rclone

The four tools that cover almost every backup case for a Linux operator:

restic. The default recommendation for most users in 2026. Written in Go, single binary, deduplicating, encrypting by default. Supports a long list of backends, including local disks, SFTP, Amazon S3 and S3-compatible stores (Cloudflare R2, for example), Backblaze B2, Azure Blob Storage and Google Cloud Storage. The dedup makes incremental backups small even when files have moved. Every snapshot is a complete, content-addressed view of the data, so restoring an old point in time does not mean replaying a chain of incrementals.

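```shell
# Initialise a restic repository on B2.
# The B2 backend reads its credentials from these two variables.
export B2_ACCOUNT_ID="your-b2-key-id"
export B2_ACCOUNT_KEY="your-b2-application-key"
export RESTIC_REPOSITORY="b2:my-backup-bucket:server-name"
export RESTIC_PASSWORD_FILE="/root/.restic-password"
restic init

# First backup
restic backup /etc /var/www /home/admin

# Subsequent backups are incremental
restic backup /etc /var/www /home/admin

# List snapshots
restic snapshots

# Verify repository structure (add --read-data to also read every data pack)
restic check

# Restore a snapshot: "latest", or a snapshot ID from `restic snapshots`
restic restore latest --target /tmp/restore
```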

borg. Older, also deduplicating, also encrypting. Strong with local repositories, or remote ones reached over SSH with borg installed on the far end, but borg 1.x has no native object-storage backend. Its storage format is its own; you cannot read a borg repo with any other tool. The right pick if you need maximum dedup ratio on large repositories with lots of overlap (VM images, similar tarballs across many hosts).

rsync. Not a backup tool. A copy tool. Useful as a layer underneath one (rsync to a local destination, then back that destination up properly). Using rsync as your only backup means you have no history (the second sync overwrites the first), no deduplication, and no encryption. It can do hard-link-based snapshots if you wrap it in rsnapshot, but at that point you have rebuilt half of restic by hand.

rclone. A cloud-storage transfer tool, not a backup tool in the strict sense. Useful as the rsync of object storage: copying directories between providers, replicating buckets, syncing local files to S3. The right pick when "back up" means "make sure these files are in this bucket too."

Encryption and key management

Encrypt backups before they leave the source machine. Server-side encryption from the destination provider is a useful second layer, but never the only layer. If a backup is sitting in an off-site bucket, decryptable only with a key the destination provider holds, the destination provider is your security model. That is not a good security model.

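restic always encrypts, and borg does in every mode except --encryption=none. In both, the repository key is protected by a passphrase you supply, so the passphrase is the critical secret. (In borg's keyfile modes the key file lives outside the repository and must be preserved as well; borg key export produces a copy.) Two failure modes to plan for: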

The passphrase is lost. Your encrypted backups become inert. This is bad enough that the operator who lost the key is sometimes worse off than the operator who has no backups at all.
The passphrase is leaked. Anyone who has the off-site backup data and the passphrase has your data. Treat the passphrase with the seriousness of a production database password.

The pattern that works for most small operators:

Generate a long, random passphrase. Store it in a password manager.
Print a paper copy. Store it somewhere physically secure (a safe deposit box, the safe in your home office, the legal counsel's office).
Distribute the passphrase to at least two people who need it. A backup that only one person can decrypt becomes the worst kind of dependency.
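Never put the passphrase in scripts, crontabs or world-readable config files. Unattended backups have to read it from somewhere, so keep it in a dedicated file (restic's RESTIC_PASSWORD_FILE points at one) with mode 600, owned by the user that runs the backup, and audit access to it.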
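The first and last steps of that pattern can be scripted. The block below is a minimal sketch, assuming GNU coreutils (`base64 -w0`, `stat -c`); the function names are hypothetical. The generated file is still a plaintext secret on disk, which is exactly why the password-manager and paper copies matter.

```shell
#!/usr/bin/env bash
# Sketch: create a passphrase file readable only by its owner, and check
# its ownership and permissions before each backup run.

# Write a random passphrase (48 random bytes, base64-encoded to 64
# characters) to PATH with mode 600. Refuses to overwrite an existing file.
new_passphrase_file() {
  local path=$1
  if [ -e "$path" ]; then
    echo "refusing to overwrite $path" >&2
    return 1
  fi
  # umask 077 in a subshell, so the file is never group- or world-readable,
  # not even for the instant between creation and a later chmod.
  ( umask 077 && head -c 48 /dev/urandom | base64 -w0 > "$path" )
}

# Return 0 if PATH is a regular file owned by the current user with mode 600.
check_passphrase_file() {
  local path=$1
  [ -f "$path" ] || return 1
  [ "$(stat -c '%u' -- "$path")" = "$(id -u)" ] || return 1
  [ "$(stat -c '%a' -- "$path")" = "600" ]
}
```

A backup script can then call `check_passphrase_file "$RESTIC_PASSWORD_FILE" || exit 1` before invoking restic, so a loosened permission stops the run instead of passing silently.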

Off-site copies

The off-site copy is the one that survives the bad day. The best targets, ranked by what most small operators actually do:

Object storage at a different provider from your production hosting. If production is on AWS, off-site to B2 or R2. The point is that an AWS-wide problem (billing failure, account suspension, unrelated outage) does not also lose your backups.
A second VPS at a different provider. Cheap and simple, though it carries more operational work than object storage. The advantage is full control over the destination filesystem.
A storage box / dedicated server with large capacity at a different provider. Hetzner Storage Boxes are the canonical example. Cheap per terabyte, suitable for large rsync-style backups.
Physical media you rotate between sites. External drives at a home address, swapped weekly. Real off-site, real low-tech, real chance of forgetting. Use it as a third copy on top of online storage, not as the only off-site.

The combination that fits most small shops: local restic repo on a separate disk in the production host, daily restic backup to B2 or R2 as the off-site. Cost is in the cents-per-day range. Failure surface is minimal.

Scheduling and retention

A backup schedule has two parts: how often to take backups, and how long to keep them. The cadence side is easier than the retention side.

For the cadence, common defaults that hold up:

Databases: hourly logical dumps, plus continuous WAL/binlog archiving if the database supports it (PostgreSQL and MySQL both do).
User files: hourly snapshot of the file storage, daily off-site replication.
Configuration: daily.
Off-site sync: once a day, after the local backups are done.
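A cadence only helps if something notices when it stops. The block below is a minimal sketch of a freshness check, assuming GNU `find`; `backup_is_fresh` is a hypothetical helper, and the two-hour threshold in the example is an assumption for an hourly dump that allows one missed run of slack.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if a backup directory contains a recently written file.

# Return 0 if DIR holds at least one regular file modified within the last
# MAX_AGE_MINUTES minutes. -print -quit stops at the first match.
backup_is_fresh() {
  local dir=$1 max_age_minutes=$2
  [ -n "$(find "$dir" -type f -mmin "-$max_age_minutes" -print -quit)" ]
}

# Example: hourly pg_dump into /var/backups/db, alert after two hours.
# backup_is_fresh /var/backups/db 120 || echo "no fresh dump in 2h" >&2
```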

For retention, the principle is keep more recent backups densely and older backups sparsely. restic implements this directly:

```shell
restic forget \
    --keep-hourly  24 \
    --keep-daily   30 \
    --keep-weekly  12 \
    --keep-monthly 24 \
    --prune
```

That keeps the last 24 hourly snapshots, the last 30 daily, the last 12 weekly and the last 24 monthly. After two years you still have dense coverage of the recent past plus monthly milestones going back two years. The total is at most 90 snapshots (24 + 30 + 12 + 24), usually fewer because one snapshot can satisfy several rules at once, and dedup keeps the storage cost bounded.

Databases and other stateful applications

Databases are where file-level backups fail loudly. A copy of a running InnoDB or Postgres data directory while the database is writing produces a corrupted file: the data on disk is mid-flight, the in-memory state has not flushed. The restore is partial, the recovery is painful, you do not find out until you try.

Two options for every database:

Logical dump. mysqldump, pg_dump, mongodump. The database produces a consistent snapshot in a format that can be replayed. Slower for large databases, but consistent as long as the tool is used correctly (for InnoDB tables, run mysqldump with --single-transaction).
Physical backup with the right tooling. Percona XtraBackup for MySQL, pg_basebackup for Postgres. These take a consistent physical copy of a running database's data files; combined with archived WAL or binlogs, that copy supports point-in-time recovery. Faster for large databases, more setup.

For most small operators, logical dumps are the answer. A PostgreSQL dump pipeline:

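```shell
# Logical dump in PostgreSQL's compressed custom format (restore with pg_restore)
pg_dump --format=custom --compress=9 mydb > "/var/backups/db/mydb-$(date +%Y%m%d-%H%M).dump"

# Roles and other cluster-wide objects are not in pg_dump output; dump them separately
pg_dumpall --globals-only > /var/backups/db/globals.sql

# Restore later into a fresh database
createdb mydb_restore
pg_restore --no-owner --dbname=mydb_restore /var/backups/db/mydb-20260610-1800.dump
```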

That dump file is then picked up by restic on the next backup pass, replicated off-site, encrypted, retained. The database stays a small piece of the overall pipeline.
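One detail is worth getting right: if the restic pass starts while pg_dump is still writing, it captures a truncated file under a name that looks complete. The block below is a minimal sketch of a write-then-rename helper that avoids this; `atomic_dump` is a hypothetical name. The rename is only atomic when the temporary file and the target are on the same filesystem, which holds here because they share a directory.

```shell
#!/usr/bin/env bash
# Sketch: write a dump under a temporary name and rename it only when the
# dump command succeeds, so a partially written dump never appears under
# its final name.

# Usage: atomic_dump TARGET COMMAND [ARGS...]
# Runs COMMAND with stdout redirected to TARGET.partial. On success, renames
# it to TARGET; on failure, deletes it and returns COMMAND's exit status.
atomic_dump() {
  local target=$1
  shift
  local tmp="$target.partial"
  if "$@" > "$tmp"; then
    mv -f -- "$tmp" "$target"
  else
    local status=$?
    rm -f -- "$tmp"
    return "$status"
  fi
}

# Example:
# atomic_dump "/var/backups/db/mydb-$(date +%Y%m%d-%H%M).dump" \
#   pg_dump --format=custom --compress=9 mydb
```

restic can skip the temporary files with `--exclude '*.partial'`.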

The same principle applies to anything else stateful: Redis (BGSAVE), Elasticsearch (snapshot API), ClickHouse (BACKUP command). Each has an application-aware way to produce a consistent backup. Use that, then back up the result with restic.

The restore drill

A backup that has never been restored is a hypothesis. The restore drill is the part that converts it into a backup. The drill is simple:

Pick a backup from the last week. Pick another from a month ago.
On a separate machine (a fresh VPS, a local VM), restore the backup.
Bring up the application against the restored data.
Verify it works: log in, check a record, perform a representative read.
Note how long the whole process took.
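Steps 2 to 5 are mostly your own commands, but timing them and checking the restored bytes can be scripted. The block below is a minimal sketch, assuming bash (process substitution and `SECONDS`) and GNU `find`, `sort -z`, `xargs -r` and `sha256sum`; all function names are hypothetical. `trees_match` only makes sense against a reference you know should be identical, such as a test directory that has not changed since the snapshot was taken. It complements `restic check --read-data`, which verifies the repository itself.

```shell
#!/usr/bin/env bash
# Sketch of two restore-drill helpers: time each step, and compare a
# restored tree against a reference copy by checksum.

# Usage: drill_step LABEL COMMAND [ARGS...]
# Runs COMMAND, then prints "LABEL: Ns" (whole seconds, from bash's SECONDS).
# Returns COMMAND's exit status.
drill_step() {
  local label=$1
  shift
  local start=$SECONDS
  "$@"
  local status=$?
  echo "$label: $((SECONDS - start))s"
  return "$status"
}

# Print a "sha256  ./relative/path" line for every file under DIR,
# sorted by path so two manifests can be compared line by line.
tree_manifest() {
  ( cd -- "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum )
}

# Return 0 if both directories contain the same files with the same contents.
trees_match() {
  diff <(tree_manifest "$1") <(tree_manifest "$2") > /dev/null
}

# Example:
# drill_step "restic restore" restic restore latest --target /tmp/restore
# trees_match /srv/drill-reference /tmp/restore/srv/drill-reference || echo "mismatch" >&2
```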

The failure modes you catch this way are the ones that matter. The passphrase that does not unlock the off-site repo. The database dump that will not restore because the user roles it references were never backed up. The bucket that is allegedly versioned but isn't. The provider that throttles egress so badly that a 100 GB restore takes a week. None of these are theoretical. All of them have ruined a real operator's day.

Cadence: once a quarter is enough for most shops. Once a month is better if you can sustain it. Once after every major change to the backup system is mandatory. The change you made yesterday is the one most likely to break the restore.

A worked example: a small shop

A concrete configuration that suits a single-server SaaS with a PostgreSQL database, around 50 GB of user-uploaded files, and configuration spread across /etc and /srv.

Local snapshots. ZFS snapshots of the data filesystem every 15 minutes, hourly snapshots retained for 48 hours, daily for 30 days. zfs-auto-snapshot handles this.
Database backups. pg_dump every hour, stored under /var/backups/db/. PITR via continuous WAL archiving to a local directory.
Local restic repo. On a second physical disk (not the same one as production data), daily restic backup of /etc, /srv, /var/backups/db.
Off-site restic repo. On Backblaze B2, daily restic backup of the same set. Different passphrase from the local repo (so a single leaked passphrase does not expose both).
Retention. 24 hourly, 30 daily, 12 weekly, 24 monthly. Local pruning runs nightly; off-site pruning runs weekly.
Restore drill. Quarterly. Restore the latest off-site snapshot to a fresh VPS, bring up Postgres and the application, verify a known record is present and a write succeeds.
Cost. Around USD 4 per month for the B2 storage. Operations cost is the once-a-quarter drill.

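That setup survives a failed data disk (the local restic repo on the second disk), loss of the whole machine or datacenter (the off-site repo at B2), and mistakes caught quickly (the ZFS snapshots). It survives a compromise that destroys the production environment only if the off-site copy cannot be deleted from the production host: the B2 key that host uses for the daily backups could otherwise delete the repository too, which is what bucket versioning and object lock guard against.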

Common failures and what they look like

A short list of the failure modes that hit real operators most often, and the symptom to recognise.

Silent backup script failure

The cron job that writes the backup exits with an error. Nothing notices. Six months later you discover that the last successful backup ran just before someone last edited the script. The fix is monitoring: a heartbeat from the backup script to a service such as healthchecks.io, paging if no heartbeat arrives for more than a day.
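The block below is a minimal sketch of that heartbeat. It assumes curl is installed and a healthchecks.io-style endpoint, where a plain request signals success and the same URL with `/fail` appended signals failure. `run_with_heartbeat` and `HEARTBEAT_PING` are hypothetical names, and the URL in the example is a placeholder. The missed-heartbeat alert still matters: if cron never runs the script, no ping of either kind is sent.

```shell
#!/usr/bin/env bash
# Sketch: run a backup command and report the outcome to a heartbeat URL.
# HEARTBEAT_PING can name a different function to send the ping (e.g. in tests).

# Send one ping; fail on HTTP errors, give up after 10 s, retry 3 times.
ping_url() {
  curl -fsS -m 10 --retry 3 -o /dev/null "$1"
}

# Usage: run_with_heartbeat URL COMMAND [ARGS...]
# Pings URL if COMMAND succeeds, URL/fail if it fails, and returns
# COMMAND's exit status. A failed ping is logged but does not mask it.
run_with_heartbeat() {
  local url=$1
  shift
  local ping=${HEARTBEAT_PING:-ping_url}
  if "$@"; then
    "$ping" "$url" || echo "heartbeat ping failed: $url" >&2
  else
    local status=$?
    "$ping" "$url/fail" || echo "heartbeat ping failed: $url/fail" >&2
    return "$status"
  fi
}

# Example (the check UUID is a placeholder):
# run_with_heartbeat https://hc-ping.com/YOUR-CHECK-UUID \
#   restic backup /etc /srv /var/backups/db
```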

Disk full at the destination

The backup repo grows, pruning is forgotten, the disk fills, and new backups silently fail. The symptom is gradual, and the warning signs are usually sitting in logs that nobody reads. The fix is automated pruning paired with monitoring of the destination's free space.
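Both halves of that fix fit in a few lines. The block below is a minimal sketch, assuming POSIX `df -P` output and GNU `find`; the function names and the age-based cleanup of the local dump directory are illustrative assumptions. Deleting by age is only safe for plain dump files. A restic repository is pruned with `restic forget --prune`, never by removing files inside it.

```shell
#!/usr/bin/env bash
# Sketch: watch free space on a backup destination and expire old local dumps.

# Print the percentage of space used on the filesystem holding PATH
# (column 5 of POSIX df output, with the % sign stripped).
disk_used_pct() {
  df -P -- "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 if usage on PATH's filesystem is at or above THRESHOLD percent.
disk_over_threshold() {
  [ "$(disk_used_pct "$1")" -ge "$2" ]
}

# Delete *.dump files under DIR with find's -mtime +DAYS semantics: age is
# counted in whole days rounded down, so +7 matches files 8 or more days old.
prune_old_dumps() {
  find "$1" -type f -name '*.dump' -mtime "+$2" -delete
}

# Example (thresholds are assumptions):
# prune_old_dumps /var/backups/db 7
# disk_over_threshold /var/backups 85 && echo "backup disk above 85%" >&2
```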

Passphrase known only to one person

The person leaves the company. The off-site backups are now data nobody can read. Treat the passphrase as a shared production secret with at least two custodians.

Restore is technically possible but takes a week

The egress fee or download bandwidth from the off-site is brutal. You can get the data back, but the business is dead before you do. The fix is a destination with reasonable retrieval performance, tested by an actual restore drill, not by reading the marketing page.

Backup contains corrupted database files

Someone backed up the data directory of a running database instead of using a dump tool. The backup files look fine; the restore is unusable. The fix is to never rely on file-level backups of running stateful services. Use the application's native backup mechanism.

Frequently asked questions

How often should I back up?

As often as you are willing to lose data. If you can tolerate losing a day's work, daily is fine. If you cannot tolerate losing an hour, the question is no longer only about backups; it is about replication and point-in-time recovery for data that is that hot. Most small operators are well served by hourly incremental snapshots of critical state, daily backups of everything else, and off-site replication daily (weekly at the very least).

Are filesystem snapshots a backup?

Local snapshots (ZFS, btrfs, LVM) are excellent for rolling back a bad config change in seconds. They are not a backup, because they live on the same disks as the data they snapshot. A failed disk takes the snapshots with it. Treat snapshots as your fast-recovery layer and backups as your durability layer.

Should I use a backup product or a script?

Use a tool, not a script. restic and borg are mature, widely used and built around encryption, and rclone is a solid transfer layer alongside them. A shell script that runs tar | gzip | scp is missing every property a backup needs: incremental, deduplicated, encrypted, integrity-checked. The tool is fifteen minutes of setup and saves you ten failures over a decade.

How do I back up encrypted disks?

The backup is of the data, not the encrypted blocks. Mount the encrypted filesystem; the backup tool reads plaintext from inside the mount and writes encrypted backup data to its target. Block-level backups of an encrypted disk are possible but unusual; you end up with an encrypted backup of an encrypted disk, which is harder to restore from.

What about backing up to a USB drive?

Useful as one of the three copies, but not as the off-site copy unless you physically move it. A USB drive on the shelf next to the server will burn in the same fire as the server. If you can rotate two drives between the office and home, that works.

Do I need to back up the OS?

Usually not. Modern OSes reinstall from media in twenty minutes. Back up the configuration that makes the box do its job, not the binaries. The exception is when you have heavily customised the system and rebuilding from scratch would take a working day. In that case, a full system image is worth keeping.

What is the cheapest sustainable off-site target?

In 2026, Backblaze B2 at around USD 0.006/GB/month. A 100 GB encrypted restic repo on B2 costs about USD 0.60 per month. Hetzner Storage Boxes are cheaper for large sustained capacity. Bare object storage is almost always cheaper than a dedicated backup SaaS for the same data.
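The arithmetic is simple enough to keep in a script when comparing destinations. The block below is a minimal sketch using awk, with the USD 0.006/GB-month figure above as the default rate; `monthly_cost` is a hypothetical name, and it ignores free tiers, egress and API charges, which matter most on the day you restore.

```shell
#!/usr/bin/env bash
# Sketch: estimate monthly object-storage cost for a repository size.
# Usage: monthly_cost GB [USD_PER_GB_MONTH]   (default rate 0.006)
monthly_cost() {
  awk -v gb="$1" -v rate="${2:-0.006}" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# monthly_cost 100    # prints 0.60, matching the 100 GB example above
```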
