Module 8: Backup and Recovery in Linux Systems

Learning Objectives

By the end of this module, you will be able to:

  • Compare full, incremental, and differential backup strategies and select an appropriate rotation scheme
  • Create backups with standard Linux tools such as rsync, tar, and dd, as well as deduplicating tools like Borg and restic
  • Back up and restore PostgreSQL databases using both logical and physical methods
  • Develop and document a disaster recovery plan with defined RTO and RPO targets
  • Restore complete systems and individual files after a failure
  • Verify backup integrity through automated testing

1. Backup Strategies

Understanding Backup Types

Backup strategies form the foundation of any data protection plan. Each approach offers different trade-offs between storage requirements, backup speed, and restoration capabilities.

Full Backup

A full backup is exactly what it sounds like—a complete copy of all selected data. Each time you run a full backup, it copies everything, regardless of whether files have changed since the last backup.

How it works under the hood: When you initiate a full backup, the system examines each specified file and directory. It reads the data directly from disk blocks through the filesystem layer, often using system calls like read(). The backup software then writes this data to the destination, preserving file metadata including permissions, ownership, timestamps, and potentially extended attributes.

Advantages:

  • Simplest and fastest restoration: everything needed is in a single backup set
  • No dependency on earlier backups
  • Straightforward to verify and manage

Disadvantages:

  • Highest storage consumption of the three strategies
  • Longest backup window, since all data is copied every time
  • Repeatedly copies data that has not changed

Incremental Backup

An incremental backup copies only files that have changed since the most recent backup of any type.

How it works under the hood: Incremental backups rely on file modification timestamps or more sophisticated change tracking mechanisms. The system examines the modification time (mtime) of each file and compares it with the timestamp of the last backup operation. Some backup systems also use file checksums to detect changes even when timestamps might be misleading. Modern implementations might use features like Linux's fsnotify subsystem to track changes in real time between backup operations.
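As a minimal sketch, GNU tar exposes this change-tracking mechanism directly through snapshot files (the data and backup paths below are placeholders):

# The first run against a new .snar snapshot file produces a full backup;
# later runs with the same snapshot file capture only files changed since then.
tar --listed-incremental=/backup/data.snar \
    -czf /backup/incr_$(date +%Y-%m-%d).tar.gz /data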

Advantages:

  • Fastest backup runs and the smallest per-run storage footprint
  • Short backup windows, making frequent backups practical

Disadvantages:

  • Restoration requires the last full backup plus every incremental taken since then
  • A single missing or corrupted incremental breaks the restore chain
  • Slowest and most complex restoration of the three strategies

Differential Backup

A differential backup copies all files that have changed since the last full backup, regardless of any intermediate backups.

How it works under the hood: Much like incremental backups, differential backups examine file modifications, but they use the last full backup as their reference point rather than the most recent backup of any type. The system maintains a timestamp of the last full backup and selects files whose mtime is newer than this reference point.
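A rough mtime-based differential can be sketched with find and a marker file that is touched whenever a full backup completes (the paths below are assumptions):

# Archive every file modified since the last full backup finished
find /data -type f -newer /backup/last_full.stamp -print0 \
  | tar -czf /backup/diff_$(date +%Y-%m-%d).tar.gz --null -T -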

Advantages:

  • Restoration needs only the last full backup plus the most recent differential
  • Faster to create than a full backup

Disadvantages:

  • Each differential grows larger the longer it has been since the last full backup
  • Uses more storage per run than an incremental approach

Backup Rotation Schemes

Beyond the types of backups, implementing an effective rotation scheme ensures you maintain appropriate backup history while managing storage requirements.

Grandfather-Father-Son (GFS)

This classic rotation scheme involves:

  • Son: daily backups, kept for one to two weeks
  • Father: weekly backups, typically taken at the end of each week and kept for about a month
  • Grandfather: monthly backups, kept for a year or more

How it works: The system labels backup media or files according to their rotation position. When a backup cycle completes, the oldest backup in each tier gets overwritten or pruned. This creates a rolling window of backups with decreasing granularity as you move further back in time.

Tower of Hanoi

Based on the mathematical puzzle, this scheme provides an efficient compromise between storage usage and retention periods.

How it works: Backup sets are organized into levels (usually 3-5), and each level has specific rotation rules. The scheme ensures good historical coverage while minimizing the required number of backup media or storage locations.
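As an illustrative sketch (not part of any standard tool), the level used on each run can be derived from a run counter: the position of the counter's lowest set bit selects the backup set, reproducing the puzzle's A-B-A-C-A-B-A-D pattern. The counter file location is an assumption.

#!/bin/bash
# Sketch: pick a Tower of Hanoi rotation level from a run counter.
COUNTER_FILE="/backup/.hanoi_counter"   # assumed state file
MAX_LEVEL=4                             # e.g. five backup sets, levels 0-4

COUNT=$(cat "$COUNTER_FILE" 2>/dev/null)
COUNT=$(( ${COUNT:-0} + 1 ))
echo "$COUNT" > "$COUNTER_FILE"

# The level is the number of times the counter divides evenly by 2
LEVEL=0
N=$COUNT
while [ $(( N % 2 )) -eq 0 ]; do
    LEVEL=$(( LEVEL + 1 ))
    N=$(( N / 2 ))
done
[ "$LEVEL" -gt "$MAX_LEVEL" ] && LEVEL=$MAX_LEVEL

echo "Run $COUNT writes to backup set level $LEVEL"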

2. Backup Tools in Linux

Linux provides several built-in and third-party tools for creating backups. We'll focus on the most common and versatile options.

rsync

rsync is a fast, versatile file synchronization and transfer tool that can efficiently create backups by copying only the differences between source and destination.

How it works under the hood: rsync uses a delta-transfer algorithm that identifies changed portions of files using rolling checksums. It divides files into blocks, calculates checksums for each block, and compares them between source and destination. Only blocks that differ are transferred, making it bandwidth-efficient and faster than regular copy operations. When run with archive mode (-a), it preserves permissions, ownership, timestamps, and other metadata.

Basic syntax:

rsync -av /source/directory/ /destination/directory/

Key options:

  • -a (archive): preserves permissions, ownership, timestamps, and symlinks while recursing into directories
  • -v (verbose): lists files as they are transferred
  • -z (compress): compresses data in transit, useful over slow networks
  • --delete: removes files from the destination that no longer exist in the source
  • --exclude=PATTERN: skips files or directories matching a pattern
  • --link-dest=DIR: hard-links unchanged files against a previous backup, enabling space-efficient snapshots

Example for incremental backup:

# Create a timestamped backup directory
BACKUP_DIR="/backup/$(date +%Y-%m-%d)"
mkdir -p "$BACKUP_DIR"

# Perform backup with hard links to previous backup for unchanged files
rsync -av --delete \
  --link-dest=/backup/latest \
  /source/data/ \
  "$BACKUP_DIR/"

# Update the 'latest' symlink
rm -f /backup/latest
ln -s "$BACKUP_DIR" /backup/latest

tar

tar (tape archive) is one of the oldest and most reliable backup tools in Unix/Linux. It packages multiple files and directories into a single archive file, optionally with compression.

How it works under the hood: tar reads files sequentially and writes them into a structured archive format. It preserves the directory structure, file permissions, ownership, and timestamps. Modern implementations support various compression methods through external utilities like gzip, bzip2, or xz. The tar format includes headers for each file that store metadata and pointers to the file's content within the archive.

Basic syntax:

tar -cvf backup.tar /directory/to/backup/

Key options:

  • -c: create a new archive
  • -x: extract files from an archive
  • -t: list archive contents
  • -v: verbose output
  • -f FILE: operate on the named archive file
  • -z / -j / -J: compress with gzip, bzip2, or xz respectively
  • -p: preserve permissions when extracting
  • --exclude=PATTERN: skip paths matching a pattern

Example for full backup with compression:

# Create a timestamped backup file
BACKUP_FILE="/backup/full_$(date +%Y-%m-%d).tar.gz"

# Create compressed archive
tar -czf "$BACKUP_FILE" \
  --exclude="/proc" \
  --exclude="/sys" \
  --exclude="/dev" \
  --exclude="/run" \
  --exclude="/tmp" \
  --exclude="/var/log" \
  --exclude="/backup" \
  /

dd

dd is a low-level tool that copies data block by block, regardless of filesystem structures. It's useful for creating disk or partition images.

How it works under the hood: dd reads and writes raw data blocks directly, bypassing the filesystem layer. It can work with arbitrary block sizes and counts, making it suitable for backing up entire disks, partitions, or creating exact duplicates of storage devices. Unlike file-based tools, dd creates exact bit-for-bit copies, including unused or deleted data.

Basic syntax:

dd if=/dev/sda of=/backup/disk_image.img bs=4M status=progress

Key options:

  • if=: input file or device
  • of=: output file or device
  • bs=: block size for reads and writes (e.g. 4M)
  • count=: copy only this many blocks
  • status=progress: display transfer progress
  • conv=noerror,sync: continue past read errors, padding unreadable blocks with zeros

Example for disk image backup:

# Create a compressed disk image
dd if=/dev/sda bs=4M status=progress | gzip > /backup/sda_$(date +%Y-%m-%d).img.gz
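Restoring such an image is the reverse pipeline; the image name below is a placeholder, and the target device should be double-checked, since dd overwrites it without confirmation:

# Restore the compressed image back onto the disk (destroys existing data on /dev/sda)
gunzip -c /backup/sda_2024-01-01.img.gz | dd of=/dev/sda bs=4M status=progress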

Specialized Backup Software

For more comprehensive backup solutions, several specialized tools are available:

Borg Backup

A deduplicating backup tool that supports compression and encryption.

How it works under the hood: Borg uses a content-defined chunking algorithm to split files into variable-sized chunks. These chunks are identified by their Blake2b hash and stored in a repository. Deduplication happens at the chunk level, meaning identical data chunks are stored only once, even if they appear in different files or different backup snapshots. This makes Borg extremely space-efficient for long-term backup storage.

Example:

# Initialize a repository
borg init --encryption=repokey /backup/borg-repo

# Create a backup
borg create --stats --progress \
  /backup/borg-repo::backup-$(date +%Y-%m-%d) \
  /home /etc /var/www

# List backups
borg list /backup/borg-repo
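To complete the workflow, restoration and pruning are handled by borg extract and borg prune; the archive name and retention values below are placeholders:

# Extract a single path from a named archive into the current directory
borg extract /backup/borg-repo::backup-2024-01-01 home/user/documents

# Prune old archives according to a retention policy
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 /backup/borg-repo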

Restic

Another modern backup solution with encryption, deduplication, and support for multiple backends.

Key features:

  • Client-side encryption of all data and metadata
  • Content-defined chunking and deduplication, similar to Borg
  • Multiple storage backends, including local directories, SFTP, and S3-compatible object storage
  • Built-in repository verification with restic check
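A minimal usage sketch, assuming a local repository path (restic prompts for a repository password on init and on every later command):

# Initialize a local repository
restic init --repo /backup/restic-repo

# Create a backup of selected directories
restic --repo /backup/restic-repo backup /home /etc

# List snapshots and verify repository integrity
restic --repo /backup/restic-repo snapshots
restic --repo /backup/restic-repo check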

3. Database Backup Procedures

Databases require special handling due to their concurrent operations and data structure complexity.

PostgreSQL Backup Methods

pg_dump (Logical Backup)

pg_dump creates a consistent snapshot of a database in SQL format or custom archive format.

How it works under the hood: pg_dump connects to the database as a regular client and extracts data using SQL queries. It intelligently handles dependencies between objects (tables, functions, etc.) to ensure proper restoration order. When using the custom format (-Fc), it creates a compressed archive that can be selectively restored. The tool acquires locks only briefly, allowing normal operations to continue during backup.

Example for a single database:

# Backup a single database to a custom-format archive
pg_dump -Fc -f mydb_$(date +%Y-%m-%d).dump mydb

# Backup a single database with compression to SQL format
pg_dump -c -C --column-inserts mydb | gzip > mydb_$(date +%Y-%m-%d).sql.gz

Options explanation:

  • -Fc: use the custom archive format, which is compressed and allows selective restoration with pg_restore
  • -f: write the output to the named file
  • -c: include commands to drop database objects before recreating them
  • -C: include a CREATE DATABASE statement in the output
  • --column-inserts: dump data as INSERT statements with explicit column names (slower, but portable across systems)

pg_basebackup (Physical Backup)

pg_basebackup creates a binary copy of the database cluster files.

How it works under the hood: Unlike pg_dump, pg_basebackup copies the actual data files that make up the PostgreSQL database cluster. It uses PostgreSQL's replication protocol to ensure a consistent copy without requiring the database to be shut down. The tool captures the Write-Ahead Log (WAL) during the backup to ensure consistency, essentially creating a point-in-time snapshot of the entire cluster.

Example:

# Create a base backup in tar format with compression
pg_basebackup -D /backup/pg_base -Ft -z -X stream -P

Options explanation:

  • -D: target directory for the backup
  • -Ft: write the output in tar format
  • -z: gzip-compress the tar output
  • -X stream: stream the WAL generated during the backup so the copy is self-consistent
  • -P: show progress information

Continuous Archiving and Point-in-Time Recovery (PITR)

For mission-critical databases, implementing continuous WAL archiving enables point-in-time recovery.

How it works under the hood: PostgreSQL maintains a write-ahead log (WAL) that records all changes to the database. By archiving these WAL segments, you can recover to any point in time by replaying the logs up to the desired moment. This setup requires configuring archive_mode and archive_command in PostgreSQL's configuration.

Configuration example:

# In postgresql.conf
archive_mode = on
archive_command = 'test ! -f /backup/pg_wal/%f && cp %p /backup/pg_wal/%f'
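Once archiving is enabled (changing archive_mode requires a server restart), its health can be checked from the pg_stat_archiver view; this is a quick sanity check rather than a full monitoring setup:

# Confirm WAL segments are being archived and watch for failures
psql -c "SELECT archived_count, failed_count, last_archived_time FROM pg_stat_archiver;"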

Recovery process overview:

  1. Restore the base backup
  2. Create a recovery configuration (restore_command plus a recovery.signal file on PostgreSQL 12 and later, or recovery.conf on older releases)
  3. Start the server, which will replay WAL files to the target point
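A rough sketch of the recovery configuration on PostgreSQL 12 and later; the target timestamp and data directory path are placeholders:

# In postgresql.conf on the restored server
restore_command = 'cp /backup/pg_wal/%f %p'
recovery_target_time = '2024-01-01 12:00:00'

# Then signal recovery mode and start the server:
#   touch /var/lib/postgresql/data/recovery.signal
#   systemctl start postgresql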

Backup Verification

Always verify your database backups to ensure they are usable.

For pg_dump backups:

# Test restoration to a temporary database
createdb test_restore
pg_restore -d test_restore mydb_backup.dump

# Verify data integrity
psql -d test_restore -c "SELECT count(*) FROM important_table;"

4. Disaster Recovery Planning

A complete disaster recovery plan goes beyond just having backups; it outlines the entire process for restoring services after a catastrophic event.

Components of a Disaster Recovery Plan

A comprehensive DR plan includes:

  1. Risk Assessment: Identify potential threats and vulnerabilities
  2. Business Impact Analysis: Determine critical systems and acceptable downtime
  3. Recovery Objectives:
    • Recovery Time Objective (RTO): Maximum acceptable downtime
    • Recovery Point Objective (RPO): Maximum acceptable data loss
  4. Backup Procedures: Detailed backup processes and schedules
  5. Recovery Procedures: Step-by-step restoration instructions
  6. Testing Schedule: Regular testing to validate the plan
  7. Roles and Responsibilities: Who does what during recovery
  8. Communication Plan: How to notify stakeholders during an incident

Documentation Requirements

Your DR documentation should include:

  • Hardware and software inventory, including versions and license details
  • Network diagrams and configuration details
  • Backup schedules, storage locations, and retention policies
  • Step-by-step restoration procedures for each critical system
  • Contact lists for staff, vendors, and service providers
  • Instructions for accessing securely stored credentials

Testing Your DR Plan

Regular testing is crucial for ensuring your DR plan actually works when needed.

Testing approaches:

  • Tabletop exercises: walk through the plan on paper with the recovery team
  • Partial restoration tests: restore individual files, databases, or services into a test environment
  • Full recovery drills: rebuild complete systems on spare hardware or virtual machines
  • Simulated failover: redirect traffic to standby systems where available

Document all test results, including issues encountered and time required for each step.

5. System Restoration Techniques

When disaster strikes, having well-documented restoration procedures is essential for rapid recovery.

Bare-Metal Recovery

Bare-metal recovery involves restoring a system from scratch on new or repaired hardware.

General process:

  1. Boot from rescue media (USB, PXE, etc.)
  2. Partition and format the disk(s)
  3. Restore the base system
  4. Restore application data
  5. Reconfigure networking and system parameters
  6. Test functionality

Example using systemrescue and tar:

# After booting from rescue media and preparing disks

# Mount the target filesystem
mount /dev/sda1 /mnt

# Restore the base system
cd /mnt
tar -xpf /backup/system_backup.tar.gz

# Restore the bootloader (bind-mount pseudo-filesystems so grub-install works inside the chroot)
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
grub-install /dev/sda
update-grub
exit

# Unmount everything and reboot
umount -R /mnt
reboot

Using System Imaging Tools

For more streamlined recovery, system imaging tools can simplify the process.

Clonezilla

Clonezilla is an open-source disk imaging/cloning tool.

How it works under the hood: Clonezilla uses various utilities like partclone, partimage, dd, and ntfsclone to create and restore disk images. It works at the block level but is filesystem-aware, meaning it only backs up used blocks for supported filesystems, saving both time and space. For unsupported filesystems, it falls back to sector-by-sector copying with dd.

Recovery process overview:

  1. Boot from Clonezilla Live media
  2. Select "Restore disk/partition" option
  3. Choose the source of the image (local, SSH, NFS, etc.)
  4. Select the image to restore
  5. Choose the target disk/partition
  6. Confirm and begin restoration

File-Level Restoration

For situations where only specific files need to be recovered:

Using tar:

# List the contents of the archive
tar -tvf backup.tar.gz | grep filename

# Extract specific files
tar -xvf backup.tar.gz path/to/file1 path/to/file2

Using rsync:

# Restore specific directories
rsync -av /backup/latest/var/www/ /var/www/

6. Testing Backup Integrity

Creating backups is only half the solution—you must regularly verify their integrity.

Hash Verification

Using cryptographic hashes to verify backup integrity:

# Create a hash file during backup
tar -czf backup.tar.gz /data
sha256sum backup.tar.gz > backup.tar.gz.sha256

# Verify later
sha256sum -c backup.tar.gz.sha256

Regular Restoration Testing

# Test tar archive integrity
mkdir /tmp/test-restore
tar -tzf backup.tar.gz >/dev/null || echo "Archive is corrupted!"
tar -xzf backup.tar.gz -C /tmp/test-restore

# For database backups:
createdb test_restore
pg_restore -d test_restore /backup/database.dump
psql -d test_restore -c "SELECT count(*) FROM users;"

Automated Verification

Implement automated scripts to regularly verify backup integrity:

#!/bin/bash
# Simple backup verification script

BACKUP_DIR="/backup"
LOG_FILE="/var/log/backup-verify.log"
TEST_DIR="/tmp/backup-test"

echo "Backup verification started at $(date)" >> "$LOG_FILE"

# Clean test directory
rm -rf "$TEST_DIR"
mkdir -p "$TEST_DIR"

# Get latest backup
LATEST_BACKUP=$(find "$BACKUP_DIR" -name "*.tar.gz" | sort | tail -n 1)

# Test extraction
tar -tzf "$LATEST_BACKUP" >/dev/null 2>&1
if [ $? -eq 0 ]; then
    echo "Backup integrity check passed for $LATEST_BACKUP" >> "$LOG_FILE"

    # Test sample file extraction
    tar -xzf "$LATEST_BACKUP" -C "$TEST_DIR" etc/passwd etc/hostname
    if [ -f "$TEST_DIR/etc/passwd" ]; then
        echo "Sample file extraction successful" >> "$LOG_FILE"
    else
        echo "ERROR: Sample file extraction failed!" >> "$LOG_FILE"
    fi
else
    echo "ERROR: Backup integrity check failed for $LATEST_BACKUP" >> "$LOG_FILE"
fi

echo "Verification completed at $(date)" >> "$LOG_FILE"

7. Practical Exercises

Exercise 1: Implementing a Backup Rotation Scheme

Objective: Create a script that implements a GFS (Grandfather-Father-Son) backup rotation scheme using rsync.

Steps:

  1. Create directory structure for daily, weekly, and monthly backups
  2. Write a script that:
    • Performs a daily backup using rsync
    • Creates a weekly backup on Sundays
    • Creates a monthly backup on the 1st of each month
    • Rotates old backups out based on retention policy

Template script:

#!/bin/bash
# GFS Backup Rotation Script

# Configuration
SOURCE_DIR="/var/www"
BACKUP_BASE="/backup"
DAILY_DIR="$BACKUP_BASE/daily"
WEEKLY_DIR="$BACKUP_BASE/weekly"
MONTHLY_DIR="$BACKUP_BASE/monthly"
LOG_FILE="/var/log/backup-rotation.log"

# Create backup directories if they don't exist
mkdir -p "$DAILY_DIR" "$WEEKLY_DIR" "$MONTHLY_DIR"

# Get current date information
DAY_OF_WEEK=$(date +%u)  # 1-7, where 1 is Monday
DAY_OF_MONTH=$(date +%d)
DATE_STAMP=$(date +%Y-%m-%d)

# Log start
echo "Starting backup at $(date)" >> "$LOG_FILE"

# Perform daily backup
DAILY_BACKUP="$DAILY_DIR/backup-$DATE_STAMP"
rsync -a --delete "$SOURCE_DIR/" "$DAILY_BACKUP/"
echo "Daily backup completed to $DAILY_BACKUP" >> "$LOG_FILE"

# Weekly backup on Sundays (day 7)
if [ "$DAY_OF_WEEK" -eq 7 ]; then
    WEEK_NUM=$(date +%U)
    WEEKLY_BACKUP="$WEEKLY_DIR/backup-week-$WEEK_NUM"
    rsync -a --delete "$SOURCE_DIR/" "$WEEKLY_BACKUP/"
    echo "Weekly backup completed to $WEEKLY_BACKUP" >> "$LOG_FILE"
fi

# Monthly backup on the 1st
if [ "$DAY_OF_MONTH" -eq "01" ]; then
    MONTH=$(date +%Y-%m)
    MONTHLY_BACKUP="$MONTHLY_DIR/backup-$MONTH"
    rsync -a --delete "$SOURCE_DIR/" "$MONTHLY_BACKUP/"
    echo "Monthly backup completed to $MONTHLY_BACKUP" >> "$LOG_FILE"
fi

# Rotation: keep 7 daily, 4 weekly, 12 monthly backups
find "$DAILY_DIR" -maxdepth 1 -type d -mtime +7 -exec rm -rf {} \;
find "$WEEKLY_DIR" -maxdepth 1 -type d -mtime +28 -exec rm -rf {} \;
find "$MONTHLY_DIR" -maxdepth 1 -type d -mtime +365 -exec rm -rf {} \;

echo "Backup rotation completed at $(date)" >> "$LOG_FILE"

Your task:

  1. Implement this script on your test system
  2. Run it for several days (you can simulate different dates)
  3. Verify the rotation works correctly
  4. Add error checking and email notifications for failures

Exercise 2: Database Backup and Recovery

Objective: Create a comprehensive PostgreSQL backup solution with verification.

Steps:

  1. Install PostgreSQL if not already installed
  2. Create a test database with sample data
  3. Implement a backup script that:
    • Creates a full database dump daily
    • Implements transaction log archiving
    • Verifies backup integrity
    • Rotates old backups

Template script:

#!/bin/bash
# PostgreSQL Backup Script

# Configuration
BACKUP_DIR="/backup/postgresql"
DATE_STAMP=$(date +%Y-%m-%d)
BACKUP_FILE="$BACKUP_DIR/full_backup_$DATE_STAMP.dump"
DB_NAME="testdb"
RETENTION_DAYS=7
LOG_FILE="/var/log/pg-backup.log"

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Log start
echo "Starting PostgreSQL backup at $(date)" >> "$LOG_FILE"

# Perform full backup
pg_dump -Fc -f "$BACKUP_FILE" "$DB_NAME"
if [ $? -eq 0 ]; then
    echo "Backup completed successfully: $BACKUP_FILE" >> "$LOG_FILE"
else
    echo "ERROR: Backup failed!" >> "$LOG_FILE"
    exit 1
fi

# Verify backup
echo "Verifying backup integrity..." >> "$LOG_FILE"
pg_restore -l "$BACKUP_FILE" >/dev/null 2>&1
if [ $? -eq 0 ]; then
    echo "Backup verification passed" >> "$LOG_FILE"
else
    echo "ERROR: Backup verification failed!" >> "$LOG_FILE"
    exit 1
fi

# Rotate old backups
find "$BACKUP_DIR" -name "full_backup_*.dump" -type f -mtime +"$RETENTION_DAYS" -delete
echo "Deleted backups older than $RETENTION_DAYS days" >> "$LOG_FILE"

echo "Backup process completed at $(date)" >> "$LOG_FILE"

Your task:

  1. Implement this script on a system with PostgreSQL
  2. Create a test database and populate it with sample data
  3. Run the backup script and verify it works correctly
  4. Simulate a database corruption and perform a recovery
  5. Document the time required for each step

8. Common Pitfalls and Troubleshooting

Common Backup Failures

Insufficient Storage Space

Symptoms:

  • Backup jobs abort with "No space left on device" errors
  • Archives are truncated or noticeably smaller than expected
  • Old backups pile up because rotation never runs after a failed job

Solutions:

  • Monitor backup storage usage and alert before the volume fills (see the example below)
  • Tighten retention policies or switch to deduplicating tools such as Borg or restic
  • Exclude unnecessary paths, enable compression, or expand the backup volume

Monitoring example:

#!/bin/bash
# Check backup storage space

BACKUP_DIR="/backup"
THRESHOLD=90  # percentage

USAGE=$(df -P "$BACKUP_DIR" | awk 'NR==2 {print $5}' | tr -d '%')  # -P keeps each filesystem on one line

if [ "$USAGE" -gt "$THRESHOLD" ]; then
    echo "WARNING: Backup storage at $USAGE% - running cleanup"
    # Delete oldest backups or notify admin
    find "$BACKUP_DIR" -name "*.tar.gz" -type f -mtime +30 -delete
fi

Inconsistent Database Backups

Symptoms:

  • Restored databases fail integrity checks or report missing or corrupt rows
  • File-level copies of a running database refuse to start or require crash recovery
  • Applications see partially applied transactions after a restore

Solutions:

  • Use database-aware tools such as pg_dump or pg_basebackup rather than copying data files from a live server
  • For file-level copies, stop the database first or snapshot the filesystem so every file reflects the same point in time (see the sketch below)
  • Verify restored databases against known record counts or checksums
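Where the database files live on LVM, a snapshot can provide a crash-consistent file-level copy; the volume and mount names below are assumptions, and the data directory and WAL must reside on the same volume for the copy to be consistent:

# Snapshot the logical volume holding the database files
lvcreate --snapshot --size 5G --name pgsnap /dev/vg0/pgdata

# Mount the snapshot read-only and archive it
mkdir -p /mnt/pgsnap
mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
tar -czf /backup/pgdata_snapshot_$(date +%Y-%m-%d).tar.gz -C /mnt/pgsnap .

# Clean up
umount /mnt/pgsnap
lvremove -f /dev/vg0/pgsnap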

Network Interruptions

Symptoms:

  • Remote backup jobs hang or abort with timeout errors
  • Partial transfers leave incomplete files on the destination
  • Backup duration varies wildly from run to run

Solutions:

  • Use tools that can resume interrupted transfers (for example rsync --partial)
  • Add timeouts and a retry loop around the backup command (see the example below)
  • Schedule large transfers for low-traffic windows and alert on jobs that never complete

Retry example:

#!/bin/bash
# Backup with retry mechanism

MAX_RETRIES=3
RETRY_DELAY=60

for ((i=1; i<=MAX_RETRIES; i++)); do
    rsync -avz --partial --timeout=60 /source/ remote:/backup/
    if [ $? -eq 0 ]; then
        echo "Backup succeeded on attempt $i"
        exit 0
    else
        echo "Backup failed, attempt $i of $MAX_RETRIES"
        sleep $RETRY_DELAY
    fi
done

echo "All retry attempts failed!"
exit 1

Recovery Challenges

Bootloader Issues

Symptoms:

  • The restored system drops to a GRUB rescue prompt or reports "no bootable device"
  • Firmware cannot find a boot entry for the restored disk

Solutions:

  • Reinstall the bootloader from a rescue environment using chroot (see the example below)
  • On UEFI systems, confirm the EFI system partition was restored and a boot entry exists (efibootmgr)
  • Check that /etc/fstab and the bootloader configuration reference the correct partition UUIDs

Grub reinstallation example:

# From rescue environment
mount /dev/sda1 /mnt
mount --bind /dev /mnt/dev
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
grub-install /dev/sda
update-grub
exit

Permissions and Ownership Problems

Symptoms:

  • Services fail to start or log "Permission denied" errors after a restore
  • Web applications cannot write to upload or cache directories
  • Restored files are owned by the wrong user or group

Solutions:

  • Use tools and options that preserve ownership and permissions (tar -p, rsync -a) and run restores as root
  • Re-apply the expected ownership and permission scheme after restoring (see the example below)
  • Watch for UID/GID mismatches when restoring onto a freshly installed system

Permission restoration example:

# Fix web server permissions after restore
chown -R www-data:www-data /var/www
find /var/www -type d -exec chmod 755 {} \;
find /var/www -type f -exec chmod 644 {} \;

Incomplete Dependency Restoration

Symptoms:

  • Restored applications fail because required packages, libraries, or configuration files are missing
  • Services start but misbehave due to missing cron jobs, systemd units, environment files, or certificates

Solutions:

  • Back up package lists and system configuration alongside application data (see the example below)
  • Document external dependencies such as scheduled jobs, service units, and TLS certificates
  • Where configuration management (Ansible, Puppet, etc.) is in use, rebuild from it and restore only data from backups
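As a small sketch for Debian-based systems (rpm-based distributions would use rpm -qa and dnf instead), the installed-package list can be captured with the backups and replayed before restoring data:

# Capture the installed-package list alongside regular backups
dpkg --get-selections > /backup/packages.list

# On the rebuilt system, replay the selections and install the packages
dpkg --set-selections < /backup/packages.list
apt-get dselect-upgrade -y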

9. Quick Reference Summary

Backup Strategy Selection

Strategy      | When to Use                                               | Storage Requirements | Recovery Time
Full          | When simplicity and fast restoration are priorities       | High                 | Fast
Incremental   | When storage space is limited and backup window is short  | Low                  | Slow
Differential  | Balanced approach for most systems                        | Medium               | Medium

Essential Commands

rsync backup:

rsync -avz --delete /source/ /backup/

tar backup:

tar -czf backup.tar.gz /directory/to/backup/

PostgreSQL backup:

pg_dump -Fc -f database.dump dbname

PostgreSQL restore:

pg_restore -d dbname database.dump

Verify tar archive:

tar -tzf backup.tar.gz >/dev/null || echo "Corrupted!"

Extract specific files from tar:

tar -xzf backup.tar.gz path/to/file

Backup Checklist

  1. Define requirements:
    • Recovery time objective (RTO)
    • Recovery point objective (RPO)
    • Retention requirements
  2. Select appropriate strategy:
    • Full, incremental, or differential
    • Appropriate rotation scheme
  3. Implement automation:
    • Scheduled backup jobs
    • Monitoring and alerting
    • Rotation and cleanup
  4. Test regularly:
    • Backup integrity verification
    • Restoration testing
    • Disaster recovery drills
  5. Document everything:
    • Backup procedures
    • Restoration steps
    • Contact information

By following these guidelines and implementing the concepts covered in this module, you'll have a robust backup and recovery system that can weather virtually any disaster scenario. Remember, the key to successful disaster recovery is preparation and testing—a backup you can't restore is no backup at all.