Automatic daily backups using rsync

By Antanas | July 1, 2013

I have updated my automatic backup system. The main difference from my previous post is that the backup server is now responsible for pulling the data from the remote PCs.

General setup

I will show how to set up periodic incremental backups using the rsync, crontab and sshfs utilities on Linux. The setup consists of several Linux PCs and a backup server running Ubuntu Server, with scripts that run the daily backup tasks automatically. This system has already saved my data during two complete hard drive failures, allowing a full restore in a matter of a few hours. By the end of the tutorial we should achieve the following goals:

  • The entire /home partition of each workstation is copied to the server every two hours
  • All important directories on the server itself are copied to a dedicated backup hard drive
  • Incremental backups are set up to store daily snapshots
  • The amount of data copied over the network and stored is reduced by using rsync and hard links

There is a hard drive installed in the server for storing backups. Storing backups on a dedicated drive in a network server reduces the possibility of losing data in case of a disk or file system fault. For this tutorial, let’s say that the backup drive is mounted as /backup on the server. Since writing the original post I have installed a second hard drive because of the increased amount of data. The scripts and the most recent snapshots from all the PCs in the network are stored on the first drive (mounted as /backup). The second drive is mounted at /backup-weekly and stores the incremental daily snapshots.
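The two drives can be mounted automatically at boot from /etc/fstab on the server. A minimal sketch, assuming hypothetical device names and an ext4 file system (on a real system use the UUIDs reported by blkid):

# /etc/fstab on the backup server (device names are placeholders)
/dev/sdb1    /backup           ext4    defaults    0    2
/dev/sdc1    /backup-weekly    ext4    defaults    0    2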

Accessing backups from the workstations

You can find a guide to sshfs at www.linuxjournal.com. Make sure that you can mount and access the directories on the backup server. Here is a script that mounts the remote backup directories on a workstation.

#!/bin/bash
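# mount the server's backup directories read-only over SSH;
# the local mount points /server/backup-weekly/ and /server/backup/ must already exist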
sshfs -o ro -o idmap=user user@server:/backup-weekly /server/backup-weekly/
sshfs -o ro -o idmap=user user@server:/backup/latest /server/backup/

Notice that the backup directories are mounted read-only on the workstation in order to prevent accidental or malicious data deletion.
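Once the directories are mounted, restoring a single file is just a copy. For example (the file name here is hypothetical), pulling the latest backed-up copy of a document back into the home directory:

cp /server/backup/PC-1/home/user1/Documents/thesis.tex ~/Documents/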

Workstation setup

I have changed the method of copying files to the backup server since the original post. Previously, every PC pushed its data using scripts run by cron under the root account. That was a bit of a headache because the scripts had to be updated on every machine, and running them as root was not a good idea either. I find the server-pull approach better, with one exception: a roaming laptop can no longer be reached on the network unless a VPN is used.

In order for the server to be able to pull the files from the PCs, the server's public SSH key is copied to the PCs and added to each user's ~/.ssh/authorized_keys. I only have a few users and PCs in my network, therefore distributing the key is relatively easy.
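A minimal sketch of distributing the key, run on the backup server as the root user that will own the cron job (the user and host names match the examples below):

# run as root on the backup server
ssh-keygen                  # generate a key pair if root does not have one yet
ssh-copy-id user1@pc1       # appends the public key to ~user1/.ssh/authorized_keys on PC-1
ssh-copy-id user2@pc2       # repeat for every user and machine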

Backup script

I have created a script to copy the home partitions of the PCs to /backup/latest/. In this example the script is saved as /backup/scripts/pull-latest.sh.

#!/bin/bash

log=/backup/latest/pull.log
date > $log

mkdir -p /backup/latest/PC-1/home
mkdir -p /backup/latest/PC-2/home
#repeat for all machines

echo "=========================" >> $log
echo "user1 PC-1" >> $log
rsync --delete --exclude-from=/backup/scripts/ignore-PC1 -avz -e ssh user1@pc1:/home/user1 /backup/latest/PC-1/home &>> $log
echo "=========================" >> $log
echo "user2 PC-2" >> $log
rsync --delete --exclude-from=/backup/scripts/ignore-PC2 -avz -e ssh user2@pc2:/home/user2 /backup/latest/PC-2/home &>> $log

# repeat for all users and all machines
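The files passed to --exclude-from contain one rsync pattern per line, relative to the directory being copied. A hypothetical /backup/scripts/ignore-PC1 might look like this:

# caches and other data not worth backing up (adjust per machine)
.cache/
.local/share/Trash/
Downloads/
*.tmp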

The script is scheduled to run every 2 hours as root using crontab (run sudo crontab -e on the server to add the task).

# do backup every 2 hours, on the hour
0 */2 * * * /backup/scripts/pull-latest.sh
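To confirm that the pull job is actually running, check the log that the script writes:

tail -n 20 /backup/latest/pull.log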

Daily incremental backup script

Another script (this time in Python) is responsible for making daily snapshots to /backup-weekly. It runs as a nightly scheduled job.

Basically, the script produces daily backup snapshots in /backup-weekly. It first deletes the oldest snapshots until a specified amount of disk space is freed. The remaining snapshots are then renamed, so that yesterday's snapshot becomes today-2 (the day before yesterday becomes today-3, and so on), which frees the name today-1 for the new snapshot. The new snapshot is then created with rsync using its hard-link option. Since the script runs only once a day, it makes a snapshot of whatever PC data is available in /backup/latest at that moment. I will only show the part that creates the PC data snapshots. In reality, my server backup also involves source code repositories, database backups etc., which make the script more complicated. These will perhaps be covered in additional posts.
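The core of the snapshot step is rsync's --link-dest option: files that are unchanged since the previous snapshot are stored as hard links to it rather than copied, so each extra snapshot only costs the space of the changed files. A stripped-down shell equivalent of what the Python script below runs:

rsync --archive --hard-links --delete \
      --link-dest=/backup-weekly/today-2 \
      /backup/latest/ /backup-weekly/today-1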

import logging
import os
import subprocess

# maximum expected number of snapshots
NUM_DAILY = 100
# minimum free space required before creating a new snapshot, in gigabytes.
# Adjust according to your data amounts
MIN_FREE_SPACE = 40
WEEK = "/backup-weekly/"

def dayDir(day):
    return "%(dir)stoday-%(day)02d" % {'dir': WEEK, 'day': day}

def freespaceGb(p):
    """
    Returns the free space, in gigabytes, on the drive that ``p`` is on
    """
    s = os.statvfs(p)
    return s.f_bsize * s.f_bavail / (1024 * 1024 * 1024)

LOG_FILE = WEEK + "backup.log"
logging.basicConfig(filename=LOG_FILE, level=logging.DEBUG)

try:
    # delete the oldest snapshots until enough disk space is free
    while freespaceGb(WEEK) < MIN_FREE_SPACE and NUM_DAILY > 10:
        print "Low space, removing oldest backup"
        NUM_DAILY = NUM_DAILY - 1
        cmd = "rm -rf %(dir)s" % {'dir': dayDir(NUM_DAILY)}
        print cmd
        subprocess.call(cmd, shell=True)

    # shift the remaining snapshots by one day: today-1 becomes today-2 and so on
    for day in range(NUM_DAILY, 1, -1):
        cmd = "mv " + dayDir(day - 1) + " " + dayDir(day)
        print cmd
        subprocess.call(cmd, shell=True)

    # create the directory (if it does not exist) so that rsync --link-dest is happy
    cmd = "mkdir -p " + dayDir(2)
    print cmd
    subprocess.call(cmd, shell=True)

    logging.info("rsync latest")
    cmd = "rsync --archive --one-file-system --hard-links --human-readable --inplace --numeric-ids --delete --exclude-from=/backup/exclude --link-dest=" + dayDir(2) + " /backup/latest/ " + dayDir(1) + " >> " + LOG_FILE
    print cmd
    logging.debug(cmd)
    subprocess.call(cmd, shell=True)

    # move the log file into the new snapshot
    cmd = "mv " + LOG_FILE + " " + dayDir(1)
    print cmd
    subprocess.call(cmd, shell=True)

except Exception, e:
    logging.exception(e)

Cron task

# m h  dom mon dow   command
0      23       *       *       *       /backup/backup

3 thoughts on “Automatic daily backups using rsync”

  1. Ramūnas

    Interesting stuff. It’s a must-have if you have a server running 24/7 at home. Anyway, are you sure about the last code line:

    /backup/backup

    wasn’t the path at the end mangled by the HTML formatting?

    1. Antanas Post author

      Thanks for pointing it out. The WordPress editor does weird stuff sometimes.

