I have updated my automatic backup system. The main difference from my previous post is that the backup server is now responsible for pulling the data from the remote PCs.
General setup
I will show how to set up periodic incremental backups using the rsync, crontab and sshfs utilities in Linux. The setup contains several Linux PCs and a backup server running Ubuntu Server. I will show how to set up scripts that run the daily backup task automatically. This system has already saved my stuff during two complete hard drive failures, allowing a full restore in a matter of a few hours. By the end of the tutorial we should have achieved the following goals:
- The entire /home partition of each workstation is copied to the server every two hours
- All important directories on the server itself are copied to a dedicated backup hard drive
- Incremental backups are set up to store daily snapshots
- The amount of data copied over the network and stored is reduced by using rsync and hard links
There is a hard drive installed in the server for storing backups. Storing backups on a dedicated drive in a network server reduces the possibility of losing data in case of a disk or file system fault. For this tutorial, let's say that the backup drive is mounted as /backup on the server. Since writing the original post I have installed a second hard drive because of the increased amount of data. The scripts and the most recent snapshot from all the PCs in the network are stored on the first drive (mounted as /backup). The second drive is mounted at /backup-weekly and stores the incremental daily snapshots.
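The backup drives can be mounted automatically at boot through /etc/fstab. The entries below are only a sketch; the device names and file system type are assumptions, and using UUIDs from blkid instead of /dev/sdX names is more robust:

# example /etc/fstab entries for the backup drives (device names assumed)
/dev/sdb1   /backup          ext4   defaults   0   2
/dev/sdc1   /backup-weekly   ext4   defaults   0   2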
Accessing backups from the workstations
You can find a guide to sshfs at www.linuxjournal.com. Make sure that you can mount and access the directories on the backup server. Here is a script that mounts the remote backup directories on a workstation.
#!/bin/bash
sshfs -o ro -o idmap=user user@server:/backup-weekly /server/backup-weekly/
sshfs -o ro -o idmap=user user@server:/backup/latest /server/backup/
Notice that the backup directories are mounted read-only on the workstation in order to prevent accidental or malicious data deletion.
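To detach the remote directories again (for example before suspending a laptop), unmount them with fusermount:

fusermount -u /server/backup-weekly/
fusermount -u /server/backup/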
Workstation setup
I have changed the method of copying files to the backup server since the original post. Previously all the PCs pushed their data using scripts run by cron under the root account. This was a bit of a headache because the scripts needed to be updated on all the machines, and running them as root was not a good idea either. I find the server-pull approach better, except for one scenario: if a laptop is roaming, it can no longer be reached on the network unless a VPN is used.
In order for the server to be able to pull the files from the PCs, SSH key authentication is set up: the server's public key is added to the authorized_keys of the relevant user account on each PC. I only have a few users and PCs in my network, therefore it is relatively easy to set up the keys everywhere.
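Since the pull script will run as root on the server (see the cron entry later), one way to set this up is to generate a key pair for root and copy its public part to each user account that will be backed up. The hostnames and user names below are just the ones used in the example script:

# on the backup server, as root (skip ssh-keygen if a key already exists)
ssh-keygen -t rsa
ssh-copy-id user1@pc1
ssh-copy-id user2@pc2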
Backup script
I have created a script that copies the home partitions of the PCs to /backup/latest/. In this example the script is saved as /backup/scripts/pull-latest.sh.
#!/bin/bash
log=/backup/latest/pull.log
date > $log
mkdir -p /backup/latest/PC-1/home
mkdir -p /backup/latest/PC-2/home
# repeat for all machines

echo "=========================" >> $log
echo "user1 PC-1" >> $log
rsync --delete --exclude-from=/backup/scripts/ignore-PC1 -avz -e ssh user1@pc1:/home/user1 /backup/latest/PC-1/home &>> $log

echo "=========================" >> $log
echo "user2 PC-2" >> $log
rsync --delete --exclude-from=/backup/scripts/ignore-PC2 -avz -e ssh user2@pc2:/home/user2 /backup/latest/PC-2/home &>> $log

# repeat for all users and all machines
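The --exclude-from files list one rsync pattern per line; directories that are not worth backing up (caches, trash, downloads and the like) go there. A hypothetical /backup/scripts/ignore-PC1 might look like this:

# rsync exclude patterns, one per line
.cache/
.local/share/Trash/
Downloads/
*.iso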
The script is scheduled to be run every 2 hours as root using crontab (run sudo crontab -e to add a task).
# do backup every 2 hours at the 0th minute
0 */02 * * * /backup/scripts/pull-latest.sh
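Since the script writes its output to /backup/latest/pull.log, a quick way to confirm that the latest run went through is to check the end of that file:

tail -n 20 /backup/latest/pull.log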
Daily incremental backup script
Another script (this time in Python) is responsible for making daily snapshots to /backup-weekly. It runs as a scheduled job at night.
Basically, the script produces daily backup snapshots in /backup-weekly. It first deletes the oldest snapshots until a specified amount of disk space is freed. The remaining snapshots are then renamed so that they become today-2 (the day before yesterday), today-3 and so on. This prepares the backup directory for a new snapshot named today-1. The new snapshot is then created using rsync with the --link-dest hard-link option. Since the script runs only once a day, it makes a snapshot of whatever PC data is available in /backup/latest at that moment. I will only show the part that creates the PC data snapshots. In reality, my server backup also involves source code repositories, database backups etc., which make the script more complicated. These will perhaps be covered in additional posts.
import logging
import os
import subprocess

# max expected number of snapshots
NUM_DAILY = 100
# minimum free space required before creating a new snapshot, in gigabytes.
# Adjust according to your data amounts
MIN_FREE_SPACE = 40
WEEK = "/backup-weekly/"

def dayDir(day):
    return "%(dir)stoday-%(day)02d" % {'dir': WEEK, 'day': day}

def freespaceGb(p):
    """ Returns the free space on the drive that ``p`` is on, in gigabytes """
    s = os.statvfs(p)
    return s.f_bsize * s.f_bavail / (1024*1024*1024)

LOG_FILE = WEEK + "backup.log"
logging.basicConfig(filename=LOG_FILE, level=logging.DEBUG)

try:
    # free up space by deleting the oldest snapshots
    while freespaceGb(WEEK) < MIN_FREE_SPACE and NUM_DAILY > 10:
        print "Low space, removing oldest backup"
        NUM_DAILY = NUM_DAILY - 1
        cmd = "rm -rf %(dir)s" % {'dir': dayDir(NUM_DAILY)}
        print cmd
        subprocess.call(cmd, shell=True)

    # shift the remaining snapshots: today-1 becomes today-2 and so on
    for day in range(NUM_DAILY, 1, -1):
        cmd = "mv " + dayDir(day-1) + " " + dayDir(day)
        print cmd
        subprocess.call(cmd, shell=True)

    # create directory (if it does not exist) so that rsync link-dest is happy
    cmd = "mkdir -p " + dayDir(2)
    print cmd
    subprocess.call(cmd, shell=True)

    logging.info("rsync latest")
    cmd = "rsync --archive --one-file-system --hard-links --human-readable --inplace --numeric-ids --delete --exclude-from=/backup/exclude --link-dest=" + dayDir(2) + " /backup/latest/ " + dayDir(1) + " >> " + LOG_FILE
    print cmd
    logging.debug(cmd)
    subprocess.call(cmd, shell=True)

    # move the log file into the newly created snapshot
    cmd = "mv " + LOG_FILE + " " + dayDir(1)
    print cmd
    subprocess.call(cmd, shell=True)
except Exception, e:
    logging.exception(e)
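Because unchanged files are hard-linked between snapshots, consecutive today-N directories consume far less space than full copies. One way to see this is to let du count the two newest snapshots together (du counts each hard-linked file only once per invocation), or to check the link count of an individual file; the file path below is just an example:

# combined size of the two newest snapshots; only changed files add to the total
du -shc /backup-weekly/today-1 /backup-weekly/today-2
# %h prints the number of hard links pointing at the file
stat -c '%h %n' /backup-weekly/today-1/PC-1/home/user1/.bashrc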
Cron task
# m h dom mon dow command
0 23 * * * /backup/backup
Interesting stuff. It's a must-have thing if you have a server running 24/7 at home. Anyway, are you sure about the last code line:
/backup/backup
wasn't the end of the line cut off by the HTML formatting?
Thanks for pointing it out. The wordpress editor does weird stuff sometimes.
Very good article on rsync. Here is one from around 1999 that is still the best rsync tutorial I've ever read. Explains it very well IMO: http://tinyurl.com/l37guv8