Time Machine for Linux.. kinda..
From the awesome blog at http://www.julioflores.com/
Added Feb 11 2009 , Modified Aug 30 2009 - 03:03 PM
This is an update of my post originally published Feb 14, 2009. It has been updated and refined a bit; I hope you find it interesting.
Since I traded my venerable MacBook for a powerhouse Phenom PC, and then moved on to my current setup, a nice Dell Inspiron 530 running Fedora 11 (don't ask :), I have returned to my roots, and after a while I could not live without Linux.
Moving away from OS X (at least as my primary computer), you soon realize how many things you took for granted, such as the ease of setting up wireless connections, updates to the operating system and, most importantly, a nice gem that came with Leopard and subsequent releases: Time Machine.
I was one more of the lazy drones googling "time machine for Linux" almost every day. Many options do exist, but I doubt many of them were created by actual Mac users. Don't get me wrong, most of them work, but I wanted one that was smart enough to not just "copy" files, but to synchronize, perform incremental backups and not duplicate the same data across the backups.
Have you ever wondered how, on a Mac, a 500GB external drive can contain several snapshots of your home directory and still have, say, half of its disk space free?
OS X makes use of its own equivalent of hard links. A hard link is really just another name for a file that already exists in your filesystem. For example, assume that I have a file called myfile.txt whose contents amount to 1KB of data. Now, let's make a copy of the file in some other folder, something like this: cp myfile.txt ./somefolder. At the end of the process you will have 2 files, eating 2KB of disk space. Now, say that instead of copying the file, you create a hard link to it: ln myfile.txt ./somefolder/myfile.txt. At the end of this process you will have 2 file names, eating up only 1KB of disk space, because both names point to the same data.
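The difference is easy to check for yourself. The snippet below is only an illustration and not part of the backup script. It assumes GNU coreutils (for `stat -c`) and works inside a scratch directory created with `mktemp`; the folder names `copies` and `links` are made up for the demonstration.

```shell
#!/bin/bash
# Compare a copy with a hard link inside a throwaway directory.
scratch=$(mktemp -d)
mkdir "$scratch/copies" "$scratch/links"
printf 'some data\n' > "$scratch/myfile.txt"

cp "$scratch/myfile.txt" "$scratch/copies/myfile.txt"   # a second, independent file
ln "$scratch/myfile.txt" "$scratch/links/myfile.txt"    # a second name for the same file

# %i prints the inode number, %h the number of names (hard links) pointing at it.
orig_inode=$(stat -c %i "$scratch/myfile.txt")
copy_inode=$(stat -c %i "$scratch/copies/myfile.txt")
link_inode=$(stat -c %i "$scratch/links/myfile.txt")
link_count=$(stat -c %h "$scratch/myfile.txt")

echo "original inode: $orig_inode"
echo "copy inode:     $copy_inode (different, so the data is stored twice)"
echo "link inode:     $link_inode (same, so the data is stored once)"
echo "names pointing at the original data: $link_count"

rm -rf "$scratch"
```

The copy gets its own inode, while the hard link reports the same inode as the original, which is why it costs no extra space for the file contents.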
Something like this happens in my version of "Time Machine for Linux" (or, as I like to call it, a poor man's Time Machine). The trick is to first create a normal full backup (via a copy/rsync/ssh) on your external hard disk, and then create each subsequent backup by copying only what has changed; anything else remains as a hard link to the previous backup. This is essentially the same way Time Machine works in OS X (except for the nice GUI Time Machine provides, of course), so be confident.
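To see the idea in miniature before reading the full script, here is a rough sketch that takes two snapshots of a tiny folder with rsync's `--link-dest` option. It assumes rsync and GNU `stat` are installed, and everything happens in a scratch directory; the function name `snapshot_demo` and the file names are invented for this example.

```shell
#!/bin/bash
# Two snapshots of a tiny source tree; the unchanged file is shared via a hard link.
snapshot_demo() {
    local scratch src dest
    scratch=$(mktemp -d)
    src="$scratch/home"
    dest="$scratch/backups"
    mkdir -p "$src" "$dest"
    echo "never changes" > "$src/stable.txt"
    echo "version 1" > "$src/notes.txt"

    # First run: a plain full copy.
    rsync -a "$src/" "$dest/backup.1"

    # Change one file, then take a second snapshot that links against the first.
    echo "version 2, now longer" > "$src/notes.txt"
    rsync -a --link-dest="$dest/backup.1" "$src/" "$dest/backup.0"

    # Record inode numbers so the two snapshots can be compared.
    stable_old=$(stat -c %i "$dest/backup.1/stable.txt")
    stable_new=$(stat -c %i "$dest/backup.0/stable.txt")
    notes_old=$(stat -c %i "$dest/backup.1/notes.txt")
    notes_new=$(stat -c %i "$dest/backup.0/notes.txt")
    old_notes_content=$(cat "$dest/backup.1/notes.txt")
    rm -rf "$scratch"
}

if command -v rsync >/dev/null 2>&1; then
    snapshot_demo
    echo "stable.txt inodes: $stable_old (old) vs $stable_new (new)"
    echo "notes.txt inodes:  $notes_old (old) vs $notes_new (new)"
    echo "old snapshot still holds: $old_notes_content"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

The unchanged file shows the same inode in both snapshots, while the modified file gets a fresh copy in the new snapshot and the old snapshot keeps its original contents.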
Assumptions
• You have an external backup drive mounted (for this example I'll use /media/LinuxTimeMachine)
Begin - Consider the source code below (an explanation follows)
1. #!/bin/bash
2.
3. # Generates incremental backups of my home folder using rsync.
4. # Note that since I use an external USB drive for my backup operation
5. # ssh into another server is not required, however, adding support
6. # for this should be easy enough.
7.
8. # Cron Suggestions:
9. # If you are going to run this script, say, every two hours (12 times per day)
10. # and want to keep a month's worth of data, then MAX_BACKUPS should be in the (12x30)
11. # range: 360+. In my case I just want to keep the last 25 backups regardless of when
12. # I run the script.
13.
14. # Change these variables below for your own purposes:
15. MOUNTPOINT="/media/LinuxTimeMachine"
16. BACKUP_DIR="$MOUNTPOINT/Backups.teroknor/julio"
17. SOURCE_LOC="/home/julio"
18. MAX_BACKUPS=25
19. LOG_FILE="${SOURCE_LOC}/bin/rsync.log"
20. EXCLUDE_FILES="$SOURCE_LOC/bin/excludes.rsync"
21. RSYNC_OPTS="-aHvxog --delete --progress --log-file=$LOG_FILE --exclude-from=$EXCLUDE_FILES"
22.
23. # (Optional) - Check if my mountpoint is actually mounted:
24. mountpoint -q "$MOUNTPOINT" || { echo "$MOUNTPOINT is invalid or not a mount point" ; exit 1; }
25.
26. # Also check if the backup directory exists:
27. [ -d "$BACKUP_DIR" ] || { echo "$BACKUP_DIR not found" ; exit 1; }
28.
29. # Next is a very simple but efficient way to check if this is the first time
30. # we make a backup. It relies on a soft link in the backup folder that points
31. # to the latest backup. Note that even if you have a backup system already
32. # running, if you REMOVE the soft link "latest"
33. # (/media/LinuxTimeMachine/Backups.teroknor/julio/latest
34. # in my example) the script will treat the backup as the first one and
35. # will copy the entire tree (slow) as opposed to only the changes (via hard links).
36.
37. if [ ! -L "$BACKUP_DIR/latest" ] ; then
38.     echo "Initial Backup, this may take some time..."
39.     rsync $RSYNC_OPTS "$SOURCE_LOC/" "$BACKUP_DIR/backup.0"
40.     ln -s "$BACKUP_DIR/backup.0" "$BACKUP_DIR/latest"
41. else
42.
43.     # This next segment takes care of the rotation. Basically I'll have the following structure:
44.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.0
45.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.1
46.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.2
47.     # ...
48.     # ...
49.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.11
50.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.12
51.     # /media/LinuxTimeMachine/Backups.teroknor/julio/latest (symlinked to backup.0 - Latest Backup)
52.     #
53.
54.     # index of the oldest backup to keep (backup.0 is the newest, so 25 backups end at backup.24)
55.     cur_backup=$((MAX_BACKUPS - 1))
56.
57.     # remove oldest backup if it exists
58.     if [ -d "$BACKUP_DIR/backup.$cur_backup" ] ; then
59.         rm -fr "$BACKUP_DIR/backup.$cur_backup"
60.     fi
61.
62.     # Move each previous backup up by one (i.e. backup.0 to backup.1, backup.11 to backup.12),
63.     # all this in order to leave backup.0 ready for rsyncing the latest files.
64.     for i in $(seq "$cur_backup" -1 0)
65.     do
66.         # slot the backup moves into
67.         next_backup=$((i + 1))
68.
69.         # move previous backup out of the way
70.         if [ -d "$BACKUP_DIR/backup.$i" ] ; then
71.             mv "$BACKUP_DIR/backup.$i" "$BACKUP_DIR/backup.$next_backup"
72.         fi
73.     done
74.
75.     rsync $RSYNC_OPTS --link-dest="$BACKUP_DIR/backup.1" "$SOURCE_LOC/" "$BACKUP_DIR/backup.0"
76.     # Remove the current "latest" symlink, since it'll change right away
77.     rm -f "$BACKUP_DIR/latest"
78.     ln -s "$BACKUP_DIR/backup.0" "$BACKUP_DIR/latest"
79. fi
Information
Lines 15-21 - Modify them to suit your needs. Line 20 can be removed, provided that you also remove --exclude-from=$EXCLUDE_FILES from line 21.
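If you do keep the exclude list, rsync expects one pattern per line in that file and ignores lines starting with # or ;. The patterns below are only guesses at what a typical home folder might want to skip, not the contents of my own file. The sketch writes them to a scratch file so you can review them before copying the file to the path set in EXCLUDE_FILES.

```shell
#!/bin/bash
# Write an example exclude list to a scratch file for review.
exclude_file=$(mktemp)
cat > "$exclude_file" <<'EOF'
# A leading / anchors the pattern to the source directory given to rsync.
/.cache/
/.thumbnails/
/.local/share/Trash/
*.tmp
EOF

# Count the pattern lines (everything that is not a comment).
pattern_count=$(grep -vc '^#' "$exclude_file")
echo "Wrote $pattern_count patterns to $exclude_file"
```

A trailing slash limits a pattern to directories, and a pattern without a leading slash (like `*.tmp`) matches at any depth.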
Line 16 - Make sure you create this directory structure on your backup drive before running the script. It will be empty at first, but it is where your backups will live. Feel free to change the /Backups.teroknor folder, and the last folder as well, to your liking.
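Creating that directory is a one-liner with mkdir -p, but since the whole approach depends on hard links, it can be worth checking that the drive's filesystem supports them (FAT-formatted drives, for instance, do not). The helper below is a hypothetical convenience, not part of the script above, and assumes GNU `mktemp`.

```shell
#!/bin/bash
# prepare_backup_dir DIR: create DIR and confirm that hard links work inside it.
# Hypothetical helper for illustration; returns non-zero on any failure.
prepare_backup_dir() {
    local dir=$1 probe
    mkdir -p "$dir" || return 1
    # Create a small probe file and try to hard-link it.
    probe=$(mktemp "$dir/.linktest.XXXXXX") || return 1
    if ln "$probe" "$probe.link" 2>/dev/null; then
        rm -f "$probe" "$probe.link"
        return 0
    fi
    rm -f "$probe"
    echo "$dir does not support hard links" >&2
    return 1
}

# Example call, using the path from line 16 of the script:
# prepare_backup_dir "/media/LinuxTimeMachine/Backups.teroknor/julio"
```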
Lines 44-51 - This is the structure that you will end up with on your external drive.
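If the renaming behind that structure is hard to picture, the following sketch replays the rotation on a few placeholder directories. It is a simplified stand-in for the loop in the script rather than the script itself: `rotate_backups` is a made-up name, and it treats its second argument as the total number of snapshots to keep, which is an assumption of this example.

```shell
#!/bin/bash
# rotate_backups DIR MAX: drop the oldest snapshot and shift backup.N to backup.N+1,
# so that backup.0 is free for the next run and at most MAX snapshots remain.
rotate_backups() {
    local dir=$1 max=$2 i
    local oldest=$((max - 1))
    rm -rf "$dir/backup.$oldest"
    for ((i = oldest - 1; i >= 0; i--)); do
        if [ -d "$dir/backup.$i" ]; then
            mv "$dir/backup.$i" "$dir/backup.$((i + 1))"
        fi
    done
}

# Try it on placeholder directories in a scratch location.
demo=$(mktemp -d)
for n in 0 1 2; do
    mkdir "$demo/backup.$n"
    echo "snapshot $n" > "$demo/backup.$n/marker"
done
rotate_backups "$demo" 3
ls "$demo"
```

After the call, the old backup.2 is gone, the former backup.1 and backup.0 have each moved up by one, and the backup.0 slot is empty and ready for the next rsync run.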
Conclusion
I now have a fully functional personal backup system that performs incremental backups: rsync copies only the files that were modified since the previous run, and everything else is hard-linked to the previous backup instead of being copied again. All this has been tested with CentOS 4.5 (my production server) and Fedora 11 (my desktop system). I hope you find it as useful as I have.
The next undertaking is creating a UI for this script. I believe this can be accomplished with a web-based approach, and I am thinking of using web2py for it.
Any takers?
Comments for
Time Machine for Linux.. kinda..
Added 30 Aug 2009 , Modified 30 Aug 2009 - 05:05 PM By rj..@gmail.com
This style of backup is something I both need and want. Since
I am less hard core than you, I'll wait (im)patiently for someone
to create a GUI using [I hope...] Web2py, a fine framework.
Thanks for creating this handy utility!
ron k jeffries
http://identi.ca/ronkjeffries
http://blogt.eronj.com
Added 31 Aug 2009 , Modified 31 Aug 2009 - 01:31 AM By JulioF
Thanks Ron for your post,
The script above does indeed work exactly as Time Machine does. I personally use it as my daily backup; in fact, I run it automatically as a cron job every day at 11:00. A web GUI does not sound like a bad idea at all. I was thinking of making some sort of UI using wxWidgets or something like that, but creating an interface in web2py might just be the way to go. Good thinking.

