121

On a server, install git

cd /
git init
git add .
git commit -a -m "Yes, this is server"

Then get /.git/ to point to a network drive (SAN, NFS, Samba, whatever) or a different disk. Use a cron job every hour/day etc. to commit the changes. The .git directory would contain a versioned copy of all the server files (excluding the useless/complicated ones like /proc, /dev etc.)
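Roughly, the hourly cron job I have in mind would be something like this (the mount point and paths below are just placeholders):

#!/bin/sh
# hypothetical hourly backup job; the repo was created once beforehand with:
#   GIT_DIR=/mnt/backup-nfs/server.git git init
export GIT_DIR=/mnt/backup-nfs/server.git    # .git relocated to the network mount
export GIT_WORK_TREE=/

cd / || exit 1
# /proc, /sys, /dev, /run and /tmp are excluded via a .gitignore at /
git add -A .
git commit -m "auto backup $(date '+%Y-%m-%d %H:%M')"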

For a non-important development server where I don't want the hassle/cost of setting it up on a proper backup system, and where backups would only be for convenience (i.e. we don't need to back up this server, but it would save some time if things went wrong), could this be a valid backup solution or will it just fall over in a big pile of poop?

030
  • 5,731
  • 12
  • 61
  • 107
Smudge
  • 24,039
  • 15
  • 57
  • 76
  • 3
    doesn't sparkleshare use a similar idea? – B14D3 Dec 15 '11 at 14:17
  • @B14D3 I think sparkleshare is more of a sort of dropbox type thingy, but I'll look into it – Smudge Dec 15 '11 at 14:20
  • 2
    you're right, but it uses git to make some sort of backup thing (copying to several PCs and controlling versions of files) ;) – B14D3 Dec 15 '11 at 14:28
  • The big problem with this is that there is no central control - you need to have direct (ssh) access to the machine to perform any form of maintenance or backup validation. I always find installing an app on the boxes to be backed up and then administering them from a central location is a much bigger win. – hafichuk Dec 15 '11 at 16:03
  • @hafichuk With tools like Puppet/Chef it's not such a big issue, but I see your point. – Smudge Dec 16 '11 at 10:45
  • I've been doing this for a while but for a different reason. I like to use the diff tool any time I want to install something but don't know what it's going to do to my system. I usually install a VM with the same OS and with git installed at the root, and install the software to that. After installation, I run git diff to see any changes made anywhere in the OS. – Nate T Jul 30 '21 at 05:43

16 Answers

107

You're not a silly person. Using git as a backup mechanism can be attractive, and despite what other folks have said, git works just fine with binary files. Read this page from the Git Book for more information on this topic. Basically, since git is not using a delta storage mechanism, it doesn't really care what your files look like (but the utility of git diff is pretty low for binary files with a stock configuration).

The biggest issue with using git for backup is that it does not preserve most filesystem metadata. Specifically, git does not record:

  • file groups
  • file owners
  • file permissions (other than "is this executable")
  • extended attributes

You can solve this by writing tools to record this information explicitly into your repository, but it can be tricky to get this right.
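For example, a minimal sketch of that idea (the file names and paths here are invented for illustration, not an existing tool) is to dump ownership and permission data into files that get committed along with the content:

#!/bin/sh
# Illustrative only: record mode, owner and group for every path in the
# work tree into a metadata file that is committed alongside the content.
cd /path/to/backup-tree || exit 1        # hypothetical work tree

# %m = octal mode, %U = numeric uid, %G = numeric gid, %p = path (GNU find)
find . -path ./.git -prune -o -printf '%m %U %G %p\n' > .file-metadata

# ACLs and extended attributes, if you use them (restore with setfacl --restore)
getfacl -R . > .file-acls 2>/dev/null

git add .file-metadata .file-acls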

A Google search for git backup metadata yields a number of results that appear to be worth reading (including some tools that already attempt to compensate for the issues I've raised here).

etckeeper was developed for backing up /etc and solves many of these problems.

Kalle Richter
  • 259
  • 6
  • 17
larsks
  • 41,276
  • 13
  • 117
  • 170
  • 21
    +1 for mentioning ACLs/permissions – Larry Silverman Dec 20 '11 at 17:59
  • 27
    Git also doesn't store empty directories. – Flimm Nov 22 '12 at 11:05
  • and it also sucks at tracking file moves/renames through history. – cregox May 10 '13 at 23:19
  • The link to git's internals is http://git-scm.com/book/en/Git-Internals-Git-Objects – Nikos Alexandris Aug 14 '14 at 08:09
  • 1
    Since git doesn't deal with binary files very well, you might also want to look into [git annex](http://git-annex.branchable.com/), which helps do that better. It does change the idea of what git is somewhat, however. – Wouter Verhelst Jun 19 '15 at 08:33
  • I tried using git as a backup solution on one occasion, but gave up a week later as the repository became corrupt. One thing git can't handle is an unstable network. Once the repository is corrupt, it needs troublesome repository surgery to fix, and only if the fault is discovered soon enough. – Abel Cheung Jul 09 '15 at 00:30
  • 1
    my opinion is that you can use git to backup data but not entire servers – EKanadily Feb 16 '17 at 11:43
28

I've not used it, but you might look at bup which is a backup tool based on git.
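For reference, basic bup usage looks roughly like this (the branch name and paths are placeholders; check the bup documentation for current syntax):

# one-time setup of the bup repository
bup init

# index the files, then save a snapshot under a named branch
bup index -ux /etc /home
bup save -n my-server-backup /etc /home

# list snapshots and restore from one
bup ls my-server-backup
bup restore -C /tmp/restore my-server-backup/latest/etc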

Helge Klein
  • 2,031
  • 1
  • 15
  • 22
stew
  • 9,263
  • 1
  • 28
  • 43
  • Never seen bup before, looks interesting – Smudge Dec 15 '11 at 13:59
  • 2
    I've started using bup recently, just a few days before my hard drive crashed ;) Restore went fine, so recommended! – André Paramés Dec 16 '11 at 14:30
  • 1
    @AndréParamés so what you're saying is just after you installed bup your hard drive crashed... mmmmhh... :) just kidding – hofnarwillie Jun 11 '16 at 23:24
  • The above-mentioned _fork_ of the bup project has gone feet up. [Here](https://github.com/bup/bup) is the official repo, which seems to still be under active development. – gillytech Apr 17 '20 at 00:08
12

It can be a valid backup solution; etckeeper is based on this idea. But keep an eye on the permissions of the .git directory, otherwise a copy of /etc/shadow can end up readable inside .git.
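A minimal precaution along those lines (assuming the repository lives at /etc/.git, as with etckeeper) is to lock the directory down before the first commit:

# keep the repository metadata readable by root only, so copies of
# sensitive files such as /etc/shadow are not exposed via .git
chmod 700 /etc/.git
# or, for an existing repository with looser permissions:
chmod -R go-rwx /etc/.git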

chicks
  • 3,639
  • 10
  • 26
  • 36
Stone
  • 6,941
  • 1
  • 19
  • 33
12

Whilst technically you could do this I would put two caveats against it:

1, You are using a source version control system for binary data. You are therefore using it for something that it was not designed for.

2, I worry about your development process if you don't have a process (documentation or automated) for building a new machine. What if you got hit by a bus? Who would know what to do and what was important?

Disaster recovery is important, however it's better to automate (script) the setup of a new development box than to just back up everything. Sure, use git for your scripts/documentation, but not for every file on a computer.

Phil Hannent
  • 675
  • 2
  • 10
  • 21
  • 5
    Development boxes all come from KickStart files, and actually the average box lasts for about 2 or 3 months before it's re-built. But people change configs and do things, we re-build the boxes and people say "hey, I know I didn't put it in source control but I had some shit on that box" and I laugh at them for being stupid. All around, good times. Binary data would be a bitch, it's something I totally overlooked while in the shower. – Smudge Dec 15 '11 at 13:56
  • I applaud your attitude to those that fail to follow basic principles. Personally I have a similar situation to yours, however I have a git repository which links in all the config files that might be important, rather than a catch-all. Plus a txt doc with setup steps. – Phil Hannent Dec 15 '11 at 14:01
  • 2
    I think git works pretty well for binary files; e.g. the bulk of Google Android's repo consists of git repositories of prebuilt executables. – user377178 Dec 22 '12 at 22:11
8

I use git as a backup for my Windows system, and it's been incredibly useful. At the bottom of the post, I show the scripts I use to set this up on a Windows system. Using git as a backup for any system provides two big advantages:

  1. Unlike commercial solutions, which often use their own proprietary format, your backup is in an open-source format that is widely supported and very well documented. This gives you full control of your data. It's very easy to see which files changed and when. If you want to truncate your history, you can do that as well. Want to obliterate something from your history? No problem. Getting a version of your file back is as simple as any git command.
  2. As many or as few mirrors as you want, and all can have customized backup times. You'll get your local mirror, which is unburdened by slow Internet traffic and thus gives you (1) the ability to do more frequent backups throughout the day and (2) a quick restoration time. (Frequent backups are a huge plus, because most of the time when I lose a document it's through user error. For example, your kid accidentally overwrites a document he's been working on for the last 5 hours.) But you'll also get your remote mirror, which gives the advantage of data protection in case of a local disaster or theft. And suppose you want your remote mirror backing up at a customized time to save your Internet bandwidth? No problem.

Bottom line: A git backup gives you incredible amounts of power on controlling how your backups happen.

I configured this on my Windows system. The first step is to create the local git repo where you will commit all your local data. I recommend using a second local hard drive, but using the same hard drive will work too (though it's expected you'll push this somewhere remote, otherwise you're screwed if the hard drive dies).

You'll first need to install cygwin (with rsync), and also install git for Windows: http://git-scm.com/download/win

Next, create your local git repo (only run once):

init-repo.bat:

@echo off
REM SCRIPT PURPOSE: CREATE YOUR LOCAL GIT-REPO (RUN ONLY ONCE)

REM Set where the git repository will be stored
SET GBKUP_LOCAL_MIRROR_HOME=E:\backup\mirror


REM Create the backup git repo. 
SET GIT_PARAMS=--git-dir=%GBKUP_LOCAL_MIRROR_HOME%\.git --work-tree=%GBKUP_LOCAL_MIRROR_HOME% 
mkdir %GBKUP_LOCAL_MIRROR_HOME%
git %GIT_PARAMS% init
git %GIT_PARAMS% config core.autocrlf false
git %GIT_PARAMS% config core.ignorecase false 
git %GIT_PARAMS% config core.fileMode false
git %GIT_PARAMS% config user.email backup@yourComputerName
git %GIT_PARAMS% config user.name backup

REM add a remote to the git repo.  Make sure you have set myRemoteServer in ~/.ssh/config   
REM The path on the remote server will vary.  Our remote server is a Windows machine running cygwin+ssh.  
REM For better security, you could install gitolite on the remote server, and forbid any non-fast-forward merges, and thus stop a malicious user from overwriting your backups.
git %GIT_PARAMS% remote add origin myRemoteServer:/cygdrive/c/backup/yourComputerName.git

REM treat all files as binary; so you don't have to worry about autocrlf changing your line endings
SET ATTRIBUTES_FILE=%GBKUP_LOCAL_MIRROR_HOME%\.git\info\attributes
echo.>> %ATTRIBUTES_FILE% 
echo *.gbkuptest text>> %ATTRIBUTES_FILE% 
echo * binary>> %ATTRIBUTES_FILE% 
REM delta compression is often a waste of time with binary files
echo * -delta>> %ATTRIBUTES_FILE% 
REM You may need to get rid of windows new lines. We use cygwin's tool
C:\cygwin64\bin\dos2unix %ATTRIBUTES_FILE%

Next, we have our backup script wrapper, which will be called regularly by Windows Scheduler:

gbackup.vbs:

' A simple vbs wrapper to run your bat file in the background
Set oShell = CreateObject ("Wscript.Shell") 
Dim strArgs
strArgs = "cmd /c C:\opt\gbackup\gbackup.bat"
oShell.Run strArgs, 0, false

Next, we have the backup script itself that the wrapper calls:

gbackup.bat:

@echo off

REM Set where the git repository will be stored
SET GBKUP_LOCAL_MIRROR_HOME=E:\backup\mirror
REM the user which runs the scheduler
SET GBKUP_RUN_AS_USER=yourWindowsUserName
REM exclude file
SET GBKUP_EXCLUDE_FILE=/cygdrive/c/opt/gbackup/exclude-from.txt

SET GBKUP_TMP_GIT_DIR_NAME=git-renamed
for /f "delims=" %%i in ('C:\cygwin64\bin\cygpath %GBKUP_LOCAL_MIRROR_HOME%') do set GBKUP_LOCAL_MIRROR_CYGWIN=%%i

REM rename previously renamed git directories back to .git so rsync can update them (they are renamed again near the end of this script)
for /r %GBKUP_LOCAL_MIRROR_HOME% %%i in (%GBKUP_TMP_GIT_DIR_NAME%) do ren "%%i" ".git" 2> nul

SET RSYNC_CMD_BASE=C:\cygwin64\bin\rsync -ahv --progress --delete --exclude-from %GBKUP_EXCLUDE_FILE%

REM rsync all needed directories to local mirror
%RSYNC_CMD_BASE% /cygdrive/c/dev %GBKUP_LOCAL_MIRROR_CYGWIN%
%RSYNC_CMD_BASE% /cygdrive/c/Users/asmith %GBKUP_LOCAL_MIRROR_CYGWIN%
%RSYNC_CMD_BASE% /cygdrive/c/Users/bsmith %GBKUP_LOCAL_MIRROR_CYGWIN%

cacls %GBKUP_LOCAL_MIRROR_HOME% /t /e /p  %GBKUP_RUN_AS_USER%:f

REM rename .git directories (git would otherwise ignore their entire contents); the main repo's .git is renamed back on the next line
for /r %GBKUP_LOCAL_MIRROR_HOME% %%i in (.git) do ren "%%i" "%GBKUP_TMP_GIT_DIR_NAME%" 2> nul
ren %GBKUP_LOCAL_MIRROR_HOME%\%GBKUP_TMP_GIT_DIR_NAME% .git

REM finally commit to git
SET GIT_PARAMS=--git-dir=%GBKUP_LOCAL_MIRROR_HOME%\.git --work-tree=%GBKUP_LOCAL_MIRROR_HOME% 
SET BKUP_LOG_FILE=%TMP%\git-backup.log
SET TO_LOG=1^>^> %BKUP_LOG_FILE% 2^>^&1
echo ===========================BACKUP START=========================== %TO_LOG%
For /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set mydate=%%c-%%a-%%b)
For /f "tokens=1-2 delims=/:" %%a in ('time /t') do (set mytime=%%a%%b)
echo %mydate%_%mytime% %TO_LOG%
echo updating git index, committing, and then pushing to remote %TO_LOG%
REM Caution: The --ignore-errors directive tells git to continue even if it can't access a file.
git %GIT_PARAMS% add -Av --ignore-errors %TO_LOG%
git %GIT_PARAMS% commit -m "backup" %TO_LOG%
git %GIT_PARAMS% push -vv --progress origin master %TO_LOG%
echo ===========================BACKUP END=========================== %TO_LOG%

We have an exclude-from.txt file, where we list all the files to ignore:

exclude-from.txt:

target/
logs/
AppData/
Downloads/
trash/
temp/
.idea/
.m2/
.IntelliJIdea14/
OLD/
Searches/
Videos/
NTUSER.DAT*
ntuser.dat*

You'll need to go to any remote repos and do a 'git init --bare' on them. You can test everything by executing the backup script. Assuming it all works, go to Windows Scheduler and point an hourly task at the vbs file. After that, you'll have a git history of your computer for every hour. It's extremely convenient -- ever accidentally delete a section of text and miss it? Just check your git repository.

user64141
  • 201
  • 2
  • 4
  • Just curious - will it work also for slow or non-standard network drives, like the ones emulated by NetDrive or Expandrive? I find most backup software failing with these network drives. Also things get painfully slow and tend to time-out, if I want to list all the files in the backup and extract individual files. Is git able to solve these issues? – JustAMartin Aug 16 '15 at 13:35
  • @JustAMartin I've never tested it on network drives, so I can't say. Once you get the files IN a git repo, git is very efficient. – user64141 Aug 16 '15 at 16:33
5

Well, it's not a bad idea, but I think there are two red flags to be raised:

  • If the hard disk fails, you'll lose everything if you're not pushing your commits to another server/drive. (Even if you have a plan for that, I prefer to mention it.)

... but still, it can be a good backup for corruptions-related things. Or like you said, if the .git/ folder is somewhere else.

  • This backup will always increase in size. There's no pruning or rotation or anything by default.

... So you may need to tell your cron job to add tags, and then make sure commits that are not tagged get cleaned up.

FMaz008
  • 429
  • 3
  • 12
  • We would probably mount the .git directory on a remote server, although the classic `rm -Rf /` would cause us some issues. Our current backup system keeps stuff for 2 years or 50 versions (whichever comes last), so our backup is constantly increasing anyway. But I like the idea of adding tags, we could have "daily", "weekly" etc. tags – Smudge Dec 15 '11 at 13:58
  • +1 for ever growing space requirements – hafichuk Dec 15 '11 at 16:00
  • @sam git is ever growing. You cannot prune the history older than N years. I suppose your current system does. – rds Dec 16 '11 at 10:44
  • 1
    Regarding the increase in size, please do 'git gc' regularly or before you push to another (central) server. Without this the git repo may grow (much) larger than it should. I once had a 346 MB git repo that shrank down to 16 MB. – Hendy Irawan Feb 13 '12 at 14:27
3

I haven't tried it with a full system but I'm using it for my MySQL backups (with the --skip-extended-insert option) and it has really worked well for me.
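A sketch of how that kind of job might look (database name, paths and schedule are assumptions, and credentials are expected to come from ~/.my.cnf):

#!/bin/sh
# hypothetical nightly cron job: dump with one INSERT per row so git's
# text diffs stay small, then commit the result into an existing repo
cd /var/backups/mysql-git || exit 1

mysqldump --skip-extended-insert --single-transaction mydatabase > mydatabase.sql
git add mydatabase.sql
git commit -m "mysql backup $(date +%F)"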

You're going to run into problems with binary data files (their entire contents could and will change) and you might have problems with the .git folder getting really large. I would recommend setting up a .gitignore file and only backing up text files that you really know you need.

Scott Keck-Warren
  • 1,670
  • 1
  • 14
  • 23
3

I once developed a backup solution based on Subversion. While it worked quite well (and git should work even better), I think there are better solutions out there.

I consider rsnapshot to be one of the better ones - if not the best. With good use of hard links, I have a 300 GB file server (with half a million files) with daily, weekly and monthly backups going back as far as one year. Total used disk space is only one full copy + the incremental part of each backup, but thanks to hard links I have a complete "live" directory structure in each of the backups. In other words, files are directly accessible not only under daily.0 (the most recent backup), but even in daily.1 (yesterday) or weekly.2 (two weeks ago), and so on.

By resharing the backup folder with Samba, my users are able to pull files from backups simply by pointing their PC to the backup server.

Another very good option is rdiff-backup, but as I like to have files always accessible simply by pointing Explorer to \\servername, rsnapshot was a better solution for me.
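For context, the daily/weekly/monthly rotation described above is normally driven by cron entries along these lines; the schedule here is an assumption, and the retention counts ("retain daily 7", "retain weekly 4", "retain monthly 12") live in /etc/rsnapshot.conf:

# /etc/cron.d/rsnapshot (illustrative schedule)
30 3 * * *   root  /usr/bin/rsnapshot daily
0  3 * * 1   root  /usr/bin/rsnapshot weekly
30 2 1 * *   root  /usr/bin/rsnapshot monthly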

shodanshok
  • 44,038
  • 6
  • 98
  • 162
  • 1
    The last release of rdiff-backup is from 2009. Is it extremely well designed and requiring no updates at all, or is it simply an abandoned project? – reducing activity Apr 16 '18 at 21:49
  • I don't know if it is maintained, but it is basically "done". – shodanshok Apr 17 '18 at 06:18
  • From looking at http://savannah.nongnu.org/bugs/index.php?go_report=Apply&group=rdiff-backup&func=browse&set=custom&msort=0&report_id=100&advsrch=0&status_id=0&resolution_id=0&assigned_to=0&category_id=0&bug_group_id=0&history_search=0&history_field=0&history_event=modified&history_date_dayfd=17&history_date_monthfd=4&history_date_yearfd=2018&chunksz=50&spamscore=5&boxoptionwanted=1#options it seems that there was some activity as late as 2015 but many bug reports are ignored. I think I will classify it as abandoned. – reducing activity Apr 17 '18 at 06:42
2

I had the same idea to back up with git, basically because it allows versioned backups. Then I saw rdiff-backup, which provides that functionality (and much more). It has a really nice user interface (look at the CLI options). I'm quite happy with it. The --remove-older-than 2W option is pretty cool. It allows you to just delete versions older than 2 weeks. rdiff-backup stores only diffs of files.
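Typical rdiff-backup usage looks something like this (hostnames and paths are placeholders):

# back up a local directory to a remote host over ssh
rdiff-backup /home/user backuphost::/backups/user

# restore the state as it was 10 days ago
rdiff-backup -r 10D backuphost::/backups/user /tmp/restored-user

# drop increments older than two weeks, as mentioned above
rdiff-backup --remove-older-than 2W backuphost::/backups/user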

Daniel
  • 2,877
  • 5
  • 20
  • 24
2

I am extremely new to git, but aren't branches local by default, and must be pushed explicitly to remote repositories? This was an unpleasant and unexpected surprise. After all, don't I want all of my local repo to be 'backed up' to the server? Reading the git book:

Your local branches aren’t automatically synchronized to the remotes you write to — you have to explicitly push the branches you want to share. That way, you can use private branches for work you don’t want to share, and push up only the topic branches you want to collaborate on.

To me this meant that those local branches, like other non-git files on my local machine, are at risk of being lost unless backed up regularly by some non-git means. I do this anyway, but it broke my assumptions about git 'backing up everything' in my repo. I'd love clarification on this!
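From what I can tell, the usual workaround when the remote is meant purely as a backup is to push all branches and tags explicitly (standard git commands; the remote name is a placeholder):

# push every local branch and every tag to the backup remote
git push --all origin
git push --tags origin

# or mirror the whole repository, including branch deletions
git push --mirror origin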

  • 1
    Pretty much everything about git with the exception of remotes is local. That is by design. You can push things to remotes, and should, particularly if used for backup as in this scenario. For branches, again, yes, you need to explicitly push them if you want them added to a remote. For development, this is great because often you want to test something out, but there is no need for that test branch to be preserved indefinitely. Once you have what you need from it, you're likely going to merge it to a dev branch and del the test branch. – LocalPCGuy Jul 13 '14 at 17:04
1

I found this to be a good methodology for my dev boxes. It changes them from being something that needs to be backed up into just a deployment endpoint.

All the configuration and package installation manifests are stored in Puppet, allowing for easy redeployment and configuration updates. The Puppet directory is backed up with git. Kickstart is used to do the initial deploy.

I also keep a custom YUM repository for whatever packages are being developed at the time. This has the added benefit that whatever packages we are working with aren't just left as unattended binaries on the local system - if that happens and the files get nuked, oh well; someone didn't follow proper procedure.

Tim Brigham
  • 15,465
  • 7
  • 72
  • 113
1

You might want to check out bup on GitHub, which was designed to serve the purpose of using git for backup.

mcantsin
  • 130
  • 7
  • 2
    A previous answer already points to that same tool (bup): http://serverfault.com/a/341213/303467 . Any highlights on it? – Javier Aug 30 '15 at 15:33
1

It is an approach that is used, and it makes sense.

Keepconf uses rsync and git for this job; it's a wrapper over these tools that keeps things easy.

You only need a central server with ssh keys configured for access to the backup servers, plus a few lines in the configuration file. For example, this is my own file for keeping all of /etc/ and the list of installed Debian packages:

[hosts]
192.168.1.10
192.168.1.11
192.168.1.12

[files]
/etc/*
/var/lib/dpkg/status

With that, I have the rsync backup and the git commit.

rfmoz
  • 694
  • 9
  • 15
0

I wrote about a simple way to do this: backup-org-files-in-github

This works for files that are not collaborated upon - in my case, Emacs org files. I used cron to periodically do a git commit and git push.
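The cron entry for this can be a single line (path and schedule are placeholders; note that % has to be escaped in a crontab):

# commit and push ~/org every 30 minutes
*/30 * * * *  cd "$HOME/org" && git add -A && git commit -qm "autosave $(date +\%F_\%H\%M)" && git push -q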

ArMD
  • 101
  • 2
0

My personal opinion is that this is basically all backwards. You're pushing the files into a backup solution, rather than pulling them out.

Much better would be to centralise the configuration of the server in the first place, and then pull it down, using something like puppet.

That said, it may work, I just don't think it'd be that good.

Try looking into BackupPC - it's pretty easy to set up and is frankly brilliant.

Sirex
  • 5,447
  • 2
  • 32
  • 54
0

It would work somewhat, but with two caveats.

  1. File additions will not be picked up automatically when you do the commit. Use --porcelain on git status to find new stuff to add before doing the commit.

  2. Why the hassle of a remote mount for the .git? It could be fragile and you won't know that it failed. Use a bare repository for the far end with a normal ssh key login. As long as the repository is bare and you only push from one source, it is guaranteed to work without a merge (see the sketch below).
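A sketch of the bare-repository setup being suggested (hostname and paths are invented for the example):

# on the backup host: create an empty bare repository once
ssh backuphost 'git init --bare /srv/backups/devbox.git'

# on the machine being backed up: push to it over ssh
git remote add backup ssh://backuphost/srv/backups/devbox.git
git push backup master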

Andrew
  • 131
  • 3