JMM’s notes on

git-annex

git-annex helps manage large amounts of data using git. It helps me back up data, transfer data, and just know where things are located. I use it to store my photos and any large binary files in my code repos. It’s probably one of the most useful pieces of software I use.

Command line

List files that are present

$ git annex list --in=here

here
|myremote
||remote2
|||web
||||bittorrent
|||||
X____ example1/file1.png
XX___ example1/file1.webp
X____ example2/movie1.mp4
X____ example2/blah.jpg
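
Each line of the header names one repository, and an X in that column means git-annex believes the file is present there. Here is a small sketch for picking out files whose only known copy is in the first column (here, in this output). The only_here name is made up for illustration, and it assumes the output layout shown above.

```shell
# Hypothetical helper: read `git annex list` output on stdin and print
# the files whose only known copy is in the first column.
only_here() {
    # Keep lines whose flag field is an X followed only by underscores,
    # then strip the flag field and the space after it.
    sed -n 's/^X_* //p'
}

# Usage, from inside a git-annex repository:
#   git annex list | only_here
```

With the listing above, that would leave out example1/file1.webp, since it is also on myremote.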

Initialize an rsync remote

This makes an rsync remote called myrsyncremote that lives on a host called myremotehost at the path /data/someplace/repo. It also makes it encrypted, so the remote host can’t read the data. With encryption=shared the key is stored in the git repository, so anyone who has access to the git repository itself can still decrypt it.

git annex initremote myrsyncremote type=rsync rsyncurl=myremotehost:/data/someplace/repo encryption=shared

Syncing with less bandwidth

Sometimes you don’t want to saturate your link when you’re doing other stuff. Here’s how you’d upload with a limited rate:

git -c annex.bwlimit=100KiB annex copy --to=myremote .

Or, to only consider files that aren’t already recorded as being on the remote:

git -c annex.bwlimit=100KiB annex copy --to=myremote --not --in=myremote .
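
If I end up typing this a lot, a small wrapper might save some keystrokes. This is only a sketch: the annex_copy_limited name and the DRY_RUN variable are made up, and it just assembles the same command as above.

```shell
# Hypothetical helper: copy to a remote with a bandwidth cap.
# Usage: annex_copy_limited RATE REMOTE [PATH...]
annex_copy_limited() {
    rate=$1
    remote=$2
    shift 2
    # Default to the current directory when no paths are given.
    [ "$#" -gt 0 ] || set -- .
    # With DRY_RUN set (an assumption of this sketch), print the command
    # instead of running it.
    ${DRY_RUN:+echo} git -c "annex.bwlimit=$rate" annex copy \
        --to="$remote" --not --in="$remote" "$@"
}

# Example: annex_copy_limited 100KiB myremote
```

Since annex.bwlimit is an ordinary git config setting, it could presumably also be set once with git config instead of passing -c every time.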

Listing files that need to be backed up

I should probably just use the “wanted” (preferred content) feature, but here’s how I do it for now.

git annex list --include='*.jpg' --or --include='*.JPG' --in=here --not --in=myremote

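And here is the matching copy command. I put *.JPG first because these are DSLR photos that I probably need to back up first, though as far as I can tell the order of the patterns doesn’t change the order in which files get copied.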

git annex copy --to=myremote --include='*.JPG' --or --include='*.jpg' --in=here --not --in=myremote
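
For comparison, here is roughly what the preferred content approach mentioned above might look like. This is an untested sketch: the backup_jpegs and run names and the DRY_RUN variable are made up, and the expression assumes the remote should want only JPEGs. Setting a preferred content expression also affects what things like sync --content or drop --auto will keep on that remote, so it is a bigger change than a one-off copy.

```shell
# Print the command instead of running it when DRY_RUN is set
# (DRY_RUN is an assumption of this sketch, not a git-annex feature).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: tell git-annex that a remote wants JPEGs,
# then copy only what that remote wants.
backup_jpegs() {
    remote=$1
    run git annex wanted "$remote" "include=*.jpg or include=*.JPG"
    run git annex copy --auto --to="$remote"
}

# Example: backup_jpegs myremote
```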