git-annex helps manage large amounts of data using git. It helps me back up data, transfer data, and keep track of where things are located. I use it to store my photos and any large binary files in my code repos. It’s probably one of the most useful pieces of software I use.
Links
- Homepage
- https://git-annex.branchable.com/
Command line
List files that are present
$ git annex list --in=here
here
|myremote
||remote2
|||web
||||bittorrent
|||||
X____ example1/file1.png
XX___ example1/file1.webp
X____ example2/movie1.mp4
X____ example2/blah.jpg
Initialize an rsync remote
This makes an rsync remote called myrsyncremote that lives on a host called myremotehost at the path /data/someplace/repo.
It also encrypts the content, so the remote host can’t read the data. With encryption=shared the key is stored in the git repository, so anyone who has access to the repository itself can still decrypt it.
git annex initremote myrsyncremote type=rsync rsyncurl=myremotehost:/data/someplace/repo encryption=shared
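Because the shared key lives in the git repository, another clone should be able to use the same remote without setting it up again. A minimal sketch, assuming the second clone has already fetched the git-annex branch; the function name enable_shared_remote is made up for illustration:

```shell
# enable_shared_remote is a hypothetical helper name, not a git-annex command.
# Run it inside a second clone of the same repository, after that clone
# has fetched the git-annex branch (for example via "git annex sync").
enable_shared_remote() {
    name=${1:-myrsyncremote}
    # enableremote reuses the settings recorded by the original initremote.
    git annex enableremote "$name"
}
```

Calling enable_shared_remote with no argument targets myrsyncremote; pass another name to enable a different special remote.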
Syncing with less bandwidth
Sometimes you don’t want to saturate your link when you’re doing other stuff. Here’s how you’d upload with a limited rate:
git -c annex.bwlimit=100KiB annex copy --to=myremote .
Or
git -c annex.bwlimit=100KiB annex copy --to=myremote --not --in=myremote .
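If the limit should stick around instead of being passed with -c every time, it can probably just live in the repository's git config. A small sketch, assuming a git-annex version recent enough to support annex.bwlimit and the per-remote remote.<name>.annex-bwlimit setting; the function name annex_set_bwlimit is hypothetical:

```shell
# annex_set_bwlimit is a hypothetical helper name.
# Usage: annex_set_bwlimit RATE [REMOTE]
# Stores the cap in the current repository's git config so that later
# "git annex copy" runs pick it up without needing "git -c".
annex_set_bwlimit() {
    rate=$1
    remote=${2:-}
    if [ -n "$remote" ]; then
        # Cap transfers to and from one remote only.
        git config "remote.$remote.annex-bwlimit" "$rate"
    else
        # Cap transfers for every remote.
        git config annex.bwlimit "$rate"
    fi
}
```

For example, annex_set_bwlimit 100KiB sets the repository-wide cap, and annex_set_bwlimit 100KiB myremote limits only myremote. Running git config --unset annex.bwlimit removes the cap again.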
Listing files that need to be backed up
I should probably just use the “wants” feature, but here’s how I do it for now.
git annex list --include='*.jpg' --or --include='*.JPG' --in=here --not --in=myremote
And here I put *.JPG first because these are DSLR photos that I probably need to back up first.
git annex copy --to=myremote --include='*.JPG' --or --include='*.jpg' --in=here --not --in=myremote
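As far as I can tell, the order of the --include options only decides which files match, not the order they get transferred in. If the DSLR photos really need to go up first, one way is to run one pass per pattern. A rough sketch; backup_in_order is a made-up helper name:

```shell
# backup_in_order is a hypothetical helper name.
# Usage: backup_in_order REMOTE PATTERN...
# Runs one "git annex copy" pass per glob, in the order given, so files
# matching earlier patterns are uploaded before later ones.
backup_in_order() {
    remote=$1
    shift
    for pattern in "$@"; do
        # Only consider files present here and not yet on the remote.
        git annex copy --to="$remote" --include="$pattern" \
            --in=here --not --in="$remote" || return 1
    done
}
```

With this, backup_in_order myremote '*.JPG' '*.jpg' uploads the DSLR photos first and the lowercase .jpg files afterwards.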