# Synchronising files

## Syncing/sharing

Pure network drives just aren't as awesome as working locally, and synchronising changes globally. Realising this is why the Dropbox founders are now rich. Well done them. Dependence on single remote servers for every trifling step is stupid.

Peer to peer is more robust. (Taking it further, how about everything be sneakernets?)

Anyway, file synchronising is handy, and tricky to do, so the solutions which do it most easily are also usually suboptimal. E.g. I have used Dropbox, but its technical and legal shortcomings are awful.

Some alternatives follow.

### Syncthing

Choose this if… You have a collection of various folders that you need shared to various different machines, and you would like many of the different machines to be able to edit them. You don't need a server and thus you are happy for syncing to happen if and when the peers are online. And you don't care about iOS. E.g. I use this for synchronising my music production files across my studio machines, studio backup machines and gig machines.

Syncthing has an elegant decentralised sneakernet design. It is reminiscent of git-annex but doesn't have a combinatorial explosion of options, just one single sync protocol. It is actually mostly simple and friendly to use, although I spent too long reading the manual and being intimidated to dive in and discover that.

Granularity is per-folder-per-machine: each shared folder (and all its subfolders) is a separate share. Like git-annex, it doesn't support iOS. Unlike git-annex, it doesn't support archiving stuff to USB keys or semi-offline stores.

Stated design criteria:

• Private. None of your data is ever stored anywhere else than on your computers. There is no central server that might be compromised, legally or illegally.

• Encrypted. All communication is secured using TLS. The encryption used includes perfect forward secrecy to prevent any eavesdropper from ever gaining access to your data.

• Authenticated. Every node is identified by a strong cryptographic certificate. Only nodes you have explicitly allowed can connect to your cluster.

You probably want the following files ignored in your .stignore file if you don't want it to synchronise a little too aggressively.

// From Windows
$RECYCLE.BIN
System Volume Information
$WINDOWS.~BT
pagefile.sys
desktop.ini

// From OS X
Icon?
.Spotlight-V100
.Trashes
(?d).DS_Store
.fseventsd
(?d)._*

// From Linux
lost+found
.gvfs
.local/share/trash
.Trash-*
.fuse_hidden*


There is a cli syncthing manager for your remote cloud instances, the snazzily named syncthingmanager. It has an OSX client.

Syncthing also has file versioning and such, but cryptographic signing of versions and guaranteeing consistent snapshots and so on is not a front-and-centre feature.

Its major gotcha is that syncing between case insensitive and case sensitive file systems is broken and can delete data. That's right, this app works beautifully, smoothly and easily except that the moment you use it to sync between linux and macOS or linux and Windows (with default settings) stuff goes horribly wrong. You know, classic demonic pact read-the-small-print situation.

In fact this has only been an issue for me when syncing iTunes, which changes the case of file names when you import new music.
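
One way to practise that vigilance is to list, on the case-sensitive side, any paths that differ only by case, since those are the ones that will clash on a case-insensitive peer. A minimal sketch, assuming a POSIX shell and file names without newlines; `find_case_collisions` is my own hypothetical helper, not part of Syncthing, and `tr` may only fold ASCII letters depending on your locale:

```shell
# List paths under a directory that collide when case is ignored.
# Hypothetical helper; not part of Syncthing.
find_case_collisions() (
    cd "${1:-.}" || exit 1
    # Lower-case every path, then print each lower-cased path seen more than once.
    find . | tr '[:upper:]' '[:lower:]' | sort | uniq -d
)
```

Run it as `find_case_collisions ~/Music` on the Linux machine; no output means nothing should collide on the macOS or Windows side.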

A workaround for now is vigilance, plus “trash can file versioning”, which allows you to recover the missing files with the following commands:

cd .stversions
find . -type f -exec mv \{\} ../\{\} \;
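
The one-liner above will happily overwrite a live file with its trashed version. A slightly more cautious variant, as a sketch: it only restores files that are currently missing, and it recreates parent directories as needed. The function name `restore_stversions` is made up for illustration, and it assumes trash can versioning, where files in `.stversions` keep their original relative paths.

```shell
# Restore files from Syncthing's trash-can folder without clobbering anything.
# Hypothetical helper: pass the root of the shared folder as the argument.
restore_stversions() (
    cd "$1/.stversions" || exit 1
    find . -type f | while IFS= read -r f; do
        # Skip files that already exist in the live folder.
        [ -e "../$f" ] && continue
        # Recreate the parent directory in case it was deleted too.
        mkdir -p "../$(dirname "$f")"
        mv "$f" "../$f"
    done
)
```

Anything left behind in `.stversions` afterwards is a file that also exists in the live folder, which you can then compare by hand.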


### Seafile

Choose this if… You have a collection of various folders that you need to synchronise between macOS, Windows, Linux, Android and iOS. You don't mind installing a central server to coordinate all this.

Seafile is an open-source host-blind encrypted file sync, with a premium enterprise server available for a fee. FWIW it looks simpler than Owncloud to install. YMMV. The major reason I have not installed it is that I have too much data and can't afford the server farm fees.

### Retroshare

Choose this if… You want to share files, chats, data and whatever else, in a participatory social-media style network, with people you do and do not know in the network. I don't use this because I don't have a full social network of nerds.

Retroshare is not wholly focused on implicit file syncing, but does some of that and a lot more other social stuff.

Features chat, voice and video calls, offline mail, file sharing, distributed search, forums and compatibility with TOR, and sneakernet everything.

### dat

Choose this if… You have a large data set that you wish to share amongst many strangers, and if there is a single source of truth. I would use this for sharing finished research or journalistic data sets if I had any, but I wouldn't use it for collectively updating data across my compute cluster, because it doesn't support multiple writers.

Dat is similar to syncthing, with a different emphasis - sharing data to strangers rather than friends, with a special focus on datasets. You could also use it for backups or other sharing. See scientific data sharing.

NB it's one-writer-many-readers, much like bittorrent, so don't get excited about multiple data sources, or inter-lab collaboration. For this price, though, you get data versioning and robust verifiability, which is a bargain; completely alien to my current workflow though.

Some hacks exist for partial downloading. Otherwise, you can use Dat's base layer, hyperdrive, directly from node.js. (However, no-one uses node.js for science at the moment, so if you find yourself working on this bit of plumbing, ask yourself if you are yak shaving, and whether your procrastinating might be better spent going outside for some fresh air or something.)

People have built more collaborative tools using the Dat tools, such as beaker browser, which is a decentralised web browser.

### Mega

Choose this if… You want to share files, chats, data and whatever else, with people who can't install their own software and so must use a browser to download, and if you don't care that the company behind it is a little dicey. E.g. I used this for some temporary file-sharing music collaboration projects, but now that they are over, I've deleted it.

Mega is easy to run. Public source, but not open source. (Long story.) Host-blind encryption business from New Zealand.

Anyway it's relatively easy to use because it works in the browser, so it won't terrify your non-geek friends. Ok, maybe a little. Much cheaper than dropbox. The UI is occasionally freaky but it's reasonably functional, especially for its bargain-basement price. A… unique?… tradeoff of respectability, privacy and affordability.

### Rclone

Choose this if… You want to infrequently clone some files somewhere, or just because you want a swiss army knife fallback solution. e.g. I have this around just in case, but never actually use it.

Rclone is a command line program to sync files and directories to and from Google Drive, Amazon S3, Memset Memstore, Dropbox etc.

Features:

• MD5/SHA1 hashes checked at all times for file integrity
• Timestamps preserved on files
• Partial syncs supported on a whole file basis
• Copy mode to just copy new/changed files
• Sync (one way) mode to make a directory identical
• Check mode to check for file hash equality
• Can sync to and from network, eg two different cloud accounts
• Optional encryption (Crypt)
• Optional FUSE mount (rclone mount)
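
To make the sync and check modes concrete, here is a rough sketch of a preview-then-mirror wrapper. It assumes you have already set up a remote with `rclone config`; the function name `rclone_mirror` and the remote name in the usage line are placeholders of mine.

```shell
# Mirror a local directory to an rclone remote, previewing first.
# Hypothetical wrapper; remotes are configured beforehand with `rclone config`.
rclone_mirror() (
    src=$1
    dest=$2
    # Show what sync would change without touching anything.
    rclone sync --dry-run "$src" "$dest" || exit 1
    # One-way sync: make dest identical to src, deleting extras on dest.
    rclone sync "$src" "$dest" || exit 1
    # Compare sizes and hashes on both sides.
    rclone check "$src" "$dest"
)
```

Usage would be something like `rclone_mirror ~/Documents remote:documents`. Note that `sync` deletes files on the destination that are missing from the source, which is why the dry run comes first.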

Cons: manual synchronising is to be avoided because every extra thing to remember is another thing you will forget at the worst possible time.

### Dropbox for the skeptical

Choose this if… you don't mind giving access to your data to dubious strangers and some of your colleagues are totally hooked on it. e.g. I use this in a roundabout way on an untrusted machine to synchronise some research data to campus luddites.

If you must use Dropbox, you can at least run it in a container, using docker so they can't spy on your stuff. Probably. At least not on the stuff you haven't explicitly put in Dropbox, which is presumably already enough stuff to keep them busy so you shouldn't feel sorry for them. This is not a painful thing to organise, taking about one hour including learning what the hell docker is from scratch. But it is flamboyantly nerdy, and still encourages unsafe Dropbox-trusting amongst your friends. At the end of it, you have made the tool so inconvenient that you may as well have been using Owncloud.

Let's say you have the default UID, GID and Dropbox location on OSX. Then you do this.

docker pull janeczku/dropbox
docker run -d --restart=always --name=dropbox \
-v ~/Dropbox:/dbox/Dropbox \
-e DBOX_UID=501 \
-e DBOX_GID=20 \
--net="host" \
janeczku/dropbox

docker logs dropbox
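
If your UID and GID are not the OSX defaults, you can ask the system for them rather than hard-coding 501 and 20. A sketch of the same invocation with that one change; the function name `run_dropbox_container` is mine, and everything else is copied from the command above:

```shell
# Start the Dropbox container using the current user's numeric IDs
# instead of hard-coding 501 and 20. Hypothetical wrapper function.
run_dropbox_container() {
    docker run -d --restart=always --name=dropbox \
        -v "$HOME/Dropbox:/dbox/Dropbox" \
        -e DBOX_UID="$(id -u)" \
        -e DBOX_GID="$(id -g)" \
        --net="host" \
        janeczku/dropbox
}
```

`id -u` and `id -g` print the numeric user and primary group IDs of whoever runs the command, so run this as the user who owns `~/Dropbox`.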


You might need to reboot intermittently so that Dropbox can run its self-update.

Update: this turns out to be a total pain in the arse. Instead, I run Dropbox and Syncthing on a spare computer I have lying around.

### Keybase, not quite a file sync

An in-principle secure alternative is keybase, although it's not quite syncing, it's a kind of syncing-rebooted-thing, which facilitates secure-ish peer sharing something something.

### Owncloud

Choose this if… your campus runs a giant free data store or something based on Owncloud. e.g. that's why I have it.

Owncloud is dubiously secure; they have security advisories all the time. But even without that silliness, they don't store files encrypted, so your server host can see what you are doing. Lawks! That's only one step better than Dropbox!

OTOH, it's easy to run on your own server, e.g. using docker, so it's useful for sharing something public such as open research etc for only the cost of hosting, which is low. Additionally, Australian academics get a free 100GB from AARNET, so we may as well.

However, there are various quirks to survive.

For one, command-line usage is not obvious.

First, you can access it as a WebDAV share, which is unwieldy but probably works. However it's also probably slow. We really want sync here.

The actual owncloud CLI documentation is hidden deeply. Tony Maro gives a walk-through. It's heavily version dependent. Beware.
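
For what it's worth, the desktop client ships a one-shot sync tool called `owncloudcmd`, which takes a local directory and a server URL. The following is a sketch only: it assumes your credentials live in `~/.netrc` (that is what `-n` asks for), the wrapper name `owncloud_sync_once` is made up, and the exact URL form your server wants depends on the version, so check before trusting it.

```shell
# One-shot sync of a local folder against an ownCloud server.
# Assumes `owncloudcmd` from the desktop client is installed and that
# credentials for the server are in ~/.netrc (-n). Hypothetical wrapper.
owncloud_sync_once() (
    local_dir=$1
    server_url=$2
    owncloudcmd -n "$local_dir" "$server_url"
)
```

Usage might look like `owncloud_sync_once ~/ownCloud https://cloud.example.org/remote.php/webdav/`, where the host and path are placeholders.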

### git-annex

Choose this if… you are a giant nerd with harrowing restrictions on your data transfer and it's worth your while to leverage this very sophisticated and yet confusing bit of software to work around these challenges. E.g. you are integrating sneakernets and various online options. Which I am not.

git-annex supports explicit and customisable folder-tree synchronisation, merging and sneakernets and as such I am well disposed toward it. You can choose to have things in various stores, and to copy files to and from servers or disks as they become available. It doesn't support iOS. Windows support is experimental. Granularity is per-file. It has a weird symlink-based file access protocol which might be inconvenient for many uses. (I'm imagining this is trouble for Microsoft Word or whatever.)

Also, do you want to invoke various disk-online-disk-offline-how-sync-when options from the command line, or do you want stuff to magically replicate itself across some machines without requiring you to remember the correct incantation on a regular basis?

The documentation is very nerdy and not very clear, but I think my needs are nerdy and unclear by modern standards. However, the combinatorial explosion of options and excessive hands-on-ness is a serious problem which I will not realistically get around to addressing due to my to-do list already being too long.

• rsync is what I always end up using.
• aws s3 sync.
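
Since rsync is what I always end up using, here is roughly the shape of it: preview, then mirror. A sketch, with a made-up function name `rsync_mirror`; the flags are standard rsync ones.

```shell
# Mirror src into dest with rsync, previewing the changes first.
# Hypothetical wrapper function.
rsync_mirror() (
    src=${1%/}/   # force a trailing slash: copy the contents of src, not src itself
    dest=$2
    # List what would be transferred or deleted, without doing it.
    rsync -a --delete --dry-run --itemize-changes "$src" "$dest" || exit 1
    # Do it for real: -a preserves times and permissions, --delete removes extras on dest.
    rsync -a --delete "$src" "$dest"
)
```

The destination can be a local path or a `user@host:path` target over SSH. As with any one-way mirror, `--delete` will remove things from the destination, so read the preview.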

## Bonus trick

Convert your woefully insecure sync service into a somewhat less woeful one using the open source cryptomator, which encrypts all the data you send to the service rather than letting them see it, by creating easy-to-use encrypted drives.

The drawbacks that immediately occur to me are

• this does not help with sharing files with peers, who still need to decrypt stuff somehow (although that's a problem with any encrypted service)

• you still have to run their sync software on your computer, which means trusting their client code if not their server code.

• files are encrypted individually, so you are still leaking a lot of information about what kind of files they are through their size and usage patterns.

NB you could do this anyway by manually encrypting everything, but would you? No, because it's slow and tedious. You need a nice GUI like this.

## Others

• SpiderOak was the most popular encrypted service last time I checked. It is based in the USA, which, like Russia and China, is more of a secret service browsing library than a secure document store where you would keep actual private stuff, which creates certain difficulties for their credibility.

• sparkleshare is a friendly git front-end for non-specialists:

creates a special folder on your computer. You can add remotely hosted folders (or “projects”) to this folder. These projects will be automatically kept in sync with both the host and all of your peers when someone adds, removes or edits a file.

SparkleShare uses the version control system Git under the hood, so setting up a host yourself is relatively easy.

FWIW this seems to me to be less of a good sync client, and more of a good git GUI.

• Academic cred: “Ori is a distributed file system built for offline operation and empowers the user with control over synchronization operations and conflict resolution. We provide history through light weight snapshots and allow users to verify the history has not been tampered with. Through the use of replication instances can be resilient and recover damaged data from other nodes.”

• Tresorit is a Swiss SpiderOak competitor, which capitalises on stronger Swiss privacy laws (YMMV) as well as trendy encryption technology. Closed-source though, so there is still a degree of blind faith.

## Online backup

Listing encrypted backups only, because I am not crazy.

Also, I'm only listing open-source options or ones not in a jurisdiction with especially poor privacy, such as China, Russia, the UK, the USA or Australia.

### Restic

Windows, OSX, Linux.

Choose this if… You are prepared to pay a (small) overhead in difficulty to have a trusted, encrypted backup client for backing up sensitive data to the internet cheaply.

Took me a while to decide Restic was the best option because its marketing is crap. However a helpful ycombinator post explains some upsides and points out how simple it is. It's also very easy to install and minimal, which is a change.
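
To show how minimal it is, a sketch of a complete backup run: create the repository if it isn't there yet, back up a directory, list snapshots. This assumes the repository password is supplied through `RESTIC_PASSWORD` or `RESTIC_PASSWORD_FILE` in the environment; the function name `restic_backup` is mine.

```shell
# Back up one directory to a restic repository and list the snapshots.
# Assumes RESTIC_PASSWORD or RESTIC_PASSWORD_FILE is set in the environment.
# Hypothetical wrapper function.
restic_backup() (
    repo=$1
    dir=$2
    # `cat config` fails if the repository does not exist yet; create it in that case.
    restic -r "$repo" cat config >/dev/null 2>&1 || restic -r "$repo" init || exit 1
    # Encrypted, deduplicated, incremental backup of dir.
    restic -r "$repo" backup "$dir" || exit 1
    restic -r "$repo" snapshots
)
```

The repository argument can be a local path or whatever remote form restic accepts after `-r`.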

### duplicati

Windows, OSX, Linux.

Duplicati works with standard protocols like FTP, SSH, WebDAV as well as popular services like Microsoft OneDrive, Amazon Cloud Drive / S3, Google Drive, box.com, Mega, hubiC and many others.

Features:

• Backup files and folders with strong AES-256 encryption. Save space with incremental backups and data deduplication.
• Run backups on any machine through the web-based interface or via command line interface.
• Duplicati has a built-in scheduler and auto-updater.

Full list of backends. Looks OK but has hefty installation requirements, being built on .NET, and I got bored trying to install .NET in a sane way.

### Duplicity

OSX, linux, more bare-bones:

Duplicity backs directories by producing encrypted tar-format volumes and uploading them to a remote or local file server. Because duplicity uses librsync, the incremental archives are space efficient and only record the parts of files that have changed since the last backup. Because duplicity uses GnuPG to encrypt and/or sign these archives, they will be safe from spying and/or modification by the server.

Linux, OSX. tarsnap comes with a server for $0.25/GB/month:

Tarsnap is a secure, efficient online backup service: Tarsnap runs on UNIX-like operating systems (BSD, Linux, MacOS X, Cygwin, etc)

### Others I've seen about the place

zbackup, borgbackup, attic, obnam, arq.

## syncing dotfiles

You might try mackup to sync settings for linux and osx machines alike to some folder somewhere. It's a database of which actual settings of various apps are actually syncable.

On second thoughts, this is a fragile approach. And it freaks out if you have non-ascii characters in your filenames. Do something different. Revised recommendation:

git init --bare $HOME/.dotfiles
alias dotfiles='git --git-dir=$HOME/.dotfiles/ --work-tree=$HOME'
dotfiles config --local status.showUntrackedFiles no
echo "alias dotfiles='git --git-dir=$HOME/.dotfiles/ --work-tree=$HOME'" \
>> $HOME/.bashrc


Yes, much less freaky. Actually, do you know what is even easier? Just make a git repo in your home dir. No more overthinking. Re-revised recommendation:

git init $HOME
git config --local status.showUntrackedFiles no
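
With untracked files hidden, nothing gets tracked until you add it explicitly. A small sketch of the whole setup as one function, assuming git 1.8.5 or later for `-C`; the name `track_dotfiles` is invented for illustration.

```shell
# Turn a home directory into a git repo that only tracks the files you name.
# Hypothetical helper: first argument is the directory, the rest are files in it.
track_dotfiles() (
    home=$1
    shift
    git init -q "$home" || exit 1
    # Hide the thousands of untracked files from `git status`.
    git -C "$home" config --local status.showUntrackedFiles no
    # Files must be added explicitly; paths are relative to the directory.
    git -C "$home" add -- "$@"
)
```

For example, `track_dotfiles "$HOME" .bashrc .gitconfig`, followed by an ordinary `git commit` from your home directory.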


Now! Go forth and steal other people's dotfile tricks.