The Living Thing / Notebooks :

Data sharing

for research and science

Tips and tricks for collaborative data sharing.

To actually use the data, you want some kind of database. But first you have to get hold of the data so that you can put it in that database.
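For small tabular data, “some kind of database” can be as humble as SQLite. Below is a minimal sketch that uses only Python’s standard library and assumes the data arrives as CSV with a header row. The function name `load_csv_into_sqlite`, the table name and the toy rows are all made up for illustration, and every column is stored as TEXT for simplicity.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_text):
    """Create `table` from the CSV header (all columns TEXT) and insert every row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Quote identifiers so column names containing spaces do not break the SQL.
    cols = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    placeholders = ", ".join("?" for _ in header)
    # The remaining reader rows are lists of strings, one per data line.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Hypothetical toy data, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "spikes", "neuron,time_ms\n1,0.5\n2,1.25\n")
rows = conn.execute('SELECT neuron, time_ms FROM "spikes"').fetchall()
```

Table and column names are interpolated straight into the SQL, so this is only suitable for names you trust. Real data would also want declared column types.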

You can download it from some open data set library, although the workflow for many of these is not great, since the data gets uploaded once and then stays static. Nonetheless, this basic workflow is familiar and simple. Try, say, Figshare, which hosts the supporting data for many amazing papers. E.g. here’s 1.4 GB of synapses firing.
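Since these archives are write-once, it is worth checking that what you downloaded is what was uploaded. Many repositories publish a per-file checksum (Figshare lists MD5 sums, as far as I know). Here is a sketch of checking a local file against one. `file_md5` and `verify_download` are hypothetical helpers, and the expected hash is simply the MD5 of the bytes `hello`, standing in for a published checksum.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large downloads fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, expected_md5):
    """True if the file's MD5 matches the published (hex) checksum."""
    return file_md5(path) == expected_md5.lower()


# Stand-in for a downloaded file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")

ok = verify_download(tmp.name, "5d41402abc4b2a76b9719d911017c592")
bad = verify_download(tmp.name, "0" * 32)
os.remove(tmp.name)
```

MD5 is fine for catching corrupted or truncated transfers, but it is not a defence against deliberate tampering.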

Fancier: you can use a sync protocol for data that changes. This is also not ideal for provenance unless you track who changed what, and when.
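One way to patch the provenance gap is to keep your own record alongside whatever sync tool you use. Below is a minimal sketch built around a hypothetical `ProvenanceLog` class of my own devising, not a feature of any particular sync protocol. Each change is logged with its author, path, timestamp and a SHA-256 hash of the new content, and the whole log can be dumped as JSON Lines.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceLog:
    """Append-only record of who changed which file, when, and to what content."""

    entries: list = field(default_factory=list)

    def record(self, author, path, content, when=None):
        if when is None:
            when = datetime.now(timezone.utc)
        entry = {
            "author": author,
            "path": path,
            "sha256": hashlib.sha256(content).hexdigest(),  # fingerprint of new content
            "timestamp": when.isoformat(),
        }
        self.entries.append(entry)
        return entry

    def history(self, path):
        """All recorded changes to one path, oldest first."""
        return [e for e in self.entries if e["path"] == path]

    def to_jsonl(self):
        """One JSON object per line, suitable for committing next to the data."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)


log = ProvenanceLog()
first = log.record("alice", "data.csv", b"a,b\n1,2\n",
                   when=datetime(2020, 1, 1, tzinfo=timezone.utc))
log.record("bob", "data.csv", b"a,b\n1,3\n",
           when=datetime(2020, 1, 2, tzinfo=timezone.utc))
```

This only helps if everyone records their changes honestly. Tools with signed, append-only histories enforce that, whereas this sketch merely documents it.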

Now, let’s think of some more exotic options.

Google’s open data set protocol, which they call the “Dataset Publishing Language” (DSPL), is a standard for medium-sized datasets with easy visualisations.


Qu publishes any old data from a MongoDB store.
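I haven’t checked Qu’s actual URL scheme, so the following only sketches the general idea of publishing a data store as a read-only, queryable JSON API. A plain Python list stands in for the MongoDB collection, and the `?field=value` filtering convention is my own assumption. `RECORDS`, `query`, `DataHandler` and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Stand-in for a MongoDB collection: a list of plain dicts (made-up data).
RECORDS = [
    {"state": "NY", "year": "2012", "count": 10},
    {"state": "CA", "year": "2012", "count": 25},
    {"state": "NY", "year": "2013", "count": 12},
]


def query(records, params):
    """Return records whose fields equal every ?field=value query parameter."""
    return [r for r in records
            if all(str(r.get(k)) == v for k, v in params.items())]


class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        url = urlparse(self.path)
        # parse_qs maps each key to a list of values; keep only the first.
        params = {k: v[0] for k, v in parse_qs(url.query).items()}
        body = json.dumps(query(RECORDS, params)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Block forever, serving RECORDS on localhost."""
    HTTPServer(("127.0.0.1", port), DataHandler).serve_forever()
```

After calling `serve()`, visiting `http://127.0.0.1:8000/?state=NY` should return the two NY rows. With a real MongoDB backend, `query` would become something like pymongo’s `collection.find(params)`, bearing in mind that query-string values always arrive as strings.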


One hip solution is dat, which shares and tracks updates to datasets in a distributed fashion. It aims to be Syncthing for open data, with a special focus on scientific datasets.

Dat is the package manager for data. Share files with version control, back up data to servers, browse remote files on demand, and automate long-term data preservation. Secure, distributed, fast.
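As I understand it, dat (via its hypercore logs) lets you browse remote files on demand while still trusting what arrives by using a Merkle tree. A reader holding a trusted root hash can verify any single chunk from a handful of sibling hashes, without downloading the rest. The toy sketch below shows that idea only. It is not dat’s actual tree layout, hashing or wire format, and in the real system the root is also signed by the writer’s key, which I omit here. All function names are hypothetical.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def leaf_hash(chunk):
    # Prefix bytes keep leaf hashes distinct from interior-node hashes.
    return _h(b"\x00" + chunk)


def node_hash(left, right):
    return _h(b"\x01" + left + right)


def build_levels(chunks):
    """All tree levels, leaves first; an unpaired last node is promoted unchanged."""
    level = [leaf_hash(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])
        level = nxt
        levels.append(level)
    return levels


def proof_for(levels, index):
    """Sibling hashes, each flagged True if the sibling sits on the left."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling < len(level):  # no sibling means this node was promoted as-is
            proof.append((sibling < index, level[sibling]))
        index //= 2
    return proof


def verify_chunk(chunk, proof, root):
    """Recompute the root from one chunk plus its proof and compare."""
    acc = leaf_hash(chunk)
    for sibling_is_left, sibling in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root


chunks = [b"chunk-%d" % i for i in range(5)]
levels = build_levels(chunks)
root = levels[-1][0]
proof = proof_for(levels, 3)
ok = verify_chunk(chunks[3], proof, root)
tampered = verify_chunk(b"evil", proof, root)
```

The proof grows with the logarithm of the number of chunks, which is what makes fetching one small piece of a large dataset cheap to verify.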

It has a rich ecosystem of distributed servers, and a GUI. I’m not sure what the points of friction would be for sharing private data.
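On the private-data question, here is how dat’s model works as I understand it. The link you share is itself the read key, so privacy amounts to keeping that link secret. Peers find each other by announcing a hash of the key rather than the key itself. The sketch below shows that split and is loosely modelled on dat’s “discovery keys”. The keyed-BLAKE2b construction, the `b"discovery"` label and the function names are my assumptions, not dat’s actual derivation.

```python
import hashlib
import secrets


def new_share_key():
    """Random 32-byte secret; in this model, whoever holds it can read the data."""
    return secrets.token_bytes(32)


def discovery_key(share_key):
    """Public rendezvous ID derived one-way from the secret share key.

    Peers announce this value to find each other, so a tracker or DHT node
    that sees it does not thereby learn the share key.
    """
    return hashlib.blake2b(b"discovery", key=share_key, digest_size=32).digest()


alice_key = new_share_key()
topic = discovery_key(alice_key)
```

The friction follows directly from this design. Anyone who obtains the link can read everything, and revoking access means rotating to a new key and re-sharing.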