Tips and tricks for collaborative data sharing.
To actually use data, you want some kind of database. But first you have to get the data into that database.
You can download it from an open data set library, although the workflow for many of these is not ideal if you have ongoing research, since the data gets uploaded once and is thereafter static. Nonetheless, this basic workflow is familiar and simple. Try, say, Figshare, which hosts the supporting data for many amazing papers. E.g. here’s 1.4 GB of synapses firing.
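For a taste of that static workflow, here is a minimal sketch of pulling an article’s files through Figshare’s public REST API (the article ID is hypothetical, and the endpoint shapes are per the Figshare API docs, so check them before relying on this):

```python
import requests
from pathlib import Path

FIGSHARE_API = "https://api.figshare.com/v2"
ARTICLE_ID = 123456  # hypothetical; substitute the ID from the paper you want

# The article metadata includes a list of its files with download URLs.
meta = requests.get(f"{FIGSHARE_API}/articles/{ARTICLE_ID}", timeout=30)
meta.raise_for_status()

dest = Path("data")
dest.mkdir(exist_ok=True)

for f in meta.json().get("files", []):
    out = dest / f["name"]
    # Stream to disk, since supporting datasets can be gigabytes.
    with requests.get(f["download_url"], stream=True, timeout=30) as r:
        r.raise_for_status()
        with out.open("wb") as fh:
            for chunk in r.iter_content(chunk_size=1 << 20):
                fh.write(chunk)
    print("downloaded", out)
```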
A more modern option is a sync system for changing data. This is not ideal for provenance, though, unless you track who changed what, and when.
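One cheap mitigation is to keep a checksum manifest next to the synced folder, so you can at least diff snapshots and see what changed between syncs, even if not who changed it. A minimal stdlib sketch (the folder and manifest names are just for illustration):

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def snapshot(root: Path) -> dict:
    """Hash every file under root so successive snapshots can be diffed."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digests[str(path.relative_to(root))] = hashlib.sha256(
                path.read_bytes()  # fine for a sketch; chunk for huge files
            ).hexdigest()
    return {"taken": datetime.now(timezone.utc).isoformat(), "files": digests}

Path("manifest.json").write_text(
    json.dumps(snapshot(Path("shared-data")), indent=2)
)
```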
Now, let’s think of some more exotic options.
One hip solution is dat, which shares and tracks updates to datasets in a distributed fashion. It is very similar to Syncthing, with a slightly different emphasis: sharing discoverable datasets with strangers rather than files with friends. You could also use it for backups or other sharing.
Their pitch:

> Dat is the package manager for data. Share files with version control, back up data to servers, browse remote files on demand, and automate long-term data preservation. Secure, distributed, fast.
There is a rich ecosystem of distributed servers, and a GUI.
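In practice you drive dat from its command line; here is a sketch of the share/clone round trip, scripted from Python (this assumes the dat CLI is installed and on your PATH, the dat:// link is a placeholder, and the exact subcommands are per the dat docs at the time of writing - the project has evolved since):

```python
import subprocess

# Publish a directory: dat builds a versioned archive, prints a dat:// link,
# and keeps seeding, so run it in the background.
sharer = subprocess.Popen(["dat", "share", "my-dataset"])

# A collaborator mirrors the archive from that link (placeholder key here).
subprocess.run(
    ["dat", "clone", "dat://<64-char-hex-key>", "my-dataset-copy"],
    check=True,
)

# Later pulls fetch only what changed, since dat tracks versions.
subprocess.run(["dat", "pull"], check=True, cwd="my-dataset-copy")

sharer.terminate()  # stop seeding when done
```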
Qu publishes any old data from a MongoDB store.
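That means you query it over HTTP. A hedged sketch (the host, dataset, and slice names are all invented; Qu’s URL pattern and $-prefixed query parameters are per its docs, which borrow from Socrata’s API style):

```python
import requests

BASE = "https://data.example.org"  # hypothetical Qu instance

# Slices are exposed at /data/<dataset>/<slice>.<format>.
resp = requests.get(
    f"{BASE}/data/lending/loans.json",
    params={"$where": "state = 'NY'", "$limit": 100},
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()  # payload shape depends on the Qu version
```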
Google’s open data set protocol, which they call their “Dataset Publishing Language” (DSPL), is a standard for medium-size datasets with EZ visualisations.
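A DSPL dataset is a zip bundle: CSV tables plus an XML metadata file describing the dataset’s concepts, slices, and tables. A packaging sketch (the XML is a stub; real metadata must validate against Google’s DSPL schema, and the file names here are illustrative):

```python
import zipfile

# Stub metadata; a real file declares concepts, slices, and tables
# against the DSPL schema.
metadata = """<?xml version="1.0" encoding="UTF-8"?>
<dspl xmlns="http://schemas.google.com/dspl/2010">
  <!-- info, provider, concepts, slices, tables go here -->
</dspl>
"""

with zipfile.ZipFile("mydataset.zip", "w") as z:
    z.writestr("dataset.xml", metadata)
    z.write("observations.csv")  # your actual data table, referenced by the XML
```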
- Open Science Framework seems to be GitHub, but with a focus on preserving dataset assets well instead of on code change sets; see the API sketch after this list.
- rOpenSci is, AFAICT, a way of seamlessly importing disparate online data sets into your R analysis.
- Dan Hopkins and Brendan Nyhan on How to make scientific research more trustworthy
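To make the OSF comparison concrete: it exposes projects (“nodes”) through a public JSON API. A hedged sketch of listing a project’s stored files (the node ID is hypothetical; endpoint shapes are per the OSF API v2 docs):

```python
import requests

NODE_ID = "abc12"  # hypothetical OSF project (node) ID

# List files on the project's default storage provider, osfstorage.
resp = requests.get(
    f"https://api.osf.io/v2/nodes/{NODE_ID}/files/osfstorage/",
    timeout=30,
)
resp.raise_for_status()

for item in resp.json()["data"]:
    attrs = item["attributes"]
    print(attrs["kind"], attrs["name"])
```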