Digital scientific workbooks

Literate coding for reality

November 18, 2014 — March 31, 2022

academe
computers are awful
faster pussycat
how do science
information provenance
plain text
premature optimization
UI
workflow

The exploratory-algorithm-person’s IDE-equivalent. Literate coding meets science, a.k.a. dynamic report generation, a.k.a. literate programming.

Let’s say I want to demonstrate my algorithm to my thesis advisor while he’s off at a conference. I need an easily shareable demonstration; that’s why we have the internet, right? I should be able to interleave text and mathematics and also code demonstrating the thingy, maybe even some graphs of the output. It should be in a simple format that one can use, execute and edit as simply as possible. That is what a scientific workbook does: it takes the text and renders all the graphs and tables and other experimental output as a nicely formatted document. Reproducing the document requires pressing a single button, not laboriously executing some inscrutable code snippets by hand, or wrangling a bloody spreadsheet.

Everyone wants to make this better, but there are coordination problems and standards problems and inertia.

See the Rethinking ML Papers for some recent advances.

1 Philosophy

Why do this? To belatedly immanentize the prophecy that the scientific paper is dead. As part of the reproducible/open science process.

Yihui Xie puts it practically: his Notebook war summarises some philosophical and practical differences between the literate coding/exploratory notebook hybrid tools in use, with an eye to application. Workbook systems are also behind projects like Nextjournal, a collaborative coding machine that claims to make it easy for you and your colleagues to write in a workbook style together. Go and read Explorable explanations for some philosophy, or explorabl.es for some hands-on experiments. A welcoming example is Nicky Case’s loopy.

Welcome to Explorable Explanations, a hub for learning through play! We’re a disorganized “movement” of artists, coders & educators who want to reunite play and learning.

In fact, most of us just run code most of the time; pragmatic tools exist to turn that into a real workflow, which I call experiment tracking.

Jeremy Kun on UI for mathematics discusses one of the problems that scientific workbooks are implicitly attempting to solve.

Lots of people struggle with math, and a better user interface for mathematics would immediately usher in a new age of enlightenment. This isn’t an idle speculation. It has happened time and time again throughout history. The Persian mathematician Muhammad ibn Musa al-Khwarizmi invented algebra (though without the symbols for it) which revolutionized mathematics, elevating it above arithmetic and classical geometry, quickly scaling the globe. Make no mistake, the invention of algebra literally enabled average people to do contemporarily advanced mathematics.… Shortly after the printing press was invented French mathematicians invented modern symbolic notation for algebra, allowing mathematics to scale up in complexity. Symbolic algebra was a new user interface that birthed countless new thoughts. Without this, for example, mathematicians would never have discovered the connections between algebra and geometry that are so prevalent in modern mathematics and which lay the foundation of modern physics. Later came the invention of set theory, and shortly after category theory, which were each new and improved user interfaces that allowed mathematicians to express deeper, more unified, and more nuanced ideas than was previously possible.…

In his book “The Art of Doing Science and Engineering,” the mathematician and computer scientist Richard Hamming put this difficulty into words quite nicely,

It has rarely proved practical to produce exactly the same product by machines as we produced by hand. Indeed, one of the major items in the conversion from hand to machine production is the imaginative redesign of an equivalent product. Thus in thinking of mechanizing a large organization, it won’t work if you try to keep things in detail exactly the same, rather there must be a larger give-and-take if there is to be a significant success. You must get the essentials of the job in mind and then design the mechanization to do that job rather than trying to mechanize the current version—if you want a significant success in the long run.

Hamming’s attitude about an “equivalent product” summarizes the frustration of writing software. What customers want differs from what they say they want. Automating manual human processes requires arduously encoding the loose judgments made by humans—often inconsistent and based on folk lore and experience. Software almost always falls short of really solving your problem. Accommodating the shortcomings requires a whole extra layer of process.

… My imagination may thus defeat itself by failing to give any ground. If a new interface is to replace pencil and paper mathematics, must I give up the ease of some routine mathematical tasks? Or remove them from my thinking style entirely? …

Mathematics succeeds only insofar as it advances human understanding. Pencil and paper may be the wrong tool for the next generation of great thinkers. But if we hope to enable future insights, we must understand how and why the existing tools facilitated the great ideas of the past. We must imbue the best features of history into whatever we build. If you, dear programmer, want to build those tools, I hope you will incorporate the lessons and insights of mathematics.

2 Sharing

mybinder, the flagship instance of binderhub, hosts diverse workbooks in the cloud using containers.

BinderHub is a cloud-based technology that can launch a repository of code (from GitHub, GitLab, and others) in a browser window such that the code can be executed and interacted with. A unique URL is generated allowing the interactive code to be easily shared.
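As a rough illustration of the "unique URL" idea, here is a minimal sketch that builds a mybinder launch link for a GitHub repository. The organisation, repository and notebook names are hypothetical placeholders, and the `labpath` query parameter (to open a particular file) is my assumption about the current URL scheme; check the link that the mybinder.org form generates before relying on it.

```python
from typing import Optional
from urllib.parse import quote


def binder_url(org: str, repo: str, ref: str = "HEAD",
               notebook: Optional[str] = None) -> str:
    """Build a mybinder.org launch link for a GitHub repository."""
    # Path layout assumed here: /v2/gh/<org>/<repo>/<ref>
    url = "https://mybinder.org/v2/gh/{}/{}/{}".format(
        quote(org, safe=""), quote(repo, safe=""), quote(ref, safe="")
    )
    if notebook is not None:
        # Assumption: `labpath` selects the file to open on launch.
        url += "?labpath=" + quote(notebook, safe="/")
    return url


# Hypothetical repository and notebook, for illustration only.
link = binder_url("example-org", "example-repo", notebook="analysis.ipynb")
print(link)
```

Sending that one link is the whole "sharing" step; the recipient needs nothing installed beyond a browser.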

Hypergraph is a vaunted new experiment-and-analysis tracking system which promises some collaborative tools. I have not yet tried it.

3 Quarto

See quarto.

4 Glamorous Toolkit

Glamorous Toolkit

Glamorous Toolkit is the moldable development environment. It is a live notebook. It is a flexible search interface. It is a fancy code editor. It is a software analysis platform. It is a data visualization engine. All in one. And it is free and open-source under an MIT license.

Maybe this is closer to a data dashboard?

5 Deepnote

Deepnote

Deepnote is a new kind of data science notebook. Jupyter-compatible with real-time collaboration and running in the cloud. Oh, and it’s free.

Hmm.

6 Jupyter

jupyter is weird enough to have its own notebook. It is somewhat Python-centric but pretty good with multiple languages and, AFAICT, secretly used inside e.g. RStudio.

However, jupyter had a good 10 years to justify itself to me and failed and I will not be coming back here until forced.

7 codebraid

Jupyter competitor codebraid is a literate programming tool.

8 Pweave

Pweave, by Matti Pastell, is the Python twin to knitr, in the lineage of literate coding tools. That is to say, it does less interactive notebook stuff and more straight-up report generation stuff.

Pweave is a scientific report generator and a literate programming tool for Python. It can capture the results and plots from data analysis and works well with numpy, scipy and matplotlib.

Max Masnick gives a detailed set up example.
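To make "report generation" concrete, here is a toy weave step using only the Python standard library. It is a sketch of the general idea, not how Pweave itself is implemented: it assumes noweb-style chunks (a `<<options>>=` line opens a code chunk, a lone `@` closes it), ignores chunk options, and captures only printed output, not plots. The `weave` function and the sample document are hypothetical.

```python
import contextlib
import io
import re

# Noweb-style chunk: a "<<options>>=" line opens code, a lone "@" line closes it.
CHUNK = re.compile(r"^<<(.*?)>>=\n(.*?)^@[ \t]*$\n?", re.MULTILINE | re.DOTALL)


def weave(source: str) -> str:
    """Run each code chunk in order and splice its printed output into the text."""
    namespace = {}  # shared across chunks, so later chunks see earlier variables
    pieces = []
    position = 0
    for match in CHUNK.finditer(source):
        pieces.append(source[position:match.start()])  # prose before the chunk
        code = match.group(2)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # runs arbitrary code: trusted input only
        pieces.append("Code:\n" + code + "Output:\n" + buffer.getvalue())
        position = match.end()
    pieces.append(source[position:])  # trailing prose after the last chunk
    return "".join(pieces)


document = """The mean of the first five squares:
<<>>=
squares = [n * n for n in range(1, 6)]
print(sum(squares) / len(squares))
@
Later chunks can reuse earlier results.
<<>>=
print(max(squares))
@
"""

report = weave(document)
print(report)
```

The shared namespace is the design choice that matters: it is what lets a document build up state chunk by chunk, as a notebook does, while still being reproducible from a single top-to-bottom run.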

9 Weave.jl

The Julia twin to Pweave or knitr is Weave.jl. See also Literate.jl, which looks similar.

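The code chunk will be run with default options and the output captured.
<<>>=
using Gadfly
# linspace was removed in Julia 1.0; range with an explicit length replaces it
x = range(0, 2pi, length=50)
println(x)
# sin must be broadcast over the range with the dot syntax
plot(x = x, y = sin.(x))
@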

Or you could use RMarkdown/knitr in julia mode. It’s not yet clear to me how graphing works in that case. Changcheng Li claims: “easily”.

10 Tangle

Tangle did well here but appears to be little-maintained. Perhaps this is because these things are hard.

11 Stencila

TBD: stencila is a GUI for reproducible research, which is essentially a GUI for knitr, like jupyter but for statisticians rather than computer scientists. It is best understood via an example, or the announcement.

Hosted. USD39/month.

12 knitr/RMarkdown

See knitr/RMarkdown.

13 MATLAB

14 Editor/IDE support

Miscellaneous preview support scripts are given in the knitr documentation.

14.1 VS Code

Many. TBD.

14.2 Atom

Atom supports a number of literate programming tools via language-weave — you might also want a full typesetting experience via the latex or atom-latex package, which can be made to support literate coding of plain LaTeX. AFAICS it uses Hydrogen to provide code preview.

Setting up a latex toolchain with a literate coding tool incorporated in atom-latex is not too bad. E.g. one for knitr:

{
    "root": "path/to/my/file.Rnw",
    "toolchain": "Rscript -e \"library(knitr); knit('%DOC.%EXT')\" && latexmk -synctex=1 -interaction=nonstopmode -file-line-error -pdf -halt-on-error %DOC",
    "latex_ext": [".Rnw"]
}

It would probably also work for pweave or Weave.jl.

15 References

Granger, and Pérez. 2021. “Jupyter: Thinking and Storytelling With Code and Data.” Computing in Science & Engineering.
Jirotka, Lee, and Olson. 2013. “Supporting Scientific Collaboration: Methods, Tools and Concepts.” Computer Supported Cooperative Work (CSCW).
Lau, Drosos, Markel, et al. 2020. “The Design Space of Computational Notebooks: An Analysis of 60 Systems in Academia and Industry.” In 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).
Poore. 2019. “Codebraid: Live Code in Pandoc Markdown.” Proceedings of the 18th Python in Science Conference.
Sokol, and Flach. 2021. “You Only Write Thrice: Creating Documents, Computational Notebooks and Presentations From a Single Source.” In.