
Jupyter

The least excruciating compromise between 1) irreproducible science, and 2) spooking your colleagues with something too new-fangled

[Figure: jupyter notebook in action]

The python-derived entrant in the scientific workbook field is called jupyter.

Interactive “virtual notebook” computing for various languages; python/julia/R/whatever plugs into the open “kernel” interface. Jupyter allows easy(ish) online-publication-friendly worksheets, which are both interactive and easy to export for static online use. This is handy. So handy that it’s sometimes worth the many rough spots.

Jupyter considered more-or-less harmless

I’m an equivocal fan at best of the jupyter notebook interface, which some days seems to counteract every plus with a minus.

It’s friendly to use, but hard to install.

It’s easy to graphically explore your data, but hard to keep that exploration in version control.

It is open source, and written in an easy scripting language, python, so it seems it should be easy to tweak to taste. In practice it’s an ill-explained spaghetti of javascript and various external packages that relate obscurely to one another. The sum total is IMO no easier to tweak than the various other UI development messes.

It makes it easy to explore your code output, but clashes with the fancy debugger that would make it easy to explore your code bugs.

These pain points seem acute for beginners and for experts, but perhaps are not so bad for projects of intermediate complexity, and jupyter seems good at making such projects look smooth, shiny, and inviting. That is, at the crucial moment when you need to make your data science project look sophisticated yet friendly, it helps to lure colleagues into your web(-based IDE). Then it is too late mwhahahahah etc.

Some, such as Guillaume Chevallier and Jeremy Howard, argue that the constraints of jupyter can lead to good architecture. This sounds like an interactive twist on the old test-driven-development rhetoric. I could be persuaded.

Argh! I can’t see part of the cell!

Sometimes you can’t see the whole code cell, which is annoying. This is a known issue. There is no known fix, but the workaround is simple enough:

zooming out to 90% and back in to 100% with Chrome’s Ctrl + - / Ctrl + +.

Terminology

Confusing terminology alert. The notebook is the style of interface. Other applications with a notebook style of interface are Mathematica and MATLAB. Also, one implementation of said notebook interface for jupyter is specifically called the jupyter notebook, launched by the jupyter notebook command. Another common notebook-style interface implementation is called jupyter lab. Additionally, notebooks are stores of python code and output that you keep on your disk, with file extension .ipynb. Which sense is meant you have to work out from context; e.g. the following sentence is not tautological:

Yo dawg, I heard you like notebooks, so I started up your jupyter notebook in jupyter notebook.

Version control for notebooks

Because jupyter notebooks (the file format) are a weird mash of binary multimedia content and program input and output data, all wrapped up in a JSON encoding, things get messy when you try to put them into version control. In particular, your repository gets very large, your git client may or may not show diffs, and merging is likely to break things.

Here are some workarounds.

Strip notebooks

If you are using git as your version control, you can automatically strip images and other big things from your notebooks to keep them tidy. This means you lose the graphs and such that you just generated in your notebook. On the other hand, you already have the code to generate them again right there, so you don’t necessarily need them around anyway. See how fastai does this with automated git hooks. Not very well explained, but works smoothly. Try it out.

After you check a notebook out from git you will notice that there are no output cells any more, but you can recreate the outputs of all your code cells by re-running them if desired. I do this for all my notebooks now, and I created a repository that includes the good bits of their work, from which I clone all my repositories. You could too if you wanted this for your own repository. Or, if you are late to the party and already have a working repo:

git remote add jupyter_trimmer \
  https://github.com/danmackinlay/data_science_paper_base
git fetch jupyter_trimmer
git merge jupyter_trimmer/master \
  --allow-unrelated-histories \
  -m jupyter_trimmer
./tools/run-after-git-clone

See also nbstripout, upon which this hack is AFAICT based, and which includes its own installation script.
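
If you would rather skip the fastai scaffolding, the direct nbstripout route (a sketch, assuming you want the filter wired into the current repository) is roughly:

pip install nbstripout
nbstripout --install    # register a git filter for .ipynb files in this repo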

This doesn’t entirely solve the diffing and merging hurdles, but is usually just enough removal of pointless cruft that merging kind-of works.

Convert to normal text

One way you can make your notebooks manageable is to turn them into text. I haven’t tried this myself but it looks like it could be made to behave well and automatically.

jupytext can do that and more

Wish you could edit [jupyter notebooks] in your favourite IDE? And get clear and meaningful diffs when doing version control? Then… Jupytext may well be the tool you’re looking for!

Jupytext can save Jupyter notebooks as Markdown and R Markdown documents, Julia, Python, R, Bash, Scheme, Clojure, C++ and q/kdb+ scripts.

There are multiple ways to use jupytext:

Directly from Jupyter Notebook or JupyterLab. Jupytext provides a contents manager that allows Jupyter to save your notebook to your favorite format (.py, .R, .jl, .md, .Rmd …) in addition to (or in place of) the traditional .ipynb file. The text representation can be edited in your favorite editor. When you’re done, refresh the notebook in Jupyter: inputs cells are loaded from the text file, while output cells are reloaded from the .ipynb file if present. Refreshing preserves kernel variables, so you can resume your work in the notebook and run the modified cells without having to rerun the notebook in full.

On the command line. jupytext converts Jupyter notebooks to their text representation, and back. The command line tool can act on notebooks in many ways. It can synchronize multiple representations of a notebook, pipe a notebook into a reformatting tool like black, etc… It can also work as a pre-commit hook if you wish to automatically update the text representation when you commit the .ipynb file.
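
For the version-control use case, the command-line form (a sketch, on a hypothetical notebook.ipynb that you want paired with a .py script) looks something like:

jupytext --to py notebook.ipynb                 # write a script representation
jupytext --set-formats ipynb,py notebook.ipynb  # pair the notebook with the script
jupytext --sync notebook.ipynb                  # keep the paired files up to date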

Diffing/merging notebooks natively

Diffing and merging of actual notebooks is painful in jupyter. Workaround: nbdime provides diffing and merging for notebooks. It has git integration:

nbdime config-git --enable --global
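
Outside of git, the standalone tools (here on two hypothetical notebooks, before.ipynb and after.ipynb) look like:

nbdiff before.ipynb after.ipynb       # content-aware diff in the terminal
nbdiff-web before.ipynb after.ipynb   # rich rendered diff in the browser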

Frontends

Jupyter is, as presaged, a whole ecology of different language back-end kernels talking to various front-end executors.

Notebook classic

Configuring

The location of theming, widgets, CSS etc. has moved of late; check your version number. The current location is ~/.jupyter/custom/custom.css, not the former location ~/.ipython/profile_default/static/custom/custom.css.

Julius Schulz’s ultimate setup guide is also the ultimate pro tip compilation.

Auto-closing parentheses

Kill parenthesis molestation (a.k.a. bracket autoclose) with fire, unless you like having to fight with your IDE’s misplaced faith in its ability to read your mind. The setting is tricky to find, because it is not called “put syntax errors in my code without me asking Y/N”, but instead cm_config.autoCloseBrackets, and it is not in the preference menus. According to a support ticket the following should work.

# Run this in Python once, it should take effect permanently

from notebook.services.config import ConfigManager
c = ConfigManager()
c.update('notebook', {"CodeCell": {"cm_config": {"autoCloseBrackets": False}}})

or add the following to your custom.js:

define([
    'base/js/namespace',
], function(Jupyter) {
    Jupyter.CodeCell.options_default.cm_config.autoCloseBrackets = false;
})

or maybe create ~/.jupyter/nbconfig/notebook.json with the content

{
  "CodeCell": {
    "cm_config": {
      "autoCloseBrackets": false
    }
  }
}

That doesn’t work with jupyterlab, which is even more righteously sure that it knows better than you. But perhaps the following does: go to Settings --> Advanced Settings Editor and add the following to the User Overrides section:

{
  "codeCellConfig": {
    "autoClosingBrackets": false
  }
}

Notebook extensions

Jupyter classic is more usable if you install the notebook extensions, which include, e.g., drag-and-drop image support.

$ pip install --upgrade jupyter_contrib_nbextensions
$ jupyter contrib nbextension install --user

For example, if you drag-and-drop an image into a notebook and then run nbconvert to generate an HTML file, the image will remain outside the HTML file. You can embed all images by calling nbconvert with the EmbedPostProcessor.

$ jupyter nbconvert --post=embed.EmbedPostProcessor

Update – broken in Jupyter 5.0

Wait, that was still pretty confusing; I need the notebook configurator whatsit.

$ pip install --upgrade jupyter_nbextensions_configurator
$ jupyter nbextensions_configurator enable --user

Jupyter lab

jupyter lab (sometimes styled jupyterlab) is the current cutting edge, and reputedly is much nicer to develop plugins for than the notebook interface. From the user perspective it’s more or less the same thing, but the annoyances are different. It does not strictly dominate notebook in terms of user experience, however, even if it does in terms of experience for plugin developers.

The UI, though… The mysterious curse of javascript development is that once you have tasted it, you are unable to resist an uncontrollable urge to reimplement something that already worked, but as a crappier javascript version. The jupyter lab creators have succumbed as far as reimplementing copy, paste, search/replace, browser tabs and the command line. The replacement versions run in parallel to the existing versions, with clashing keyboard shortcuts and confusingly similar but distinct functionality.

Because I am used to how all these functions work in the browser, it would have to be an astonishing improvement in each of them to be worth my time learning the new jupyterlab system, which, after all, I am not using for its quirky alternate take on tabs, cut-and-paste etc., but because I want a quick interface for shareable worksheets with embedded code and graphics.

Needless to say, large UX improvements are not delivered; rather, we get some unintuitive trade-offs, like a search function which non-deterministically sometimes does regexp matching but then doesn’t search the whole page. Or something? Some jupyterlab enthusiasts want to re-implement text editors too. Much artisanal handmade crafts!

Whether you like the overwrought jupyter lab UX or not, we should all live with whatever NIH it has, if the developer API is truly cleaner and easier to work with. That would be a solid win in terms of delivering the interactive coding features I would actually regard as improvements. In the meantime, I will withstand fussy search-and-replace and Yo dawg I heard you like notebook tabs so I put notebook tabs in your notebook tab etc.

Personal peeve: as presaged, jupyter lab molests brackets, compulsorily, as a test of your faith, per default.

Lab extensions

Related to, inspired by, and maybe conflicting or intersecting with the nbextensions are the labextensions, which add bits of extra functionality to the lab interface rather than the notebook interface (where the lab interface is built upon the notebook interface and runs notebooks just like it, but has some different moving parts under the hood).

I try to keep the use of these to a minimum, as I have a possibly irrational foreboding that some complicated death spiral of version clashes is beginning between all the different jupyter kernel and lab and notebook installations I have cluttering up my hard disk, and it can’t improve things to put various versions of lab extensions in the mix, can it? And I really don’t want to have to understand how it works to work out whether that is true or not, so please don’t explain it to me.

Anyway there are some very useful ones, especially the table of contents, so let’s live with it by running install and update commands obsessively in every combination of kernel/lab/whatever environment in the hope that something sticks.

Life is easier with jupyterlab-toc, which allows you to navigate your lab notebook by markdown section headings.

jupyter labextension install @jupyterlab/toc

The upgrade command is

jupyter labextension update @jupyterlab/toc

Integrated diagram editor? Someone integrated drawio as jupyterlab-drawio to prove a point about the developer API thing.

jupyter labextension install jupyterlab-drawio

LaTeX editor? As flagged, I think this is a terrible idea. Even worse than the diagram editor. There are better editors than jupyter, better means of scientific communication than latex, and better specific latex tooling, but I will concede there is some kind of situation where this sweet spot of mediocrity might be useful, e.g. a plot point in a highly contrived techno-thriller script written by cloistered nerds. If you find yourself in such dramaturgical straits:

jupyter labextension install @jupyterlab/latex

One nerdy extension is jupyter-matplotlib, aka, confusingly, ipympl, which integrates interactive plotting into the notebook better.

pip install ipympl

# If using JupyterLab: install nodejs first (https://nodejs.org/en/download/), then
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install jupyter-matplotlib

Rich display

Various classes support rich display of python objects, e.g. IPython.display.Image:

from IPython.display import Image
Image(filename='img/test.png')

or you can use markdown for local image display

![title](img/test.png)

If you want to make your own objects display, uh, richly, you can implement the appropriate magical methods:

class Shout(object):
    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        return "<h1>" + self.text + "</h1>"

I leverage this to make a latex renderer called latex_fragment which you should totally check out for rendering inline algorithms, or for emitting SVG equations.

Custom kernels

jupyter looks for kernel specs in a kernel spec directory, whose location depends on your platform; see the manual. Say your kernel is called dan: its spec then lives in a kernel.json file in a folder called dan inside that directory (on Linux, ~/.local/share/jupyter/kernels/dan/kernel.json, as in the example further down).
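
To check where jupyter is actually looking on your machine, and which kernels it already knows about:

jupyter kernelspec list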

How to set up jupyter to use a virtualenv (or other) kernel.

tl;dr Do this from inside the virtualenv to bootstrap it:

pip install ipykernel
python -m ipykernel install --user --name=my-virtualenv-name

Addendum: for Anaconda, you can auto-install all conda envs, which worked for me, unlike the ipykernel method.

conda install nb_conda_kernels

Custom kernel lite: e.g. if you wish to run a kernel with different parameters, for example with a GPU-enabled launcher. See here for an example for GPU-enabled kernels:

For Linux computers with Optimus, you have to make a kernel that will be called with optirun to be able to use GPU acceleration.

For me this was in fact primusrun.

I made a kernel in ~/.local/share/jupyter/kernels/dan/kernel.json and modified it thus:

{
  "display_name": "dan-gpu",
  "language": "python",
  "argv": [
    "/usr/bin/primusrun",
    "/home/me/.virtualenvs/dan/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ]
}

I also wrote a wrapper script called primuslessrun which allows me to use CUDA virtualenvs but not the actual GPU, by setting an additional variable in the script:

CUDA_VISIBLE_DEVICES=
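
The original script isn’t reproduced here, but a minimal sketch of such a wrapper (hide the GPU from CUDA, then run whatever command it was handed) would be something like:

#!/bin/sh
# primuslessrun (sketch): blank out CUDA_VISIBLE_DEVICES so CUDA sees no GPUs,
# then exec the wrapped command unchanged
export CUDA_VISIBLE_DEVICES=
exec "$@"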

There is even a MATLAB bridge

Graphs

Set up inline plots:

%matplotlib inline

inline svg:

%config InlineBackend.figure_format = 'svg'

Graph sizes are controlled by matplotlib. Here’s how to make big graphs:

import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (10.0, 8.0)

Interesting-looking other graphing options:

Jupyter lab includes such nifty features as a diagram editor, which you can install using jupyter labextension install jupyterlab-drawio (as mentioned above).

Exporting notebooks

You can host static versions easily using nbviewer (and github will do this automatically).
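
The zero-effort local export (here for a hypothetical some_notebook.ipynb) is plain nbconvert:

jupyter nbconvert --to html some_notebook.ipynb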

For fancy variations you need to read how the document templates work.

Here is a base latex template for your academic use.

For very special occasions you can write your own or customize an existing exporter. Once again, Julius Schulz has virtuosic tips, e.g. using cell metadata like this:

{
  "caption": "somecaption",
  "label": "fig:somelabel",
  "widefigure": true
}

Presentations using Jupyter

Basic: export to reveal.js

You can use my favourite dorky presentation hack!

The easiest is Classic reveal.js mode. tl;dr:

$ jupyter nbconvert --to slides some_notebook.ipynb  --post serve

You might want to make various improvements, such as tweaking the reveal.js settings in jupyter slideshows

If you aren’t running a coding class, you will want to hide the input cells from your IPython slides by customising the output templates, or you can suppress all code by using output format hide_code_slides.

These kinds of custom tweaks are not too crazy, but you need a copy of the reveal.js source code for some of them. A more comprehensive version, using a custom theme and the reveal.js source, looks like:

jupyter nbconvert Presentation.ipynb  \
    --to slides --reveal-prefix ../../reveal.js \
    --post serve --SlidesExporter.reveal_theme=league

Fancier: integrated slideshows using RISE

Fancier again: interactive slideshows using RISE.

To meet your house style requirements it is usually sufficient to customise some decorations and alter some css.

Major plus: you can execute code while running the slide!

Major minus: there is no facility that I can see to style your cover slides differently, which is incompatible with, e.g., my university’s style guide.

If you don’t wish to display inline input code, you can avoid it with hide_code

Install using pip:

pip install hide_code
jupyter nbextension install --py hide_code
jupyter nbextension enable --py hide_code
jupyter serverextension enable --py hide_code

Install using conda:

conda install -c conda-forge hide_code

Citations and other academic writing in Jupyter

I did this for

  1. my blog – using simple Zotero markdown citation export, which is not great for inline citations but fine for bibliographies, and very easy and robust.

  2. my papers – abandoning jupyter in favour of Pweave+pandoc, which works amazingly for everything if you use pandoc tricks for your citations.

I couldn’t find a unified approach for these two different use cases which didn’t sound like more work than it was worth. At least, many academics seem to have way more tedious and error-prone workflows than this, so I’m just being a fancy pants if I try to tweak it further.

More recently there is jupyterbook, which enables notebook-based blog rendering, including citations. This is built using the ruby site generator jekyll-scholar, so it is heavy in dependencies, but it seems to work.

Chris Sewell has produced a script called ipypublish that eases some of the pain points in producing articles. It’s an impressive piece of work. (See the comments for some additional pro-tips for this.)

My own latex_fragment allows you to insert one-off latex fragments into jupyter and pweave (e.g. algorithmic environments or some weird tikz thing).

Jean-François Bercher’s jupyter_latex_envs reimplements various latex markup as native jupyter, including \cite.

Sylvain Deville recommends treating jupyter as a glorified markdown editor and then using pandoc, which is an OK workflow if you are producing a once-off paper, but not for a repeatedly updated blog.

nbconvert has built-in citation support but only for LaTeX output. Citations look like this in markup:

<cite data-cite="granger2013">(Granger, 2013)</cite>

or even

<strong data-cite="granger2013">(Granger, 2013)</strong>

The template defines the bibliography source and looks like:

((*- extends 'article.tplx' -*))

((* block bibliography *))
((( super () )))
\bibliographystyle{unsrt}
\bibliography{refs}
((* endblock bibliography *))

And building looks like:

jupyter nbconvert --to latex --template=print.tplx mynotebook.ipynb

As above, it helps to know how the document templates work.

Note that even in the best case you don’t have access to natbib-style citation commands, so author-year citation styles will look funky.

Speaking of custom templates, the nbconvert setup is customisable for more than latex.

{% extends 'full.tpl'%}
{% block any_cell %}
    <div style="border:thin solid red">
        {{ super() }}
    </div>
{% endblock any_cell %}

But how about for online? cite2c seems to do this by live-inserting citations from zotero, including author-year stuff. (Requires Jupyter notebook 4.2 or better, which might require a pip install --upgrade notebook.)

Julius Schulz gives a comprehensive config for this and everything else.

This workflow is smooth for directed citing, but note that there is no way to include a bibliography except by citation, so you have to namecheck every article; and the citation keys it uses are zotero citation keys which are nothing like your BibTeX keys so can’t really be manually edited.

If you are customising the output of jupyter’s nbconvert, you should be aware that the {% block output_prompt %} override doesn’t actually do anything in the templates I use (Slides, HTML, LaTeX). Instead you need to use a config option:

$ jupyter nbconvert --to slides some_notebook.ipynb \
   --TemplateExporter.exclude_output_prompt=True \
    --post serve

I had to use the source (https://github.com/jupyter/nbconvert/blob/db3036303237d45db9886c44e31132f90ef8d653/nbconvert/templates/html/basic.tpl) to discover this.

ipyBibtex.ipynb? Looks like this:

%%cite
Lorem ipsum dolor sit amet
__\citep{hansen1982,crealkoopmanlucas2013}__,
consectetuer adipiscing elit,
sed diam nonummy nibh euismod tincidunt
ut laoreet dolore magna aliquam erat volutpat.

So it supports natbib-style author-year citations! But it’s a small, unmaintained package so is risky.

TODO: Work out how Mark Masden got citations working?

Interactive visualisations/simulations etc

Jupyter allows interactions! This is the easiest python UI system I have seen, for all that it is basic.

Official Manual: ipywidgets.

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

See also the announcement: Jupyter supports interactive JS widgets, where they discuss the data binding module in terms of javascript UI thingies.

Pro tip: If you want a list of widgets

from ipywidgets import Widget
Widget.widget_types

External event loops

External event loops are now easy and documented. What they don’t say outright is that if you want to use the tornado event loop, relax because both the jupyter server and the ipython kernel already use the pyzmq event loop which subclasses the tornado one.

If you want to make this work smoothly without messing around with passing ioloops everywhere, you should make zmq install itself as the default loop:

from zmq.eventloop import ioloop
ioloop.install()

Now, your asynchronous python should just work using tornado coroutines.

NB: with the latest releases of asyncio and tornado, and various major version incompatibilities, I’m curious how smoothly this all still works.

Javascript from python with jupyter

As seen in art python.

Here’s how you invoke javascript from jupyter. Here is the jupyter JS source, here is the full jupyter browser JS manual, and here is the Jupyter JS extension guide.

Hosting live jupyter notebooks on the internet

Jupyter can host online notebooks, even multi-user notebook servers - if you are brave enough to let people execute weird code on your machine. I’m not going to go into the security implications here.
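
For the single-user case, something like the following (insecure on an untrusted network, per the caveat above) exposes a notebook server beyond localhost:

jupyter notebook --ip=0.0.0.0 --port=8888 --no-browser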

Commercial notebook hosts

NB: This section is outdated. TBD; I should probably mention the ill-explained Kaggle kernels and google cloud ML execution of same, etc.

At base level, you can run one using a standard cloud option, like buying compute time as a virtual machine or container, and using a jupyter notebook for your choice of data science workflow.

Special mention to two early movers:

Colaboratory is a free Jupyter notebook environment that requires no setup and runs entirely in the cloud.

With Colaboratory you can write and execute code, save and share your analyses, and access powerful computing resources, all for free from your browser.

Here is an intro and here is another.

Miscellaneous tips and gotchas

Debugging

This is all built on ipython, so you invoke the debugger ipython-style, specifically:

from IPython.core.debugger import Tracer; Tracer()()      # < 5.1
from IPython.core.debugger import set_trace; set_trace()  # >= v5.1

IOPub data rate exceeded.

You got this error and you weren’t doing anything that bandwidth-intensive? Say, you were just viewing a big image, not a zillion images? It’s jupyter being conservative in version 5.0:

jupyter notebook --generate-config
atom ~/.jupyter/jupyter_notebook_config.py

Then update c.NotebookApp.iopub_data_rate_limit to be big, e.g. c.NotebookApp.iopub_data_rate_limit = 10000000.

This is fixed after 5.0.

Offline MathJax in jupyter

python -m IPython.external.MathJax /path/to/source/MathJax.zip