
Jupyter

The least excruciating compromise between irreproducible science and spooking your luddite colleagues with something too new-fangled

jupyter notebook in action

The python-derived entrant in the scientific workbook field is called jupyter.

Works with python/julia/r/various. Jupyter allows easy(ish) online-publication-friendly worksheets, which are both interactive and easy to export for static online use. This is handy. So handy that it’s worth the many agonising broken things that you encounter while trying to benefit from it.

To install it, see the jupyter homepage.


Frontends

Jupyter is a whole ecology of language backend kernels talking to various frontend executors.

Notebook classic

First, unless you like having to fight with a computer’s aggressive parenthesis deletion algorithm, kill its parentheses molestation function with fire. The setting is tricky to find, because it is not called “put syntax errors in my code without me asking”, but instead cm_config.autoCloseBrackets. According to a support ticket, this should work:

# Run this in Python once, it should take effect permanently
from notebook.services.config import ConfigManager
c = ConfigManager()
c.update('notebook', {"CodeCell": {"cm_config": {"autoCloseBrackets": False}}})

or add the following to your custom.js:

define([
    'base/js/namespace',
], function(Jupyter) {
    Jupyter.CodeCell.options_default.cm_config.autoCloseBrackets = false;
})
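If you want to see what that setting amounts to on disk, here is a minimal sketch of the same update done by hand. I am assuming that the ConfigManager call above ends up in a JSON file such as ~/.jupyter/nbconfig/notebook.json; check your own install before trusting that path. The helper names merge_settings and disable_auto_close_brackets are my own inventions, not part of Jupyter.

```python
import json
from pathlib import Path


def merge_settings(base: dict, update: dict) -> dict:
    """Recursively merge `update` into `base` in place and return `base`."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge_settings(base[key], value)
        else:
            base[key] = value
    return base


def disable_auto_close_brackets(config_file: Path) -> dict:
    """Set autoCloseBrackets to False in a notebook.json-style file.

    `config_file` is assumed (not verified here) to be the frontend
    config file, e.g. ~/.jupyter/nbconfig/notebook.json.
    """
    # Keep whatever settings are already there.
    settings = json.loads(config_file.read_text()) if config_file.exists() else {}
    merge_settings(
        settings, {"CodeCell": {"cm_config": {"autoCloseBrackets": False}}}
    )
    config_file.parent.mkdir(parents=True, exist_ok=True)
    config_file.write_text(json.dumps(settings, indent=2))
    return settings
```

Reading the file back after running either of the official methods is a quick way to confirm the setting actually stuck.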

Now, onto other stuff.

  • Julius Schulz’s ultimate setup guide is also the ultimate pro tip compilation. One that really helped me was the hack to produce image captions.

  • Jupyter classic is more usable if you install the notebook extensions, which include, e.g. drag-and-drop image support.

    $ pip install --upgrade jupyter_contrib_nbextensions
    $ jupyter contrib nbextension install --user
    

    For example, if you run nbconvert to generate an HTML file, such an image will remain outside of the HTML file. You can embed all images by calling nbconvert with the EmbedPostProcessor.

    $ jupyter nbconvert --post=embed.EmbedPostProcessor
    

    Update - broken in Jupyter 5.0

  • Wait, that was still pretty confusing; I need the notebook configurator whatsit.

    $ pip install --upgrade jupyter_nbextensions_configurator
    $ jupyter nbextensions_configurator enable --user
    
  • The location of theming, widgets CSS etc. has moved of late; check your version number. The current location is ~/.jupyter/custom/custom.css, not the former location ~/.ipython/profile_default/static/custom/custom.css.

Jupyter lab

jupyter lab is the current cutting edge, and reputedly is much nicer to develop plugins for than the notebook interface. It’s more or less the same thing, but with many tweaks from both user and developer perspectives. It does not strictly dominate notebook in terms of user experience, IMO. I’m not a huge fan of how jupyter lab reinvents heaps of wheels. They attempt to reinvent copy, paste, and tabs, all of which my browser does fine, and I have a suspicion they also want to reinvent text editing, which my text editors already do great. You have to live with this because the API is supposed to be cleaner and easier to work with, so that’s probably good long term.

Personal peeve: jupyter lab loves bracket molesting, and has made that particular form of syntax error introduction compulsory as a test of your faith.

Rich display

Various python objects support rich display in the notebook, e.g. IPython.display.Image:

from IPython.display import Image
Image(filename='img/test.png')

or you can use markdown for local image display

![title](img/test.png)

I leverage this to make a latex renderer called latex_fragment which you should totally check out for rendering inline algorithms, or for emitting SVG equations.
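Rich display is not limited to built-in helpers. As far as I understand the display protocol, IPython looks for methods such as _repr_html_ on whatever object a cell returns, so you can teach your own classes to render themselves. A minimal sketch; the ColourSwatch class is a made-up example, not something from any library:

```python
class ColourSwatch:
    """Hypothetical example class: shows a CSS colour as a small box."""

    def __init__(self, colour: str):
        self.colour = colour

    def _repr_html_(self) -> str:
        # IPython calls this method, when present, to get an HTML rendering.
        return (
            f'<div style="width:3em;height:3em;background:{self.colour}"></div>'
            f"<code>{self.colour}</code>"
        )

    def __repr__(self) -> str:
        # Plain-text fallback for terminals and other non-HTML frontends.
        return f"ColourSwatch({self.colour!r})"


swatch = ColourSwatch("#3366cc")
swatch  # as the last expression in a notebook cell, this renders the box
```

Outside a notebook the object simply falls back to its ordinary text repr, so the class stays usable in scripts.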

Custom kernels

jupyter looks for kernel specs in a kernel spec directory, depending on your platform.

Say your kernel is dan:

See the manual.
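For orientation, a kernel spec is just a directory named after the kernel, containing a kernel.json file. Here is a rough sketch of writing one for a kernel called dan. The function name write_kernel_spec and the display name are my own choices, and the argv assumes that ipykernel is installed in the Python interpreter you run this with.

```python
import json
import sys
from pathlib import Path


def write_kernel_spec(kernels_dir: Path, name: str = "dan") -> Path:
    """Create <kernels_dir>/<name>/kernel.json and return its path."""
    spec = {
        # Command used to launch the kernel; Jupyter fills in {connection_file}.
        "argv": [
            sys.executable, "-m", "ipykernel_launcher",
            "-f", "{connection_file}",
        ],
        "display_name": f"Python ({name})",  # label shown in the kernel menu
        "language": "python",
    }
    spec_dir = kernels_dir / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec_file = spec_dir / "kernel.json"
    spec_file.write_text(json.dumps(spec, indent=2))
    return spec_file
```

On Linux the per-user kernel directory is usually ~/.local/share/jupyter/kernels, so write_kernel_spec(Path.home() / ".local/share/jupyter/kernels") would be the typical call; afterwards, jupyter kernelspec list should show the new kernel.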

There is even a MATLAB bridge.

Graphs

Set up inline plots:

%matplotlib inline

inline svg:

%config InlineBackend.figure_format = 'svg'

Graph sizes are controlled by matplotlib. Here’s how to make big graphs:

import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (10.0, 8.0)

Interesting-looking other extensions:

Jupyter lab includes such nifty features as a diagram editor which you can install using jupyter labextension install jupyterlab-drawio

Exporting notebooks

Presentations using Jupyter

Citations and other academic writing in Jupyter

tl;dr I did this for

  • my blog — using simple Zotero markdown citation export, which is not great for inline citations but fine for bibliographies, and very easy and robust.
  • my papers — using the ipypublish option, which works ok, but is annoying for citations
  • my papers — using the Pweave option, which works amazingly for everything if you use pandoc tricks for your citations.

I couldn’t find a unified approach for these two different use cases which didn’t sound like more work than it was worth. At least, many academics seem to have way more tedious and error-prone workflows than this, so I’m just being a fancy pants if I try to tweak it further.

  • Pweave by Matti Pastell is a clone of knitr:

    Pweave is a scientific report generator and a literate programming tool for Python. It can capture the results and plots from data analysis and works well with numpy, scipy and matplotlib.

    Documented by Max Masnick. Whee, executable markdown pages.

  • Chris Sewell has produced a script called ipypublish that eases some of the pain points in producing articles. It’s an impressive piece of work. (See the comments for some additional pro-tips for this.)

  • My own latex_fragment allows you to insert 1-off latex fragments into jupyter and pweave (e.g. algorithmic environments or some weird tikz thing)

  • Jean-François Bercher’s jupyter_latex_envs reimplements various latex markup as native jupyter including \cite.

  • Sylvain Deville recommends treating jupyter as a glorified markdown editor and then using pandoc, which is an OK workflow if you are producing a once-off paper, but not for a repeatedly updated blog.

  • nbconvert has built-in citation support but only for LaTeX output. Citations look like this:

    <cite data-cite="granger2013">(Granger, 2013)</cite>
    

    or even

    <strong data-cite="granger2013">(Granger, 2013)</strong>
    

    The template defines the bibliography source and looks like:

    ((*- extends 'article.tplx' -*))
    
    ((* block bibliography *))
    ((( super () )))
    \bibliographystyle{unsrt}
    \bibliography{refs}
    ((* endblock bibliography *))
    

    And building looks like:

    jupyter nbconvert --to latex --template=print.tplx mynotebook.ipynb
    

    As above, it helps to know how the document templates work.

    Note that even in the best case you don’t have access to natbib-style citation, so auto-year citation styles will look funky.

  • Speaking of custom templates, the nbconvert setup is customisable for more than latex.

    {% extends 'full.tpl'%}
    {% block any_cell %}
        <div style="border:thin solid red">
            {{ super() }}
        </div>
    {% endblock any_cell %}
    
  • But how about for online? cite2c seems to do this by live inserting citations from Zotero, including author-year stuff. (Requires Jupyter notebook 4.2 or better, which might require a pip install --upgrade notebook.)

    Julius Schulz gives a comprehensive config for this and everything else.

    This workflow is smooth for directed citing, but note that there is no way to include a bibliography except by citation, so you have to namecheck every article; and the citation keys it uses are zotero citation keys which are nothing like your bibtex keys so can’t really be manually edited.

  • If you are customising the output of jupyter’s nbconvert, you should be aware that the {% block output_prompt %} override doesn’t actually do anything in the templates I use (slides, HTML, LaTeX). Instead you need to use a config option:

    $ jupyter nbconvert --to slides some_notebook.ipynb \
       --TemplateExporter.exclude_output_prompt=True \
       --post serve
    

    I had to use the source to discover this.

  • ipyBibtex.ipynb? Looks like this:

    %%cite
    Lorem ipsum dolor sit amet
    __\citep{hansen1982,crealkoopmanlucas2013}__,
    consectetuer adipiscing elit,
    sed diam nonummy nibh euismod tincidunt
    ut laoreet dolore magna aliquam erat volutpat.
    

    So it supports natbib-style author-year citations! But it’s a small, unmaintained package so is risky.

  • work out how Mark Masden got citations working?
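To make the data-cite markup from the nbconvert item above concrete, here is a rough approximation of the transformation the LaTeX export performs on it: any tag carrying a data-cite attribute becomes a \cite command. This is my own regex sketch for illustration, not nbconvert’s actual implementation, and the function name cite_tags_to_latex is made up.

```python
import re

# Matches e.g. <cite data-cite="key">...</cite> or <strong data-cite="key">...</strong>.
CITE_TAG = re.compile(
    r'<(?P<tag>\w+)\s+data-cite="(?P<key>[^"]+)"\s*>.*?</(?P=tag)>',
    flags=re.DOTALL,
)


def cite_tags_to_latex(markdown: str) -> str:
    """Replace data-cite tags with \\cite{key}, dropping the display text."""
    return CITE_TAG.sub(lambda m: r"\cite{" + m.group("key") + "}", markdown)


example = 'As shown by <cite data-cite="granger2013">(Granger, 2013)</cite>.'
converted = cite_tags_to_latex(example)
```

Note that the human-readable "(Granger, 2013)" text is thrown away in favour of whatever the bibliography style produces, which is why the author-year text you typed by hand and the LaTeX output can disagree.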

Interactive visualisations/simulations etc

Jupyter allows interactions! This is by far the easiest python UI system I have seen, for all that it is basic.

Official Manual: ipywidgets.

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

See also the announcement: Jupyter supports interactive JS widgets, where they discuss the data binding module in terms of javascript UI thingies.

Pro tip: If you want a list of widgets

from ipywidgets import widget
widget.Widget.widget_types

External event loops

External event loops are now easy and documented. What they don’t say outright is that if you want to use the tornado event loop, relax because both the jupyter server and the ipython kernel already use the pyzmq event loop which subclasses the tornado one.

If you want to make this work smoothly without messing around with passing ioloops everywhere, you should make zmq install itself as the default loop:

from zmq.eventloop import ioloop
ioloop.install()

Now, your asynchronous python should just work using tornado coroutines.

NB with the release of the latest asyncio and tornado and various major version incompatibilities, I’m curious how smoothly this all still works.
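For comparison, here is a minimal sketch of the same idea written against plain asyncio instead of tornado coroutines. My understanding is that tornado 5 and later run on top of the asyncio event loop, so in a reasonably recent stack this style should coexist with the kernel’s loop, but treat that as an assumption to test rather than a promise. The names tick and main are illustrative only.

```python
import asyncio


async def tick(name: str, delay: float, count: int) -> list:
    """Collect `count` labelled ticks, yielding to the loop between each."""
    ticks = []
    for i in range(count):
        await asyncio.sleep(delay)  # other coroutines run while we wait
        ticks.append(f"{name}-{i}")
    return ticks


async def main() -> list:
    # Run two tickers concurrently on the same event loop.
    return await asyncio.gather(tick("fast", 0.01, 3), tick("slow", 0.02, 2))


# In a plain script:  results = asyncio.run(main())
# In a recent notebook (assumption: IPython 7+), a cell can instead use:
#     results = await main()
```

The distinction in the last comments matters: asyncio.run refuses to start inside a loop that is already running, which is exactly the situation inside a notebook kernel.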

Hosting live jupyter notebooks on the internet

Jupyter can host online notebooks, even multi-user notebook servers - if you are brave enough to let people execute weird code on your machine.

Commercial notebook hosts

Note

This section is outdated. TBD; I should probably mention the ill-explained Kaggle kernels and google cloud ML execution of same, etc.

  • Here’s an example of how you would get live (dynamic) ones running on Amazon for free or cheap
  • sagemath runs notebooks online, with fancy features starting at $7/month. Messy design but tidy open-source ideals.
  • Anaconda.org appears to be a python package development service, but they also have a sideline in hosting notebooks ($7/month). Requires you to use their anaconda python distribution tools to work, which is… a plus and a minus. The anaconda python distro is simple for scientific computing, but if your hard disk is as full of python distros as mine is, you tend not to want yet another one confusing things and wasting disk space.

Miscellaneous tips and gotchas

IOPub data rate exceeded.

You got this error and you weren’t doing anything that bandwidth intensive? Say, you were just viewing a big image, not a zillion images? It’s jupyter being conservative in version 5.0.

jupyter notebook --generate-config
atom ~/.jupyter/jupyter_notebook_config.py

Then update c.NotebookApp.iopub_data_rate_limit in that file to be big, e.g. c.NotebookApp.iopub_data_rate_limit = 10000000.

This is fixed after 5.0.

Diffing/merging

jupyter diffing and merging is painful. Workaround: nbdime provides diffing and merging for notebooks. It has git integration:

nbdime config-git --enable --global
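Much of the pain comes from the fact that a notebook is a JSON file with outputs and execution counters mixed in with the code. A different, cruder workaround than nbdime is to strip those before committing, so that plain text diffs only show source changes. A minimal sketch, assuming the version 4 notebook format; strip_outputs is my own helper, not part of any Jupyter tool:

```python
import copy


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a v4 notebook dict with outputs and counts removed."""
    cleaned = copy.deepcopy(notebook)  # leave the caller's dict untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny hand-written notebook, just enough structure for the demo.
demo = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["1 + 1"],
            "execution_count": 7,
            "outputs": [{"output_type": "execute_result", "data": {"text/plain": ["2"]},
                         "metadata": {}, "execution_count": 7}],
        },
    ],
}
clean_demo = strip_outputs(demo)
```

In practice you would json.load the .ipynb file, pass the dict through strip_outputs, and json.dump it back. You lose the saved outputs, of course, which is the trade-off nbdime avoids.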

Offline mathjax in jupyter

python -m IPython.external.mathjax /path/to/source/mathjax.zip