
Jupyter

The least excruciating compromise between irreproducible science and spooking your old-school colleagues


The python-derived entrant in the scientific workbook field is called jupyter.

It works with Python, Julia, R and various other languages. Jupyter allows easy(ish), online-publication-friendly worksheets, which are both interactive and easy to export for static online use. This is handy.

To install it, see the jupyter homepage.


Frontends

The default jupyter ships as a browser-based coding environment. You can also access it using

Custom kernels

Graphs

Set up inline plots:

%matplotlib inline

inline svg:

%config InlineBackend.figure_format = 'svg'

Graph sizes are controlled by matplotlib. Here’s how to make big graphs:

import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (10.0, 8.0)

Interesting-looking other extensions:

Interactive visualisations/simulations etc

Jupyter allows interactions! This is by far the easiest python UI system I have seen, for all that it is basic.

Official Manual: ipywidgets.

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

See also the announcement: Jupyter supports interactive JS widgets, where they discuss the data binding module in terms of javascript UI thingies.

Pro tip: if you want a list of the available widgets:

from ipywidgets import widget
widget.Widget.widget_types
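To give a flavour of how little code an interactive control needs, here is a minimal sketch using `ipywidgets.interact`. The names `damped_wave` and `show_controls` are invented for illustration, and the sketch assumes ipywidgets is installed and the cell runs in a live notebook with the extension enabled as above.

```python
import math


def damped_wave(frequency=1.0, decay=0.5, n_points=5):
    """Return samples of exp(-decay * t) * cos(2 * pi * frequency * t) on [0, 1]."""
    ts = [i / (n_points - 1) for i in range(n_points)]
    return [math.exp(-decay * t) * math.cos(2 * math.pi * frequency * t) for t in ts]


def show_controls():
    """Build sliders for damped_wave; call this from a notebook cell."""
    # Imported here so the numeric part above works without ipywidgets.
    from ipywidgets import interact

    # interact turns a (min, max, step) tuple into a slider for that argument
    # and re-runs the function whenever a slider moves.
    return interact(
        damped_wave,
        frequency=(0.5, 5.0, 0.5),
        decay=(0.0, 2.0, 0.1),
        n_points=(2, 50, 1),
    )
```

Calling `show_controls()` in a cell should display three sliders and the function's return value beneath them. In real use you would probably plot the samples inside the function rather than return a list.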

External event loops

External event loops are now easy and documented. What they don’t say outright is that if you want to use the tornado event loop, you can relax: both the jupyter server and the ipython kernel already use the pyzmq event loop, which subclasses the tornado one.

If you want to make this work smoothly without messing around with passing ioloops everywhere, you should make zmq install itself as the default loop:

from zmq.eventloop import ioloop
ioloop.install()

Now, your asynchronous python should just work using tornado coroutines.

Exporting notebooks

Presentations using Jupyter

Citations and other academic writing in Jupyter

tl;dr I did this for

  • my blog — using simple Zotero markdown citation export, which is not great for inline citations but fine for bibliographies, and very easy and robust.
  • my papers — using the nbconvert option, which works fine.

I couldn’t find a unified approach for these two which didn’t sound like more work than it was worth. At least, many academics seem to have way more tedious and error-prone workflows than this, so I’m just being a fancy pants if I try to tweak it yet further.

  • Chris Sewell has produced a script called ipypublish that eases some of the pain points in producing articles. It’s an impressive piece of work. (See the comments for some additional pro-tips for this.)

    Quibble: Some of his neat citation workflow (although not the whole package) depends on zotero version 4.0, which is about to be replaced by 5.0. So the golden age this ushers in may be short-lived unless someone wants to step in to fix up the betterbibtex plugin.

  • Sylvain Deville recommends treating jupyter as a glorified markdown editor and then using pandoc, which is an OK workflow if you are producing a once-off paper, but not for a repeatedly updated blog.

  • nbconvert has built-in citation support but only for LaTeX output. Citations look like this:

    <cite data-cite="granger2013">(Granger, 2013)</cite>
    

    or even

    <strong data-cite="granger2013">(Granger, 2013)</strong>
    

    The template defines the bibliography source and looks like:

    ((*- extends 'article.tplx' -*))
    
    ((* block bibliography *))
    ((( super() )))
    \bibliographystyle{unsrt}
    \bibliography{refs}
    ((* endblock bibliography *))
    

    And building looks like:

    jupyter nbconvert --to latex --template=print.tplx mynotebook.ipynb
    

    As above, it helps to know how the document templates work.

    Note that even in the best case you don’t have access to natbib-style citation, so auto-year citation styles will look funky.

  • Speaking of custom templates, the nbconvert setup is customisable for more than latex.

    {% extends 'full.tpl'%}
    {% block any_cell %}
        <div style="border:thin solid red">
            {{ super() }}
        </div>
    {% endblock any_cell %}
    
  • But how about for online? cite2c seems to do this by live-inserting citations from zotero, including author-year stuff. (Requires Jupyter notebook 4.2 or better, which might require a pip install --upgrade notebook.)

    Julius Schulz gives a comprehensive config for this and everything else.

    This workflow is smooth for directed citing, but note that there is no way to include a bibliography except by citation, so you have to namecheck every article. Also, the citation keys it uses are zotero citation keys, which are nothing like your bibtex keys, so they can’t really be manually edited.

  • If you are customising the output of jupyter’s nbconvert, you should be aware that the {% block output_prompt %} override doesn’t actually do anything in the templates I use (Slides, HTML, LaTeX). Instead you need to use a config option:

    $ jupyter nbconvert --to slides some_notebook.ipynb \
       --TemplateExporter.exclude_output_prompt=True \
       --post serve
    

    I had to use the source to discover this.

  • ipyBibtex.ipynb? Looks like this:

    %%cite
    Lorem ipsum dolor sit amet
    __\citep{hansen1982,crealkoopmanlucas2013}__,
    consectetuer adipiscing elit,
    sed diam nonummy nibh euismod tincidunt
    ut laoreet dolore magna aliquam erat volutpat.
    

    So it supports natbib-style author-year citations! But it’s a small, unmaintained package, so it is risky.

  • work out how Mark Masden got citations working?
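For the nbconvert `data-cite` markup described in the list above, it can help to see roughly what the LaTeX conversion amounts to. The following is a minimal sketch, not nbconvert's actual implementation: the function names `cite_tags_to_latex` and `cited_keys` are invented here, and the regular expression assumes the tags are written exactly in the two forms shown above.

```python
import re

# Matches <cite data-cite="key">...</cite> and the <strong ...> variant.
CITE_TAG = re.compile(
    r'<(cite|strong)\s+data-cite="([^"]+)"\s*>.*?</\1>',
    flags=re.DOTALL,
)


def cite_tags_to_latex(markdown_text):
    """Replace data-cite HTML tags with LaTeX \\cite{key} commands."""
    # A function replacement avoids backslash-escape surprises in re.sub.
    return CITE_TAG.sub(lambda m: r"\cite{" + m.group(2) + "}", markdown_text)


def cited_keys(markdown_text):
    """List citation keys in order of first appearance, without repeats."""
    seen = []
    for match in CITE_TAG.finditer(markdown_text):
        key = match.group(2)
        if key not in seen:
            seen.append(key)
    return seen
```

Something like `cited_keys` is handy as a sanity check before building: run it over the markdown cells and compare the result against the keys in your .bib file, so a missing reference shows up before LaTeX complains.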

General Pro tips

IOPub data rate exceeded.

You got this error and you weren’t doing anything that bandwidth-intensive? Say, you were just viewing a big image, not a zillion images? It’s jupyter being conservative in version 5.0. Generate a config file and open it in your editor:

jupyter notebook --generate-config
atom ~/.jupyter/jupyter_notebook_config.py

Then update c.NotebookApp.iopub_data_rate_limit to be big, e.g. c.NotebookApp.iopub_data_rate_limit = 10000000.
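If you would rather script that edit than open an editor, here is a minimal sketch. The helper name `raise_iopub_limit` is invented for illustration, and it assumes the config file is the usual Python file in which Jupyter makes the `c` object available.

```python
from pathlib import Path

SETTING = "c.NotebookApp.iopub_data_rate_limit"


def raise_iopub_limit(config_path, limit=10_000_000):
    """Append the rate-limit setting unless an active line already sets it.

    Returns True if the file was changed, False otherwise.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    # Commented-out defaults start with "#", so they do not count as active.
    for line in text.splitlines():
        if line.strip().startswith(SETTING):
            return False
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + f"{SETTING} = {limit}\n")
    return True
```

For the file generated above, the call would be `raise_iopub_limit(Path.home() / ".jupyter" / "jupyter_notebook_config.py")`. Restart the notebook server afterwards so the new limit takes effect.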

Hosting live jupyter notebooks on the internet

Jupyter can host online notebooks, even multi-user notebook servers, if you are brave enough to let people execute weird code on your machine.

Commercial notebook hosts

Note

This section is outdated. TBD; I should probably mention the ill-explained Kaggle kernels and google cloud ML execution of same, etc.

  • Here’s an example of how you would get live (dynamic) ones running on Amazon for free or cheap
  • sagemath runs notebooks online, with fancy features starting at $7/month. Messy design but tidy open-source ideals.
  • Anaconda.org appears to be a python package development service, but they also have a sideline in hosting notebooks ($7/month). It requires you to use their anaconda python distribution tools, which is… a plus and a minus. The anaconda python distro is simple for scientific computing, but if your hard disk is as full of python distros as mine is, you tend not to want yet another one confusing things and wasting disk space.