
Jupyter

The least excruciating compromise between 1) irreproducible science, and 2) spooking your colleagues with something too scarily new-fangled


The python-derived entrant in the scientific workbook field is called jupyter.

Works with python/julia/R/various. Jupyter allows easy(ish) online-publication-friendly worksheets, which are both interactive and easy to export for static online use. This is handy. So handy that it's sometimes worth the many agonising broken things that you encounter while trying to benefit from this.

To install it, see the jupyter homepage.

Jupyter considered harmless

I'm not a massive fan of the jupyter notebook/lab interface, which I consider to be a kind of exploration of the pathology of browser-based clients. It has some nifty elegant stuff, which is obscured behind a lot of bolted-on cruft, and the result is not a harmonious scientific computation workflow. But it is easy to get started with if you want to graphically explore what you just did. This at least is a win.

Some, such as Guillaume Chevallier and Jeremy Howard, argue that the constraints of jupyter can lead to good architecture. This sounds like an interactive twist on the old test-driven-development rhetoric.

Version control

stripping extra crap

You can strip images and other big things from your notebook to keep version control tidy. See how fastai does this with automated github hooks.
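
To make the idea concrete, here is a minimal sketch of output stripping using only the standard library. It is not how fastai's hooks do it; it just assumes the version 4 notebook format, where each code cell carries an `outputs` list and an `execution_count`. The function name `strip_outputs` and the in-memory example notebook are made up for illustration.

```python
import copy
import json


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of a v4 notebook dict with code-cell outputs removed."""
    stripped = copy.deepcopy(notebook)
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop images, tables, tracebacks
            cell["execution_count"] = None  # drop the In[n] counter too
    return stripped


# A tiny in-memory notebook standing in for a real .ipynb file.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code",
            "metadata": {},
            "source": "1 + 1",
            "execution_count": 3,
            "outputs": [{"output_type": "execute_result",
                         "data": {"text/plain": "2"},
                         "execution_count": 3, "metadata": {}}],
        },
    ],
}

clean = strip_outputs(example)
print(json.dumps(clean["cells"][1], indent=1))
```

For a real file you would `json.load` the `.ipynb`, pass it through the function, and `json.dump` it back before committing.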

Diffing/merging

Diffing and merging is painful in jupyter. Workaround: nbdime provides diffing and merging for notebooks. It has git integration:

nbdime config-git --enable --global
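
If you just want to see why a plain text diff of notebooks is so noisy, and what a content-aware diff buys you, here is a rough sketch that compares only the code-cell sources and ignores outputs. This is not nbdime's algorithm, just standard-library `difflib` over a notebook dict; `code_sources` and `diff_code` are hypothetical helper names.

```python
import difflib


def code_sources(notebook: dict) -> list:
    """Collect the source lines of every code cell, ignoring outputs."""
    lines = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        lines.append(f"# --- cell {index} ---")
        lines.extend(source.splitlines())
    return lines


def diff_code(old: dict, new: dict) -> list:
    """Unified diff of code-cell sources between two notebook dicts."""
    return list(difflib.unified_diff(
        code_sources(old), code_sources(new),
        fromfile="old", tofile="new", lineterm=""))


old_nb = {"cells": [{"cell_type": "code", "source": "x = 1\nprint(x)",
                     "outputs": [{"output_type": "stream", "text": "1\n"}]}]}
new_nb = {"cells": [{"cell_type": "code", "source": ["x = 2\n", "print(x)"],
                     "outputs": [{"output_type": "stream", "text": "2\n"}]}]}

for line in diff_code(old_nb, new_nb):
    print(line)
```

The changed output of the cell never shows up in the diff, only the changed line of code, which is the part you usually care about in review.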

Frontends

Jupyter is a whole ecology of different language back end kernels talking to various front end executors.

Notebook classic

Configuring

The location of theming, widgets, CSS etc has moved of late; check your version number. The current location is ~/.jupyter/custom/custom.css, not the former location ~/.ipython/profile_default/static/custom/custom.css

Julius Schulz's ultimate setup guide is also the ultimate pro tip compilation.

Auto-closing parentheses gives you cancer

First kill its parenthesis molestation function with fire. Unless you like having to fight with your IDE's asinine faith in its ability to read your mind. The setting is tricky to find, because it is not called “put syntax errors in my code without me asking Y/N”, but instead cm_config.autoCloseBrackets. According to a support ticket this should work:

# Run this in Python once, it should take effect permanently
from notebook.services.config import ConfigManager
c = ConfigManager()
c.update('notebook', {"CodeCell": {"cm_config": {"autoCloseBrackets": False}}})

or add the following to your custom.js:

define([
    'base/js/namespace',
], function(Jupyter) {
    Jupyter.CodeCell.options_default.cm_config.autoCloseBrackets = false;
})

(That doesn't work with jupyterlab, which would instead like you to go fuck yourself. Just wait for the syntax errors.)

Notebook extensions

Jupyter classic is more usable if you install the notebook extensions, which include, e.g. drag-and-drop image support.

$ pip install --upgrade jupyter_contrib_nbextensions
$ jupyter contrib nbextension install --user

For example, if you run nbconvert to generate an HTML file, a drag-and-dropped image will remain outside of the HTML file. You can embed all images by calling nbconvert with the EmbedPostProcessor.

$ jupyter nbconvert --post=embed.EmbedPostProcessor

Update – broken in Jupyter 5.0

Wait, that was still pretty confusing; I need the notebook configurator whatsit.

$ pip install --upgrade jupyter_nbextensions_configurator
$ jupyter nbextensions_configurator enable --user

Jupyter lab

jupyter lab is the current cutting edge, and reputedly is much nicer to develop plugins for than the notebook interface. From the user perspective it's more or less the same thing, but the annoyances are different. It does not strictly dominate notebook in terms of user experience, however, even if it does in terms of experience for plugin developers.

The UI, though… The mysterious curse of javascript development is that once you have tasted it, you are unable to resist an uncontrollable urge to reimplement something that already worked as a crappier javascript version. These folks reimplement copy, paste, search/replace, browser tabs and the command line. The replacement versions run in parallel to the existing versions, with clashing keyboard shortcuts and confusingly similar functions.

Because I am used to how all these functions work in the browser, it would have to be an astonishingly large improvement in each of them to be worth my time learning the new jupyterlab system, which after all, I am not using for its quirky alternate take on tabs, cut-and-paste etc, but because I want a quick interface to run some shareable code with embedded code and graphics.

Needless to say, large UX improvements are not delivered, but rather, we get some unintuitive trade-offs like a search function which non-deterministically sometimes does regexp matching but then doesn't search the whole page. Or something? Some jupyterlab enthusiasts want to re-implement text editors too. Much artisanal hand made crafts!

Whether you like the overwrought jupyter lab UX or not, we should all live with whatever NIH it has, if the developer API is truly cleaner and easier to work with. That would be a solid win in terms of delivering the interactive coding features I would actually regard as improvements. In the meantime, I will withstand fussy search-and-replace and tabs-within-tabs etc.

Personal peeve: As presaged, jupyter lab molests brackets, and has made that particular form of syntax error introduction compulsory as a test of your commitment.

Life is easier with jupyterlab-toc, which allows you to navigate your lab notebook by markdown section headings.

jupyter labextension install @jupyterlab/toc

Integrated diagram editor? Someone integrated drawio as jupyterlab-drawio to prove a point about the developer API thing.

jupyter labextension install jupyterlab-drawio

latex editor? As flagged, I think this is a terrible idea. There are better editors than jupyter, better means of scientific communication than latex, and better specific latex tooling, but I will concede there is some kind of situation where this sweet spot of mediocrity might be useful, if only as a plot point in the kind of highly contrived techno-thriller script written by cloistered nerds.

jupyter labextension install @jupyterlab/latex

Rich display

Various objects support rich display of python objects e.g. IPython.display.Image

from IPython.display import Image
Image(filename='img/test.png')

or you can use markdown for local image display

![title](img/test.png)

If you want to make your own objects display, uh, richly, you can implement the appropriate magical methods:

class Shout(object):
    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        return "<h1>" + self.text + "</h1>"
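
One thing the class above glosses over is that the text goes into the page as raw HTML. Here is a slightly more defensive sketch, assuming you want the text shown literally; `SafeShout` is a made-up name, and the `__repr__` fallback is my addition for front ends that only show plain text.

```python
import html


class SafeShout:
    """Like Shout, but escapes the text so markup in it is shown literally."""

    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        # Rich front ends (notebook, lab) call this and render the result.
        return "<h1>" + html.escape(self.text) + "</h1>"

    def __repr__(self):
        # Plain-text fallback for terminals and logs.
        return f"SafeShout({self.text!r})"


shout = SafeShout("a < b & c")
print(shout._repr_html_())
```

Leaving an instance as the last expression in a cell should render the heading; in a plain terminal you get the `__repr__` instead.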

I leverage this to make a latex renderer called latex_fragment which you should totally check out for rendering inline algorithms, or for emitting SVG equations.

Custom kernels

jupyter looks for kernel specs in a kernel spec directory, depending on your platform.

Say your kernel is dan:

See the manual.

How to set up jupyter to use a virtualenv (or other) kernel.

tl;dr Do this from inside the virtualenv to bootstrap it:

pip install ipykernel
python -m ipykernel install --user --name=my-virtualenv-name

Addendum: for Anaconda, you can auto-install all conda envs, which worked for me, unlike the ipykernel method.

conda install nb_conda_kernels

Custom kernel lite – e.g. if you wish to run a kernel with different parameters, for example with a GPU-enabled launcher. See here for an example for GPU-enabled kernels:

For computers on linux with optimus, you have to make a kernel that will be called with optirun to be able to use GPU acceleration.

For me this was in fact primusrun.

I made a kernel in ~/.local/share/jupyter/kernels/dan/kernel.json and modified it thus:

{
"display_name": "dan-gpu",
"language": "python",
"argv": [
    "/usr/bin/primusrun",
    "/home/dan/.virtualenvs/dan/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
]
}
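
If you have several of these to maintain, generating them is less error-prone than hand-editing JSON. A minimal sketch, assuming the same layout as the file above; `make_kernel_spec` and `write_kernel_spec` are hypothetical helpers, not part of jupyter, and the paths are the ones from my machine.

```python
import json
from pathlib import Path


def make_kernel_spec(display_name, python_path, launcher=None):
    """Build a kernel.json dict, optionally prefixing a launcher command."""
    argv = [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"]
    if launcher:
        argv = [launcher] + argv  # e.g. primusrun or optirun goes first
    return {"display_name": display_name, "language": "python", "argv": argv}


def write_kernel_spec(spec, kernel_dir):
    """Write spec to <kernel_dir>/kernel.json, creating the directory."""
    kernel_dir = Path(kernel_dir)
    kernel_dir.mkdir(parents=True, exist_ok=True)
    target = kernel_dir / "kernel.json"
    target.write_text(json.dumps(spec, indent=1))
    return target


spec = make_kernel_spec(
    "dan-gpu",
    "/home/dan/.virtualenvs/dan/bin/python",
    launcher="/usr/bin/primusrun",
)
print(json.dumps(spec, indent=1))
```

Pointing `write_kernel_spec` at a directory under your kernel spec directory (for me, ~/.local/share/jupyter/kernels/dan) would then produce the file shown above.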

I also wrote a wrapper script called primuslessrun which allows me to use CUDA virtualenvs but not the actual GPU, by setting an additional variable in the script:

CUDA_VISIBLE_DEVICES=
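
For the curious, the effect of that variable can be sketched in Python rather than shell. This is not my actual wrapper script, just an illustration of the mechanism: launch a child process with CUDA_VISIBLE_DEVICES set to the empty string. The child here is a stand-in that merely reports what it sees; `run_without_gpu` is a made-up name.

```python
import os
import subprocess
import sys


def run_without_gpu(command):
    """Run command with CUDA_VISIBLE_DEVICES emptied, return its stdout."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""  # CUDA then sees no devices
    result = subprocess.run(command, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Stand-in for the kernel: a child Python that reports what it sees.
child = [sys.executable, "-c",
         "import os; print(repr(os.environ.get('CUDA_VISIBLE_DEVICES')))"]
print(run_without_gpu(child))
```

Kernel specs also accept an optional "env" dictionary in kernel.json, so the same variable could probably be set there instead of in a wrapper.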

There is even a MATLAB bridge

Graphs

Set up inline plots:

%matplotlib inline

inline svg:

%config InlineBackend.figure_format = 'svg'

Graph sizes are controlled by matplotlib. Here's how to make big graphs:

import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (10.0, 8.0)

Interesting-looking other extensions:

Jupyter lab includes such nifty features as a diagram editor which you can install using jupyter labextension install jupyterlab-drawio

Exporting notebooks

You can host static versions easily using nbviewer (and github will do this automatically.)

For fancy variations you need to read how the document templates work.

Here is a base latex template for your academic use.

For very special occasions you can write your own or customize an existing exporter. Once again, Julius Schulz has virtuosic tips, e.g. using cell metadata like this:

{
"caption": "somecaption",
"label": "fig:somelabel",
"widefigure": true
}
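
To show what an exporter might do with such metadata, here is a rough sketch that turns it into a LaTeX figure environment. This is not Julius Schulz's exporter or template; `figure_latex` is a hypothetical function, and mapping `widefigure` to `figure*` is my assumption about what that key is meant to do.

```python
def figure_latex(cell: dict, image_path: str) -> str:
    """Wrap an image in a LaTeX figure using the cell's metadata."""
    meta = cell.get("metadata", {})
    # figure* spans both columns in two-column LaTeX classes.
    env = "figure*" if meta.get("widefigure") else "figure"
    lines = [f"\\begin{{{env}}}",
             f"\\includegraphics{{{image_path}}}"]
    if "caption" in meta:
        lines.append(f"\\caption{{{meta['caption']}}}")
    if "label" in meta:
        lines.append(f"\\label{{{meta['label']}}}")
    lines.append(f"\\end{{{env}}}")
    return "\n".join(lines)


cell = {"cell_type": "code",
        "metadata": {"caption": "somecaption",
                     "label": "fig:somelabel",
                     "widefigure": True}}
print(figure_latex(cell, "img/test.png"))
```

In a real setup this logic would live in the nbconvert template or a preprocessor rather than in a standalone function.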

Presentations using Jupyter

The easiest is Classic reveal.js mode. tl;dr:

$ jupyter nbconvert --to slides some_notebook.ipynb  --post serve

You might want to make various improvements, such as tweaking the reveal.js settings in jupyter slideshows. Fancier again: interactive slideshows using RISE. If you aren't running a coding class, you will want to hide the input cells from your IPython slides by customising the output templates.

Citations and other academic writing in Jupyter

tl;dr I did this for

  1. my blog – using simple Zotero markdown citation export, which is not great for inline citations but fine for bibliographies, and very easy and robust.

  2. my papers – abandoning jupyter in favour of Pweave+pandoc, which works amazingly for everything if you use pandoc tricks for your citations.

I couldn't find a unified approach for these two different use cases which didn't sound like more work than it was worth. At least, many academics seem to have way more tedious and error-prone workflows than this, so I'm just being a fancy pants if I try to tweak it further.

Pweave by Matti Pastell is a clone of knitr:

Pweave is a scientific report generator and a literate programming tool for Python. It can capture the results and plots from data analysis and works well with numpy, scipy and matplotlib.

Documented by Max Masnick. Whee, executable markdown pages.

Chris Sewell has produced a script called ipypublish that eases some of the pain points in producing articles. It's an impressive piece of work. (See the comments for some additional pro-tips for this.)

My own latex_fragment allows you to insert 1-off latex fragments into jupyter and pweave (e.g. algorithmic environments or some weird tikz thing.)

Jean-François Bercher's jupyter_latex_envs reimplements various latex markup as native jupyter including \cite.

Sylvain Deville recommends treating jupyter as a glorified markdown editor and then using pandoc, which is an OK workflow if you are producing a once-off paper, but not for a repeatedly updated blog.

nbconvert has built-in citation support but only for LaTeX output. Citations look like this:

<cite data-cite="granger2013">(Granger, 2013)</cite>

or even

<strong data-cite="granger2013">(Granger, 2013)</strong>
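
Since those keys have to match entries in your bibliography file, it can be handy to list which ones a notebook actually uses. A minimal sketch with the standard library, assuming citations are written with the data-cite attribute exactly as above; `cited_keys` is a made-up helper, and a regex is a crude stand-in for a real HTML parser.

```python
import re

# Matches data-cite="key" inside any tag, as in the <cite> examples above.
CITE_PATTERN = re.compile(r'data-cite="([^"]+)"')


def cited_keys(notebook: dict) -> list:
    """Return citation keys from markdown cells, in order, without repeats."""
    keys = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", "")
        # Source may be stored as a string or as a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        for key in CITE_PATTERN.findall(source):
            if key not in keys:
                keys.append(key)
    return keys


nb = {"cells": [
    {"cell_type": "markdown",
     "source": 'See <cite data-cite="granger2013">(Granger, 2013)</cite>.'},
    {"cell_type": "markdown",
     "source": 'Again <strong data-cite="granger2013">(Granger, 2013)</strong>.'},
]}
print(cited_keys(nb))
```

You could then compare the result against the keys in your .bib file before running the LaTeX build.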

The template defines the bibliography source and looks like:

((*- extends 'article.tplx' -*))

((* block bibliography *))
((( super () )))
\bibliographystyle{unsrt}
\bibliography{refs}
((* endblock bibliography *))

And building looks like:

jupyter nbconvert --to latex --template=print.tplx mynotebook.ipynb

As above, it helps to know how the document templates work.

Note that even in the best case you don't have access to natbib-style citation, so auto-year citation styles will look funky.

{% extends 'full.tpl'%}
{% block any_cell %}
    <div style="border:thin solid red">
        {{ super() }}
    </div>
{% endblock any_cell %}

Julius Schulz gives a comprehensive config for this and everything else.

This workflow is smooth for directed citing, but note that there is no way to include a bibliography except by citation, so you have to namecheck every article; and the citation keys it uses are zotero citation keys which are nothing like your bibtex keys so can't really be manually edited.

If you are customising the output of jupyter's nbconvert, you should be aware that the {% block output_prompt %} override doesn't actually do anything in the templates I use (Slides, HTML, LaTeX). Instead you need to use a config option:

$ jupyter nbconvert --to slides some_notebook.ipynb \
   --TemplateExporter.exclude_output_prompt=True \
    --post serve

I had to [use the source](https://github.com/jupyter/nbconvert/blob/db3036303237d45db9886c44e31132f90ef8d653/nbconvert/templates/html/basic.tpl) to discover this.

ipyBibtex.ipynb? Looks like this:

%%cite
Lorem ipsum dolor sit amet
__\citep{hansen1982,crealkoopmanlucas2013}__,
consectetuer adipiscing elit,
sed diam nonummy nibh euismod tincidunt
ut laoreet dolore magna aliquam erat volutpat.

So it supports natbib-style author-year citations! But it's a small, unmaintained package so is risky.

TODO: Work out how Mark Masden got citations working?

Interactive visualisations/simulations etc

Jupyter allows interactions! This is the easiest python UI system I have seen, for all that it is basic.

Official Manual: ipywidgets.

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension

See also the announcement: Jupyter supports interactive JS widgets, where they discuss the data binding module in terms of javascript UI thingies.

Pro tip: If you want a list of widgets

from ipywidgets import widget
widget.Widget.widget_types

External event loops

External event loops are now easy and documented. What they don't say outright is that if you want to use the tornado event loop, relax because both the jupyter server and the ipython kernel already use the pyzmq event loop which subclasses the tornado one.

If you want to make this work smoothly without messing around with passing ioloops everywhere, you should make zmq install itself as the default loop:

from zmq.eventloop import ioloop
ioloop.install()

Now, your asynchronous python should just work using tornado coroutines.

NB with the release of latest asyncio and tornado and various major version incompatibilities, I'm curious how smoothly this all still works.
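
For comparison, here is what the same kind of thing looks like with plain asyncio, which is the loop that tornado 5 and later run on under Python 3. This is a sketch with made-up coroutine names, not a tested replacement for the zmq recipe above.

```python
import asyncio


async def slow_square(x, delay=0.01):
    """Pretend to wait on I/O, then return x squared."""
    await asyncio.sleep(delay)
    return x * x


async def main():
    # Run three coroutines concurrently on the one event loop.
    return await asyncio.gather(*(slow_square(n) for n in range(3)))


if __name__ == "__main__":
    # In a script. In a notebook cell you would write `await main()` instead.
    print(asyncio.run(main()))
```

My understanding is that inside a recent notebook the kernel already has a loop running, so `asyncio.run` raises a RuntimeError there and a top-level `await main()` is the way to go; I have not checked this across versions.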

Javascript from python with jupyter

As seen in art python.

Here's how you invoke javascript from jupyter. Here is the jupyter JS source. And here is the full jupyter browser JS manual, and the Jupyter JS extension guide.

Hosting live jupyter notebooks on the internet

Jupyter can host online notebooks, even multi-user notebook servers - if you are brave enough to let people execute weird code on your machine.

Commercial notebook hosts

NB: This section is outdated. TBD; I should probably mention the ill-explained Kaggle kernels and google cloud ML execution of same, etc.

Miscellaneous tips and gotchas

Debugging

This is all built on ipython, so you invoke the debugger ipython-style, specifically:

from IPython.core.debugger import Tracer; Tracer()()      # < 5.1
from IPython.core.debugger import set_trace; set_trace()  # >= v5.1

IOPub data rate exceeded.

You got this error and you weren't doing anything that bandwidth intensive? Say, you were just viewing a big image, not a zillion images? It's jupyter being conservative in version 5.0. Generate a config file and open it in your editor:

jupyter notebook --generate-config
atom ~/.jupyter/jupyter_notebook_config.py

Then update c.NotebookApp.iopub_data_rate_limit to be big, e.g. c.NotebookApp.iopub_data_rate_limit = 10000000.

This is fixed after 5.0.

Offline mathjax in jupyter

python -m IPython.external.mathjax /path/to/source/mathjax.zip