# Jupyter

### The least excruciating compromise between 1) irreproducible science, and 2) spooking your colleagues with something too scarily new-fangled

The python-derived entrant in the scientific workbook field is called jupyter.

Works with python/julia/R/various. Jupyter allows easy(ish) online-publication-friendly worksheets, which are both interactive and easy to export for static online use. This is handy. So handy that it's sometimes worth the many agonising broken things that you encounter while trying to benefit from this.

To install it, see the jupyter homepage.

## Jupyter considered harmless

I'm not a massive fan of the jupyter notebook/lab interface, which I consider to be a kind of exploration of the pathology of browser-based clients. It has some nifty elegant stuff, which is obscured behind a lot of bolted-on cruft, and the result is not a harmonious scientific computation workflow. But it is easy to get started with if you want to graphically explore what you just did. This at least is a win.

Some, such as Guillaume Chevallier and Jeremy Howard, argue that the constraints of jupyter can lead to good architecture. This sounds like an interactive twist on the old test-driven-development rhetoric.

## Stripping extra crap

You can strip images and other big things from your notebook to keep version control tidy. See how fastai does this with automated github hooks.
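
To give a sense of what such a hook does, here is a minimal sketch that blanks outputs and execution counts using only the standard library. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). The name `strip_outputs` is made up for illustration, and this is not fastai's implementation.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs removed.

    Assumes the nbformat 4 layout: a top-level "cells" list in which
    code cells carry "outputs" and "execution_count" keys.
    """
    # Round-trip through JSON for a cheap deep copy of plain data.
    stripped = json.loads(json.dumps(notebook))
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return stripped


# A tiny hand-made notebook standing in for a real .ipynb file.
example = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["1 + 1"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 3,
                    "data": {"text/plain": ["2"]},
                    "metadata": {},
                }
            ],
        },
    ],
}

clean = strip_outputs(example)
print(json.dumps(clean["cells"][1], indent=1))
```

On a real file you would `json.load` the `.ipynb`, pass it through the function, and `json.dump` it back before committing. Embedded images live inside `outputs`, which is why emptying that list is what shrinks the diff.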

### Diffing/merging

Diffing and merging are painful in jupyter. Workaround: nbdime provides diffing and merging for notebooks. It has git integration:

nbdime config-git --enable --global


## Frontends

Jupyter is a whole ecology of different language back end kernels talking to various front end executors:

• Classic jupyter notebook, a browser-based coding environment. The command jupyter notebook starts this mode.

• jupyterlab, the new thing, extends and redesigns the classic notebook into an IDE with text editors, notebooks, REPL terminals etc. The command jupyter lab starts this mode.

• base ipython shell

• hydrogen, a plugin for the atom text editor, providing a more unified coding experience from a normal code editor. (intro blog post) IMO, this kind of thing is a generally better way of doing it. Jupyter shouldn't have to reinvent text editors. Although they won't be prevented from that, no matter my opinions.

• vscodeJupyter is hydrogen for VS Code

• nteract, a desktop app for running jupyter notebooks as apps, integrating with OS indexing services and looking pretty etc. Not totally sold on this idea because it looks so bloaty, but I could be persuaded.

• pweave, below, also executes jupyter kernels

• Here is some good verbiage. Will Crichton, The Future of Notebooks: Lessons from JupyterCon

At JupyterCon, I learned three things: reactive notebooks are the future, Jupyter is the new Bash, and data science is a gateway drug.

### Notebook classic

#### Configuring

The location of theming, widgets, CSS etc has moved of late; check your version number. The current location is ~/.jupyter/custom/custom.css, not the former location ~/.ipython/profile_default/static/custom/custom.css

Julius Schulz's ultimate setup guide is also the ultimate pro tip compilation.

#### Auto-closing parentheses gives you cancer

First kill its parenthesis molestation function with fire. Unless you like having to fight with your IDE's asinine faith in its ability to read your mind. The setting is tricky to find, because it is not called “put syntax errors in my code without me asking Y/N”, but instead cm_config.autoCloseBrackets. According to a support ticket this should work.

# Run this in Python once, it should take effect permanently
from notebook.services.config import ConfigManager
c = ConfigManager()
c.update('notebook', {"CodeCell": {"cm_config": {"autoCloseBrackets": False}}})


// Alternatively, put this in custom.js (for the classic notebook: ~/.jupyter/custom/custom.js)
define([
    'base/js/namespace',
], function(Jupyter) {
    Jupyter.CodeCell.options_default.cm_config.autoCloseBrackets = false;
});


(That doesn't work with jupyterlab, which would instead like you to go fuck yourself. Just wait for the syntax errors.)

#### Notebook extensions

Jupyter classic is more usable if you install the notebook extensions, which include, e.g., drag-and-drop image support.

$ pip install --upgrade jupyter_contrib_nbextensions
$ jupyter contrib nbextension install --user


For example, if you run nbconvert to generate an HTML file, a dragged-in image will remain outside the HTML file. You can embed all images by calling nbconvert with the EmbedPostProcessor.

$ jupyter nbconvert --post=embed.EmbedPostProcessor


Update – broken in Jupyter 5.0

Wait, that was still pretty confusing; I need the notebook configurator whatsit.

$ pip install --upgrade jupyter_nbextensions_configurator
$ jupyter nbextensions_configurator enable --user


### Jupyter lab

jupyter lab is the current cutting edge, and reputedly is much nicer to develop plugins for than the notebook interface. From the user perspective it's more or less the same thing, but the annoyances are different. It does not strictly dominate notebook in terms of user experience, however, even if it does in terms of experience for plugin developers.

The UI, though… The mysterious curse of javascript development is that once you have tasted it, you are unable to resist an uncontrollable urge to reimplement something that already worked as a crappier javascript version. These folks reimplement copy, paste, search/replace, browser tabs and the command line. The replacement versions run in parallel to the existing versions, with clashing keyboard shortcuts and confusingly similar functions.

Because I am used to how all these functions work in the browser, it would have to be an astonishingly large improvement in each of them to be worth my time learning the new jupyterlab system, which after all, I am not using for its quirky alternate take on tabs, cut-and-paste etc, but because I want a quick interface to run some shareable code with embedded code and graphics.

Needless to say, large UX improvements are not delivered, but rather, we get some unintuitive trade-offs like a search function which non-deterministically sometimes does regexp matching but then doesn't search the whole page. Or something?

Some jupyterlab enthusiasts want to re-implement text editors too. Much artisanal hand made crafts!

Whether you like the overwrought jupyter lab UX or not, we should all live with whatever NIH it has, if the developer API is truly cleaner and easier to work with. That would be a solid win in terms of delivering the interactive coding features I would actually regard as improvements. In the meantime, I will withstand fussy search-and-replace and tabs-within-tabs etc.

Personal peeve: As presaged, jupyter lab molests brackets, and has made that particular form of syntax error introduction compulsory as a test of your commitment.

Life is easier with jupyterlab-toc which allows you to navigate your lab notebook by markdown section headings.

jupyter labextension install @jupyterlab/toc


Integrated diagram editor? Someone integrated drawio as jupyterlab-drawio to prove a point about the developer API thing.

jupyter labextension install jupyterlab-drawio


latex editor? As flagged, I think this is a terrible idea. There are better editors than jupyter, better means of scientific communication than latex, and better specific latex tooling, but I will concede there is some kind of situation where this sweet spot of mediocrity might be useful, if only as a plot point in the kind of highly contrived techno-thriller script written by cloistered nerds.

jupyter labextension install @jupyterlab/latex


## Rich display

Various classes support rich display of python objects, e.g. IPython.display.Image

from IPython.display import Image
Image(filename='img/test.png')


or you can use markdown for local image display

![title](img/test.png)


If you want to make your own objects display, uh, richly, you can implement the appropriate magical methods:

class Shout(object):
    def __init__(self, text):
        self.text = text

    def _repr_html_(self):
        return "<h1>" + self.text + "</h1>"


I leverage this to make a latex renderer called latex_fragment which you should totally check out for rendering inline algorithms, or for emitting SVG equations.

## Custom kernels

jupyter looks for kernel specs in a kernel spec directory, depending on your platform. Say your kernel is dan:

• Unixey: ~/.local/share/jupyter/kernels/dan/kernel.json

• OSX: ~/Library/Jupyter/kernels/dan/kernel.json

• Win: %APPDATA%\jupyter\kernels\dan\kernel.json

See the manual.

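
The three per-user kernel spec locations listed for the `dan` kernel can be sketched as a small lookup. This is only an illustration of the naming pattern, not how jupyter itself resolves paths: the function name `kernel_spec_path` is hypothetical, and the `~` and `%APPDATA%` placeholders are deliberately left unexpanded so the output matches the list.

```python
import ntpath
import posixpath


def kernel_spec_path(name, platform, home="~", appdata="%APPDATA%"):
    """Build the per-user kernel.json path for a kernel called `name`.

    `platform` is one of "linux", "darwin" or "win32" (the values Python
    reports in sys.platform). `home` and `appdata` are left as unexpanded
    placeholders by default.
    """
    if platform == "win32":
        return ntpath.join(appdata, "jupyter", "kernels", name, "kernel.json")
    if platform == "darwin":
        return posixpath.join(home, "Library", "Jupyter", "kernels", name, "kernel.json")
    # Everything else is treated as "Unixey".
    return posixpath.join(home, ".local", "share", "jupyter", "kernels", name, "kernel.json")


for platform in ("linux", "darwin", "win32"):
    print(platform, kernel_spec_path("dan", platform))
```

On a real install, `jupyter kernelspec list` reports which specs were actually found, which is a more trustworthy answer than any hand-built path.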
tl;dr Do this from inside the virtualenv to bootstrap it:

pip install ipykernel
python -m ipykernel install --user --name=my-virtualenv-name


Addendum: for Anaconda, you can auto-install all conda envs, which worked for me, unlike the ipykernel method.

conda install nb_conda_kernels


Custom kernel lite – e.g. if you wish to run a kernel with different parameters, for example with a GPU-enabled launcher. See here for an example for GPU-enabled kernels:

For computers on linux with optimus, you have to make a kernel that will be called with optirun to be able to use GPU acceleration.

For me this was in fact primusrun.

I made a kernel in ~/.local/share/jupyter/kernels/dan/kernel.json and modified it thus:

{
  "display_name": "dan-gpu",
  "language": "python",
  "argv": [
    "/usr/bin/primusrun",
    "/home/dan/.virtualenvs/dan/bin/python",
    "-m",
    "ipykernel_launcher",
    "-f",
    "{connection_file}"
  ]
}


I also wrote a wrapper script called primuslessrun which allows me to use CUDA virtualenvs but not the actual GPU, by setting an additional variable in the script:

CUDA_VISIBLE_DEVICES=


There is even a MATLAB bridge.

## Graphs

Set up inline plots:

%matplotlib inline


inline svg:

%config InlineBackend.figure_format = 'svg'


Graph sizes are controlled by matplotlib. Here's how to make big graphs:

import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (10.0, 8.0)


Interesting-looking other extensions:

Jupyter lab includes such nifty features as a diagram editor which you can install using jupyter labextension install jupyterlab-drawio

## Exporting notebooks

You can host static versions easily using nbviewer (and github will do this automatically.)

For fancy variations you need to read how the document templates work.

Here is a base latex template for your academic use.

For very special occasions you can write your own or customize an existing exporter.

Once again, Julius Schulz has virtuosic tips, e.g. using cell metadata like this:

{
  "caption": "somecaption",
  "label": "fig:somelabel",
  "widefigure": true
}


### Presentations using Jupyter

The easiest is Classic reveal.js mode. tl;dr:

$ jupyter nbconvert --to slides some_notebook.ipynb --post serve


You might want to make various improvements, such as tweaking the reveal.js settings in jupyter slideshows.

Fancier again: interactive slideshows using RISE.

If you aren't running a coding class, you will want to hide the input cells from your IPython slides by customising the output templates.

### Citations and other academic writing in Jupyter

tl;dr I did this for

1. my blog – using simple Zotero markdown citation export, which is not great for inline citations but fine for bibliographies, and very easy and robust.

2. my papers – abandoning jupyter in favour of Pweave+pandoc, which works amazingly for everything if you use pandoc tricks for your citations.

I couldn't find a unified approach for these two different use cases which didn't sound like more work than it was worth. At least, many academics seem to have way more tedious and error-prone workflows than this, so I'm just being a fancy pants if I try to tweak it further.

Pweave by Matti Pastell is a clone of knitr:

Pweave is a scientific report generator and a literate programming tool for Python. It can capture the results and plots from data analysis and works well with numpy, scipy and matplotlib.

Documented by Max Masnick. Whee, executable markdown pages.

Chris Sewell has produced a script called ipypublish that eases some of the pain points in producing articles. It's an impressive piece of work. (See the comments for some additional pro-tips for this.)

My own latex_fragment allows you to insert 1-off latex fragments into jupyter and pweave (e.g. algorithmic environments or some weird tikz thing.)

Jean-François Bercher's jupyter_latex_envs reimplements various latex markup as native jupyter including \cite.

Sylvain Deville recommends treating jupyter as a glorified markdown editor and then using pandoc, which is an OK workflow if you are producing a once-off paper, but not for a repeatedly updated blog.

nbconvert has built-in citation support but only for LaTeX output. Citations look like this:

<cite data-cite="granger2013">(Granger, 2013)</cite>


or even

<strong data-cite="granger2013">(Granger, 2013)</strong>
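
To make the tag convention concrete, here is a rough sketch of rewriting those `data-cite` tags into `\cite{...}` commands with a regular expression. It is an illustration only and not nbconvert's own filter; the names `CITE_TAG` and `cite_tags_to_latex` are invented here, and it assumes one key per tag written exactly as in the two examples.

```python
import re

# Matches <cite data-cite="key">...</cite> and the <strong ...> variant.
# The backreference \1 makes the closing tag match the opening one.
CITE_TAG = re.compile(
    r'<(cite|strong)\s+data-cite="([^"]+)"\s*>.*?</\1>',
    flags=re.DOTALL,
)


def cite_tags_to_latex(text):
    """Replace data-cite tags with \\cite{key}, dropping the display text."""
    return CITE_TAG.sub(lambda m: r"\cite{" + m.group(2) + "}", text)


markdown = 'As shown by <cite data-cite="granger2013">(Granger, 2013)</cite>, it works.'
print(cite_tags_to_latex(markdown))
```

The human-readable "(Granger, 2013)" inside the tag is thrown away on conversion, which is why it only matters for the HTML view of the notebook.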


The template defines the bibliography source and looks like:

((*- extends 'article.tplx' -*))

((* block bibliography *))
((( super () )))
\bibliographystyle{unsrt}
\bibliography{refs}
((* endblock bibliography *))


And building looks like:

jupyter nbconvert --to latex --template=print.tplx mynotebook.ipynb


As above, it helps to know how the document templates work.

Note that even in the best case you don't have access to natbib-style citation, so auto-year citation styles will look funky.

{% extends 'full.tpl'%}
{% block any_cell %}
<div style="border:thin solid red">
{{ super() }}
</div>
{% endblock any_cell %}

• but how about for online? cite2c seems to do this by live inserting citations from zotero, including author-year stuff. (Requires Jupyter notebook 4.2 or better which might require a pip install --upgrade notebook)

This workflow is smooth for directed citing, but note that there is no way to include a bibliography except by citation, so you have to namecheck every article; and the citation keys it uses are zotero citation keys which are nothing like your bibtex keys so can't really be manually edited.

If you are customising the output of jupyter's nbconvert, you should be aware that the {% block output_prompt %} override doesn't actually do anything in the templates I use (Slides, HTML, LaTeX). Instead you need to use a config option:

$ jupyter nbconvert --to slides some_notebook.ipynb \
    --TemplateExporter.exclude_output_prompt=True \
    --post serve


I had to [use the source](https://github.com/jupyter/nbconvert/blob/db3036303237d45db9886c44e31132f90ef8d653/nbconvert/templates/html/basic.tpl) to discover this.

ipyBibtex.ipynb? Looks like this:

%%cite
Lorem ipsum dolor sit amet __\citep{hansen1982,crealkoopmanlucas2013}__, consectetuer adipiscing elit, sed diam nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat volutpat.


So it supports natbib-style author-year citations! But it's a small, unmaintained package so is risky.

TODO: Work out how Mark Masden got citations working?

## Interactive visualisations/simulations etc

Jupyter allows interactions! This is the easiest python UI system I have seen, for all that it is basic.

Official Manual: ipywidgets.

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension


See also the announcement: Jupyter supports interactive JS widgets, where they discuss the data binding module in terms of javascript UI thingies.

Pro tip: If you want a list of widgets:

from ipywidgets import widget
widget.Widget.widget_types


### External event loops

External event loops are now easy and documented. What they don't say outright is that if you want to use the tornado event loop, relax because both the jupyter server and the ipython kernel already use the pyzmq event loop which subclasses the tornado one.

If you want to make this work smoothly without messing around with passing ioloops everywhere, you should make zmq install itself as the default loop:

from zmq.eventloop import ioloop
ioloop.install()


Now, your asynchronous python should just work using tornado coroutines.

NB with the release of latest asyncio and tornado and various major version incompatibilities, I'm curious how smoothly this all still works.

### Javascript from python with jupyter

As seen in art python.

## Hosting live jupyter notebooks on the internet

Jupyter can host online notebooks, even multi-user notebook servers - if you are brave enough to let people execute weird code on your machine.

### Commercial notebook hosts

NB: This section is outdated. TBD; I should probably mention the ill-explained Kaggle kernels and google cloud ML execution of same, etc.

• Here's an example of how you would get live (dynamic) ones running on Amazon for free or cheap

• sagemath runs notebooks online, with fancy features starting at $7/month. Messy design but tidy open-source ideals.

• Anaconda.org appears to be a python package development service, but they also have a sideline in hosting notebooks ($7/month). Requires you to use their anaconda python distribution tools to work, which is… a plus and a minus. The anaconda python distro is simple for scientific computing, but if your hard disk is as full of python distros as mine is, you tend not to want yet another one confusing things and wasting disk space.

## Miscellaneous tips and gotchas

### Debugging

This is all built on ipython so you invoke the debugger ipython-style, specifically:

from IPython.core.debugger import Tracer; Tracer()()      # < 5.1
from IPython.core.debugger import set_trace; set_trace()  # >= v5.1
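
If the same code has to run both inside and outside jupyter, one option is to wrap the import so that it degrades to the standard library debugger. This is a sketch under assumptions: `get_set_trace` and `risky_division` are hypothetical names, and falling back to plain `pdb` is my own choice rather than anything IPython prescribes.

```python
def get_set_trace():
    """Return a set_trace callable, preferring IPython's debugger.

    Falls back to the standard library's pdb when IPython (or a recent
    enough IPython) is not importable.
    """
    try:
        from IPython.core.debugger import set_trace  # IPython >= 5.1
    except ImportError:
        from pdb import set_trace  # plain pdb, no IPython niceties
    return set_trace


def risky_division(numerator, denominator, debug=False):
    """Divide two numbers, optionally pausing in the debugger first."""
    if debug:
        get_set_trace()()  # pauses execution here when debug=True
    return numerator / denominator


print(risky_division(6, 3))
```

Calling `risky_division(6, 0, debug=True)` in a notebook cell would stop before the division so you can inspect `denominator`.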


### IOPub data rate exceeded.

You got this error and you weren't doing anything that bandwidth intensive? Say, you were just viewing a big image, not a zillion images? It's jupyter being conservative in version 5.0.

jupyter notebook --generate-config
atom ~/.jupyter/jupyter_notebook_config.py


Update the c.NotebookApp.iopub_data_rate_limit to be big, e.g. c.NotebookApp.iopub_data_rate_limit = 10000000.

This is fixed after 5.0.

### Offline mathjax in jupyter

python -m IPython.external.mathjax /path/to/source/mathjax.zip