The Living Thing / Notebooks : Python

A Swiss army knife of coding tools. Good matrix library, general scientific tools, statistics library, art tools, and interoperation with everything else: it wraps C, C++ and Fortran, and comes with web servers, HTTP clients, parsers and all the other fruits of a thriving community. Fast enough, easy to debug, garbage-collected. If some bit is too slow, you compile it; otherwise, you relax. An excellent choice if you’d rather get stuff done than write code.

I do my stats and graphs in R, my user interface in javascript, my parallelism in java, and my linear algebra in fortran, but python is the thread that stitches this Frankensteinian monster together.

Of course, it could be better. Clojure is more elegant, scala is more parallelisable, julia prioritises scientific work more highly… But in terms of using a damn-well-supported language that goes on your computer right now, requires you to reinvent few wheels, is transferable across number crunching, web development, UIs, text processing, graphics and sundry other domains, and does not require heavy licensing costs… this one is a good default choice.

Python version management for weird sciency distributions

One suggestion I’ve had is to use pyenv.

Apparently I should also use virtualenv, which creates isolated per-project environments on top of a global python version.

In addition, anaconda reckons their conda command is the best.

Humph.

I’m using virtualenv for now; it is the most common one and works fine.

ipython

The python-specific part of jupyter, which can also run without jupyter. Long story.

The main thing I forget here is how to debug.

Let’s say there is a line in your code that fails:

1/0

In vanilla python if you want to debug the last exception (the post-mortem debugger) you do:

import pdb; pdb.pm()

and if you want to drop into a debugger from some bit of code, you write:

import pdb; pdb.set_trace()

and if you want to use a fancier debugger (ipdb is recommended):

import ipdb; ipdb.set_trace()

or:

import ipdb; ipdb.pm()
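For scripts rather than an interactive prompt, here is a minimal sketch of the same two ideas using only the standard library. pdb.pm() inspects the last unhandled exception, so it is mostly useful at a prompt; in a script you can hand the traceback to pdb.post_mortem yourself. The names run_with_post_mortem and divide, and the debugger parameter, are hypothetical and exist only for illustration.

```python
import pdb


def run_with_post_mortem(func, *args, debugger=pdb.post_mortem):
    """Call func(*args); if it raises, open a post-mortem debugger, then re-raise.

    `debugger` is a hypothetical hook (not part of pdb) so the debugger
    can be swapped out, e.g. for ipdb.post_mortem or a test stub.
    """
    try:
        return func(*args)
    except Exception as exc:
        # Pass the traceback explicitly instead of relying on pdb.pm()
        debugger(exc.__traceback__)
        raise


def divide(numerator, denominator):
    if denominator == 0:
        # Built in since Python 3.7: calls sys.breakpointhook(), which
        # defaults to pdb.set_trace() and honours PYTHONBREAKPOINT.
        breakpoint()
    return numerator / denominator
```

Calling run_with_post_mortem(divide, 1, 0) from a terminal should stop first at the breakpoint() line and then, once you continue, in the post-mortem debugger at the failing division. Setting the environment variable PYTHONBREAKPOINT=ipdb.set_trace swaps in ipdb without touching the code, assuming ipdb is installed.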

The plain pdb and ipdb calls above don’t work in jupyter, which has some other fancy interaction loop going on.

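Here’s one manual way to drop into the debugger from code, noticed by Christoph Martin (Tracer is deprecated since IPython 5.1, which offers set_trace in the same IPython.core.debugger module instead):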

from IPython.core.debugger import Tracer; Tracer()()
1/0

However, that’s not how you are supposed to do it. Persons of quality invoke their debuggers via so-called magics, e.g. the %debug magic to set a breakpoint.

%debug [--breakpoint filename:line_number_for_breakpoint]

Without the argument it activates post-mortem mode. Seriously though, who thinks in line-numbers? Tracer realistically wastes less time.

And if you want to drop automatically into the post mortem debugger for every error:

%pdb on

1/0
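Outside IPython there is no %pdb magic, but a rough equivalent for plain scripts can be sketched by replacing sys.excepthook. This is a different technique from the magic, not how IPython implements it, and as far as I know IPython handles exceptions itself rather than going through sys.excepthook. The function name install_post_mortem_hook and its debugger parameter are hypothetical.

```python
import pdb
import sys
import traceback


def install_post_mortem_hook(debugger=pdb.post_mortem):
    """Open a post-mortem debugger on every uncaught exception.

    `debugger` is a hypothetical hook so that another debugger
    (or a test stub) can be substituted for pdb.post_mortem.
    """
    def hook(exc_type, exc_value, exc_tb):
        # Print the usual traceback first, then hand over to the debugger
        traceback.print_exception(exc_type, exc_value, exc_tb)
        debugger(exc_tb)

    sys.excepthook = hook
    return hook
```

Call install_post_mortem_hook() once near the top of a script; restoring sys.excepthook = sys.__excepthook__ switches it off again.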

Props to Josh Devlin, and to Gaël Varoquaux, for explaining these magics and some other handy tips.

Gaël recommended some extra debuggers:

Pretty display of objects

Check out the ipython display protocol which allows you to render objects as arbitrary graphics:

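import matplotlib.pyplot as plt
from IPython.core.pylabtools import print_figure

# The class name, __init__ and _repr_latex_ are assumptions added so the
# excerpt runs on its own; the excerpt shows only the two middle methods.
class DataPlot:
    def __init__(self, data):
        self.data = data
        # Cache for the rendered figure, which may be expensive to compute
        self._png_data = None

    def _figure_data(self, format):
        fig, ax = plt.subplots()
        ax.plot(self.data, 'o')
        ax.set_title(self._repr_latex_())
        data = print_figure(fig, format)
        # We MUST close the figure, otherwise IPython's display machinery
        # will pick it up and send it as output, resulting in a double display
        plt.close(fig)
        return data

    # Here we define the special repr methods that provide the IPython display protocol
    # Note that we cache the figure data once computed.

    def _repr_png_(self):
        if self._png_data is None:
            self._png_data = self._figure_data('png')
        return self._png_data

    def _repr_latex_(self):
        # Placeholder title text, used by _figure_data above
        return r'$N=%d$' % len(self.data)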
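The protocol is not limited to figures. As a minimal sketch with no plotting dependencies, an object can define _repr_html_ and jupyter will render the returned markup instead of the plain repr. The Table class here is hypothetical; only the method name _repr_html_ comes from the display protocol.

```python
from html import escape


class Table:
    """Hypothetical container that renders as an HTML table in jupyter."""

    def __init__(self, rows):
        self.rows = rows

    def _repr_html_(self):
        # Escape each cell so that data containing < or & cannot break the markup
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
            for row in self.rows
        )
        return f"<table>{body}</table>"

    def __repr__(self):
        # Plain-text fallback for terminals and logs
        return f"Table({self.rows!r})"
```

Leaving a Table as the last expression in a notebook cell should show the rendered table, while a plain python prompt falls back to __repr__.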

Profiling

Profile functions using cProfile.
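A minimal sketch of what that looks like, using only the standard library cProfile and pstats modules. The functions slow_sum and profile_call are hypothetical stand-ins for whatever you actually want to measure.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop so there is something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and keep only the `top` most expensive entries
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

For a whole script, running python -m cProfile -s cumulative script.py from the shell gives the same kind of table without changing any code.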

Now visualise them using… uh…

Visualising profiles

Miscellaneous stuff I always need to look up

Packaging

Not so hard, but confusing and chaotic due to many long-running disputes that are only lately being resolved.

Testing

Too many bike sheds.

More robust tests

Python 2 v 3

Typing

http://mypy-lang.org/
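As a small sketch of what the type annotations buy you, here is a pair of hypothetical functions (parse_port and describe are made up for illustration). The code runs unchanged without mypy; the annotations only matter when you run the checker over the file.

```python
from typing import Optional


def parse_port(text: str) -> Optional[int]:
    """Return the port number in `text`, or None if it is not a valid port."""
    if not text.isdecimal():
        return None
    port = int(text)
    return port if 0 < port < 65536 else None


def describe(port: Optional[int]) -> str:
    # Without this None check, a checker such as mypy should flag any
    # arithmetic or formatting that assumes `port` is always an int.
    if port is None:
        return "no port"
    return f"port {port}"
```

Running mypy on the file containing this should pass; deleting the None check and writing port + 1 should produce an error about the Optional value.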

Misc recommendations