ipython, the interactive python upgrade
The python-specific part of jupyter, which can also run without jupyter. Long story.
The main thing I forget here is
Pretty display of objects
Check out the Rich display protocol which allows you to render objects as arbitrary graphics.
How to use this? The display API docs explain that you should basically implement methods such as _repr_svg_.
I made a thing called latex_fragment which leverages this to display arbitrary LaTeX inline. This is how you do it:
import matplotlib.pyplot as plt
from IPython.core.pylabtools import print_figure

def _figure_data(self, format):
    fig, ax = plt.subplots()
    ax.plot(self.data, 'o')
    ax.set_title(self._repr_latex_())
    data = print_figure(fig, format)
    # We MUST close the figure, otherwise IPython's display machinery
    # will pick it up and send it as output, resulting in a double display
    plt.close(fig)
    return data

# Here we define the special repr methods that provide the IPython display protocol
# Note that for the two figures, we cache the figure data once computed.
def _repr_png_(self):
    if self._png_data is None:
        self._png_data = self._figure_data('png')
    return self._png_data
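For instance, here is a minimal self-contained sketch of the protocol (the `Progress` class is made up for illustration):

```python
class Progress:
    """Toy object that renders as an HTML progress bar in IPython/Jupyter."""

    def __init__(self, fraction):
        self.fraction = fraction

    def _repr_html_(self):
        # IPython's display machinery calls this automatically when the
        # object is the result of a cell, or passed to display()
        pct = int(self.fraction * 100)
        return f'<progress value="{pct}" max="100"></progress> {pct}%'

    def __repr__(self):
        # Plain-text fallback for frontends without rich display
        return f"Progress({self.fraction:.0%})"
```

In a notebook, evaluating `Progress(0.4)` in a cell renders the HTML bar; in a plain terminal you get the `__repr__` fallback.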
For a non-graphical non-fancy terminal, you probably simply want nice formatting of dictionaries etc:
from pprint import pprint, pformat
pprint(obj)           # display it
print(pformat(obj))   # get a nicely formatted representation
Wait, you want to write your own pretty-printer, with correct indentation? Use tiles.
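If all you actually need is control over indentation and line width, the stdlib `pprint.PrettyPrinter` already takes parameters for both; a minimal sketch (the `data` dict is made up):

```python
from pprint import PrettyPrinter

data = {
    "name": "example",
    "values": list(range(10)),
    "nested": {"a": [1, 2], "b": [3, 4]},
}

# width=40 forces wrapping; indent controls nesting depth markers.
# sort_dicts is available from Python 3.8.
printer = PrettyPrinter(indent=2, width=40, sort_dicts=True)
printer.pprint(data)              # print it
formatted = printer.pformat(data) # or get the string
```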
Reloading edited code
Sometimes it’s complicated to work out how to reload some tangled dependency tree of stuff. There is an autoreload extension which in principle reloads everything that has changed.
%load_ext autoreload
%autoreload 2
If you don’t trust it, do it manually. Use deepreload. You can even hack the traditional reload to be deep.
import builtins
from IPython.lib import deepreload
builtins.reload = deepreload.reload
That didn’t work reliably for me. If you load autoreload and deepreload at the same time, stuff gets weird. Don’t do that.
Also, this is incompatible with snakeviz. Errors ensue.
How to start the basic interactive debugger
Let’s say there is a line in your code that fails:
In vanilla python if you want to debug the last exception (the post-mortem debugger) you do:
import pdb; pdb.pm()
and if you want to drop into a debugger from some bit of code, you write:
import pdb; pdb.set_trace()
and if you want to use a fancier debugger (ipdb is recommended):
import ipdb; ipdb.set_trace()
import ipdb; ipdb.pm()
This doesn’t work in jupyter/ipython, which has some other fancy interaction loop going on.
from IPython.core.debugger import Tracer; Tracer()()      # < 5.1
from IPython.core.debugger import set_trace; set_trace()  # >= 5.1
%debug [--breakpoint filename:line_number_for_breakpoint]
Without the argument it activates post-mortem mode. Pish posh, who thinks in line numbers? set_trace wastes less human time by default.
And if you want to drop automatically into the post mortem debugger for every error:
%pdb on
1/0
Gaël recommended some extra debuggers:
- aiomonitor is REPL-injection for async python
- pudb, a curses-style debugger, is very popular.
- The trepan family of debuggers, trepan3k (python 3), trepan (python 2), ipython-trepan (theoretically ipython but looks unmaintained). Docs live here.
- My brother Andy likes the intellij IDE’s built-in python debugger.
- There are many other debuggers.
- That’s too many debuggers
- Realistically I won’t use any of them, because the inbuilt one is OK.
Useful debugger commands
- ! statement
- Execute the (one-line) statement in the context of the current stack frame, even if it mirrors the name of a debugger command. This is the most useful command, because the debugger parser is horrible and will interpret anything it conceivably can as a debugger command instead of a Python command, which is confusing and misleading. So just preface everything with ! to be safe.
- h(elp) [command]
- w(here)
- Print your location in the current stack
- d(own) [count] / u(p) [count]
- Move the current frame count (default one) levels down in the stack trace (to a newer frame), or up (to an older frame).
- b(reak) [([filename:]lineno | function) [, condition]]
- The one that is tedious to do manually. Without argument, list all breaks and their metadata.
- tbreak [([filename:]lineno | function) [, condition]]
- Temporary breakpoint, which is removed automatically when it is first hit.
- cl(ear) [filename:lineno | bpnumber [bpnumber ...]]
- Clear specific or all breakpoints
- disable [bpnumber [bpnumber ...]]/enable [bpnumber [bpnumber ...]]
- disable is like clear, except the breakpoint stays around and can be re-enabled
- ignore bpnumber [count]
- ignore a breakpoint a specified number of times
- condition bpnumber [condition]
- Set a new condition for the breakpoint
- commands [bpnumber]
- Specify a list of commands for breakpoint number bpnumber. The commands themselves appear on the following lines. Type end to terminate the command list.
- s(tep)
- Execute the next line, even if that is inside an invoked function.
- n(ext)
- Execute the next line in this function.
- unt(il) [lineno]
- Continue to line lineno, or the next line with a higher number than the current one
- r(eturn)
- Continue execution until the current function returns.
- c(ont(inue))
- Continue execution, only stop when a breakpoint is encountered.
- j(ump) lineno
- Set the next line that will be executed. Only available in the bottom-most frame. It is not possible to jump into weird places like the middle of a for loop.
- l(ist) [first[, last]]
- List source code for the current file.
- ll | longlist
- List all source code for the current function or frame.
- a(rgs)
- Print the argument list of the current function.
- p expression
- Evaluate the expression in the current context and print its value.
- pp expression
- Like the p command, except the value of the expression is pretty-printed using the pprint module.
- whatis expression
- Print the type of the expression.
- source expression
- Try to get source code for the given object and display it.
- display [expression]/undisplay [expression]
- Display the value of the expression if it changed, each time execution stops in the current frame.
- interact
- Start an interactive interpreter (using the code module) whose global namespace contains all the (global and local) names found in the current scope.
- alias [name [command]]/unalias name
- Create an alias called name that executes command.
As an example, here are two useful aliases from the manual, for the .pdbrc file:
# Print instance variables (usage ``pi classInst``)
alias pi for k in %1.__dict__.keys(): print("%1.",k,"=",%1.__dict__[k])

# Print instance variables in self
alias ps pi self
- q(uit)
- Quit the debugger. Pack up and go home.
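The commands above can also be exercised non-interactively by feeding pdb a scripted session, which is a handy way to see `!` actually mutate a local; a sketch (the `buggy` function and the command script are made up):

```python
import io
import pdb

def buggy():
    x = 41
    y = x + 1
    return y

# Scripted session: step, inspect x, overwrite it with !, step, inspect y, continue
commands = "n\np x\n!x = 100\nn\np y\nc\n"
stdout = io.StringIO()
debugger = pdb.Pdb(stdin=io.StringIO(commands), stdout=stdout,
                   nosigint=True, readrc=False)
debugger.use_rawinput = False  # read commands from our StringIO, not the terminal
result = debugger.runcall(buggy)
print(result)  # the ! assignment changed the outcome
```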
Memory profiling
Python 3 has tracemalloc built in. This is a powerful, if bare-bones, Python memory analyser. Mike Lin walks you through it. Benoit Bernard explains various options that run on older Pythons, including, most usefully IMO, objgraph, which draws you an actual diagram of where the leaking things are. More full-featured, Pympler provides GUI-backed memory profiling, including the magically handy trick of tracking referrers using its refbrowser.
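A minimal tracemalloc sketch (the allocation is deliberately conspicuous so it dominates the snapshot):

```python
import tracemalloc

tracemalloc.start()

# Allocate something noticeable so it shows up at the top of the stats
hoard = [bytes(1000) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
stats = snapshot.statistics('lineno')  # group allocations by source line
for stat in stats[:3]:
    print(stat)  # biggest allocation sites first

tracemalloc.stop()
```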
pyrasite injects code into running python processes, which enables more exotic debuggery: realtime object mutation and, of course, memory and performance profiling.
Maybe it’s not crashing, but taking too long? You want a profiler.
Easy mode: built-in profiler
Profile functions using cProfile:
import cProfile as profile
profile.runctx('print(predded.shape)', globals(), locals())
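For quick non-GUI inspection, the stdlib pstats module will sort and print the collected stats; a sketch (`slow` is a made-up example function):

```python
import cProfile
import io
import pstats

def slow():
    return sum(i * i for i in range(100_000))

profiler = cProfile.Profile()
profiler.enable()
slow()
profiler.disable()

# Dump the five most expensive entries by cumulative time
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats('cumulative').print_stats(5)
print(buf.getvalue())
```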
There are also memory allocation tools, although I’ve not used them and suspect they are no longer current.
Now visualise them using… uh… let me come back to that.
[…] lets you visualize what your Python program is spending time on without restarting the program or modifying the code in any way. Py-Spy is extremely low overhead: it is written in Rust for speed and doesn’t run in the same process as the profiled Python program, nor does it interrupt the running program in any way. This means Py-Spy is safe to use against production Python code.[…]
This project aims to let you profile and debug any running Python program, even if the program is serving production traffic.[…]
Py-spy works by directly reading the memory of the python program using the process_vm_readv system call on Linux, the vm_read call on OSX or the ReadProcessMemory call on Windows.
Figuring out the call stack of the Python program is done by looking at the global PyInterpreterState variable to get all the Python threads running in the interpreter, and then iterating over each PyFrameObject in each thread to get the call stack.
Native IPython can run the profiler via magics:
%%prun -D somefile.prof
files = glob.glob('*.txt')
for file in files:
    with open(file) as f:
        print(hashlib.md5(f.read().encode('utf-8')).hexdigest())
snakeviz includes a handy magic to automatically save stats and launch the profiler. (Gotcha: you have to have the snakeviz cli already on the path when you launch ipython.)
%load_ext snakeviz

%%snakeviz
files = glob.glob('*.txt')
for file in files:
    with open(file) as f:
        print(hashlib.md5(f.read().encode('utf-8')).hexdigest())
This is incompatible with autoreload and gives weird errors if you run them both in the same session.
Too many bike sheds.
Jacob Kaplan-Moss likes pytest and he’s good, so let’s copy him. FWIW I’m no fan of nose; my experience was that I spent a lot of time debugging weird failures and getting lost in its attempts to automagically help me. This might be because I didn’t deeply understand what I was doing, but the other frameworks didn’t require me to understand the complexities of their attempts to simplify my life so deeply.
Hypothesis is a library which does randomised constraint-based testing:
It works by generating random data matching your specification and checking that your guarantee still holds in that case. If it finds an example where it doesn’t, it takes that example and cuts it down to size, simplifying it until it finds a much smaller example that still causes the problem.
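To illustrate the generate-then-shrink idea, here is a toy sketch of the concept in plain Python (this is not Hypothesis’s actual algorithm; the property, generator, and shrinker are all made up):

```python
import random

def property_holds(xs):
    # Deliberately "buggy" property: fails when the list holds a value >= 100
    return all(x < 100 for x in xs)

def shrink(xs):
    """Yield smaller candidate counterexamples: drop elements, halve values."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i in range(len(xs)):
        if xs[i] > 0:
            yield xs[:i] + [xs[i] // 2] + xs[i + 1:]

def find_minimal_counterexample(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        # Generate random data matching the "specification": short int lists
        xs = [rng.randrange(200) for _ in range(rng.randrange(10))]
        if not property_holds(xs):
            # Greedily shrink while the property still fails
            shrunk = True
            while shrunk:
                shrunk = False
                for candidate in shrink(xs):
                    if not property_holds(candidate):
                        xs = candidate
                        shrunk = True
                        break
            return xs
    return None

print(find_minimal_counterexample())
```

The shrinker reliably reduces whatever messy failing list it first finds down to a single offending element, which is the whole point: the minimal counterexample is vastly easier to debug than the random one.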