
Highly performative computing

In Soviet Russia, job schedules YOU

Your cluster uses some pre-cloud job manager such as (primordial) Platform LSF/Torque or (recentish) Slurm. You don’t care about the details, since you have a vague suspicion that the entire world of old-school HPC is asymptotically approaching a 0% market share. All you know is that your hip Kubernetes VM-based solution, leveraging Apache Spark and some other stuff you got off GitHub, will not run here in the manner Google intended. But your insouciance doesn’t solve your immediate problem: you need to work in this hostile environment now, because your department would rather shovel weeks of grad-student labour into the sunk-cost pit, at the bottom of which lies the giant computing cluster it bought 5 years ago, than face the horrifying prospect of giving you billing privileges for the cloud.

So how can you get access to those handy machines, while learning the absolute minimum that you possibly can about anything to do with their magnificently baroque, obsolescent architecture?

Job management

hanythingondemand provides a set of scripts to easily set up an ad-hoc Hadoop cluster through PBS jobs.

Holy crap, someone procrastinated. Andrea Zonca wrote remotespawner, a script that spawns jobs on your legacy HPC monstrosity. This is the most thankless kind of task.

A more ad hoc, probably-slower-but-more-robust approach, which perhaps avoids the damn thing: Easily distributing a parallel IPython Notebook on a cluster:

Have you ever asked yourself: “Do I want to spend 2 days adjusting this analysis to run on the cluster and wait 2 days for the jobs to finish, or do I just run it locally with no extra work and just wait a week?”

Why yes, I have.
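
The machinery underneath this sort of approach is plain ipyparallel: you launch the engines as batch jobs, then connect a client from your notebook and map work over them. A minimal sketch of driving it directly, assuming a parallel profile named pbs already exists and its engines are running (the profile name and counts are placeholders):

    import ipyparallel as ipp

    # Connect to the controller for an existing IPython parallel profile.
    # The engines are assumed to have been started already, e.g. by
    # submitting `ipcluster start -n 64 --profile=pbs` through the batch
    # scheduler; "pbs" is a made-up profile name for illustration.
    rc = ipp.Client(profile="pbs")

    # A load-balanced view hands each task to whichever engine is free.
    view = rc.load_balanced_view()

    def slow_analysis(x):
        # stand-in for the per-item computation you would otherwise
        # wait a week for locally
        return x ** 2

    results = view.map_sync(slow_analysis, range(1000))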


“Quickly and easily parallelize Python functions using IPython on a cluster, supporting multiple schedulers. Optimizes IPython defaults to handle larger clusters and simultaneous processes.”[…]

ipython-cluster-helper creates a throwaway parallel IPython profile, launches a cluster and returns a view. On program exit it shuts the cluster down and deletes the throwaway profile.

Works on Platform LSF, Sun Grid Engine, Torque, and SLURM.

I do not at all understand how you get data back from this; I guess you run it in situ. Strictly Python.
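
Concretely, the usage pattern from the project’s README looks roughly like this, which also partly answers the data question: whatever the mapped function returns comes back through the view as ordinary Python objects, and anything bulkier you presumably write to the shared filesystem from inside the job. The scheduler, queue name and job count below are placeholders for whatever your site uses:

    from cluster_helper.cluster import cluster_view

    def fit_model(params):
        # the per-job work you would otherwise qsub by hand
        return sum(params)

    # cluster_view spins up a throwaway IPython cluster through the
    # scheduler, yields a view, and tears it all down again on exit.
    # scheduler is one of "lsf", "sge", "torque", "slurm"; "short" is a
    # hypothetical queue name.
    with cluster_view(scheduler="slurm", queue="short", num_jobs=16) as view:
        results = view.map(fit_model, [(1, 2), (3, 4), (5, 6)])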

Dependency management

Fact: you are running some ancient, decrepit edition of RHEL whose bundled software is too elderly to run anything you need. You will need to install everything you use. That is fine, but be aware it’s a little different from provisioning virtual machines on your new-fangled fancy cloud thingy. This is to teach you very important and hugely useful lessons for later in life, such as which compile flags to set to get the matrix kernel library version 8.7.3patch3 to compile with Python 2.5.4rc3 for the Itanium architecture as of May 8, 2009. Why, think on how many times you will use that skill after you leave your current job! (We call such contemplation void meditation.)

Here are some options to ease the installation process.

  1. Use Linuxbrew to install all deps into your home dir.

    Linuxbrew is a fork of Homebrew, the macOS package manager, for Linux.

    It can be installed in your home directory and does not require root access. The same package manager can be used on both your Linux server and your Mac laptop. Installing a modern version of glibc and gcc in your home directory on an old distribution of Linux takes five minutes.

  2. Use spack, an HPC-specific build tool, which also lets you prototype on macOS.

    Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easy. With Spack, you can build a package with multiple versions, configurations, platforms, and compilers, and all of these builds can coexist on the same machine.

    Spack isn’t tied to a particular language; you can build a software stack in Python or R, link to libraries written in C, C++, or Fortran, and easily swap compilers. Use Spack to install in your home directory, to manage shared installations and modules on a cluster, or to build combinatorial versions of software for testing.