
Julia

The hippest way to get your IEEE754 on

Julia: a JIT-compiled language with a special focus on high-performance scientific computation.

Why

tl;dr: No magic bullet, but a handy arrow for the quiver.

Some of Julia’s community makes ambitious claims about Julia being the fastest and bestest thing ever.

Unsurprisingly, Julia is no panacea for programming ills, nor will it be. It is, however, well designed for numerical computation. In my non-rigorous experiments it seems to do better than other scripting-plus-compilation options, such as Cython, on certain tasks. In particular, it does very well when you have dynamically defined code in the inner loop, e.g. a Monte Carlo simulation where you don’t know the density ahead of time. This is more or less what you’d expect from compiling as late as possible rather than shipping a compiled library, but it (usually) involves less messing around with compilation toolchains, platform libraries, ABIs, makefiles etc. OTOH it uses lots of memory, so don’t be running this on your Raspberry Pi. (UPDATE: my colleague Rowan assures me he runs Julia on a Raspberry Pi all the time, so maybe I need to tighten up my code.)

It has its own friction points. The community process is problematic (see also). The debugging options are not impressive, and library support is patchy, especially at this early stage.

That said, the idea of a science-users-first JIT language is timely, and Julia is that. Python, for example, has clunky legacy issues in its numeric code, a patchy API, and is ill-designed for JIT compilation. Matlab is expensive and nasty for non-numerics. Lua has some good science libraries and could probably have filled this niche, but AFAICT for sociological reasons it has not acquired the hipness or critical mass of Julia.

The language

Intros

In order of increasing depth

A Mental Model for Julia: Talking to a Scientist

See also something that wasn’t obvious to me: What are Symbols?

Pitfalls, gotchas

Chris Rackauckas mentions 7 Julia gotchas.

Here are some more.

Implementing methods for custom types is clunky

You want to implement a standard interface on your type so you can, e.g. iterate over it.

julia>  struct Squares
            count::Int
        end

julia>  function Base.getindex(S::Squares, i::Int)
            1 <= i <= S.count || throw(BoundsError(S, i))
            return i*i
        end

julia>  Squares(100)[23]
529
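
Iteration is a separate interface again. A minimal sketch for the same Squares type, assuming the Julia ≥ 0.7 protocol (a single Base.iterate method; 0.6 used start/next/done instead):

function Base.iterate(S::Squares, state=1)
    # Return nothing when exhausted, else (value, next state).
    state > S.count ? nothing : (state * state, state + 1)
end

for s in Squares(4)
    println(s)   # 1, 4, 9, 16
end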

If you are lucky you might be able to inherit from AbstractArray. However, this will require injecting your methods into another namespace; in this case, Base:

julia> struct SquaresVector <: AbstractArray{Int, 1}
            count::Int
        end
julia> Base.size(S::SquaresVector) = (S.count,)
julia> Base.IndexStyle(::Type{<:SquaresVector}) = IndexLinear()
julia> Base.getindex(S::SquaresVector, i::Int) = i*i

That might feel gross, and the fact that this is how things are done certainly leads to surprising behaviour: importing one package can magically change the behaviour of another. This “meta-side-effect” is everywhere in Julia, and it is clearly flagged when you write a package, but not when you use one.

The type system is reasonably logical, although it’s not obvious if you are used to classical OOP. (This is not a criticism.)

It allocates for and copies array slices by default

Are you using a chunk of an existing array and don’t want to copy it? Consider using views for slices, they say, which means not using slice notation but rather the view function, or the handy @views macro.
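
For example, a minimal sketch of the difference:

A = rand(1000, 1000)
b = A[:, 1]          # slice syntax allocates a fresh copy
v = view(A, :, 1)    # a SubArray aliasing A; no copy
@views c = A[:, 1]   # the macro rewrites slice syntax into views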

Custom types and containers are scruffy

If you want fast numerics but you are not sure of your float precision, you should not use AbstractFloat in your type definitions, i.e. as a field type in your structs. (EDIT: You don’t need to worry about this in method definitions, as Chris Rackauckas points out.)

If you do need well-typed containers, the idiomatic way to do this is using parametric types and parametric methods and so-called orthogonal design.

AFAICT a parametric declaration is the goods if, e.g., you want your method to work with an array of Float32, or Float64, or Float16, or Decimal, but not an array that mixes those types. But it should also work on dense arrays, views of dense arrays, or sparse arrays thereof, right? Then the correct type would look like one of these:

RealMatrixType = Union{AbstractMatrix{F},SparseMatrixCSC{F}} where {F<:Real}
RealVectorType = Union{AbstractVector{F},SparseVector{F}} where {F<:Real}

EDIT: you usually don’t need to bother with this kind of declaration; the compiler will work it out dynamically so long as the arguments’ types are sensible.
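
For illustration, a parametric method sketch (total_energy is a made-up name):

# F is fixed per call: every element shares one concrete Real subtype,
# but the container may be a dense array, a view, a sparse vector, etc.
function total_energy(x::AbstractVector{F}) where {F<:Real}
    s = zero(F)
    for xi in x
        s += xi^2
    end
    return s
end

total_energy(rand(Float32, 8))              # Float32 arithmetic throughout
total_energy(view(rand(Float64, 8), 1:4))   # works on views too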

It’s unstable and hangs all the time

Yep.

Especially when it’s doing things that are supposed to be Julia specialities, such as JIT-compiling dynamic inner functions. Use the recommended julia command:

killall -9 julia

FWIW this problem has mostly occurred for me using the JunoPro Intel MKL builds. Vanilla Juno is much more stable, and also smaller, and not noticeably slower on my particular code.

Workflow sucks

Yes.

You are using Julia because it is dynamic and because it is fast, but if you try to use code in a dynamic fashion, in the global namespace, it is no longer fast. Many complicated interactions with the module system ensue, and the recommended workarounds keep changing.

Julia 0.6 workflow tips, summarised for you:

Put code under development in a temporary module. Create a file, say Tmp.jl, containing:

module Tmp
  <your definitions here>
end

Create another file, say tst.jl, which begins with import Tmp and includes tests for the contents of Tmp. Reflect updates by calling reload("Tmp"):

reload("Tmp")
include("tst.jl")

What works more easily for me is running, in Juno or the Jupyter console:

include("Tmp.jl")

For Julia 0.7, you should instead use Revise.jl.

Argument syntax is only OK

Keyword arguments exist but do not participate in method dispatch: they are second-class citizens and may make things slow or awkward if you need to specialise your code based on them. So design your functions around that.
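
A quick sketch of what that means in practice (scale is a made-up function):

# Only positional arguments take part in dispatch; the keyword is invisible
# to the method table, so you cannot add a second method specialised on it.
scale(x; factor=2) = x * factor

scale(3)              # 6
scale(3, factor=10)   # 30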

Going fast isn’t necessarily magical

Don’t believe that wild-eyed evangelical gentleman at the Julia meetup. He doesn’t write actual code; he’s too busy talking about it. You still need to optimize and hint.

Chris Rackauckas, in the comments of this blog, is at pains to point out that type-hinting is rarely required, even for fancy threaded vectorisation, which is amazing.

This doesn’t eliminate the role that performance annotations and careful code structuring typically play, and I do not regard it as a bug that you occasionally need to annotate code to tell the compiler what you know about what can be done with it. Julia should not be spending CPU cycles proving non-trivial correctness results for you if you know them already; that would likely mean solving NP-hard problems to speed up P problems, which is a bad bet.
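
The kind of annotation in question looks like this (a sketch; @inbounds is only safe once you have convinced yourself the indexing cannot go out of bounds):

function sum_squares(x::Vector{Float64})
    s = 0.0
    # Skip bounds checks and request SIMD vectorisation of the loop.
    @inbounds @simd for i in eachindex(x)
        s += x[i] * x[i]
    end
    return s
end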

Does this help?

Unsubstantiated rumour (though see Chris Rackauckas’ comments for some substantiation) has it that the following optimises the system Julia installation for the local CPU:

include(
    joinpath(
        dirname(JULIA_HOME), "share", "julia", "build_sysimg.jl"
    )
)
build_sysimg(force=true)

IDEs/workbooks

There is a reasonable IDE called Juno, built on Atom. There is Jupyter integration through IJulia.

Both have their own annoyances. E.g. Juno is single-window only, so you can’t use multiple monitors, and you end up squinting at tiny panes of code hidden between all the outputs; Atom’s panes aren’t well designed for this use case. For me at least there are a million tiny frictions distracting me from writing code.

If you install Juno as a standalone app while you already use Atom, there is an additional annoyance: it hijacks your Atom install in a confusing way and mangles your various package preferences. If you love Juno, I recommend installing it from within Atom via the uber-juno package.

Possibly you can bypass this using Homebrew? I didn’t try, but maybe give this a burl:

brew cask install juno

FWIW, I don’t think Juno is worth the irritations, and I never use it. IJulia plus Visual Studio Code makes for a less annoying, zippier and more stable workflow for my style.

IJulia is also not flawless. By default it does its own overzealous installs, profligately installing yet another copy of Jupyter, which you then have to maintain separately. Boring. You can bypass this by telling it to use the perfectly good Jupyter you already have installed:

ENV["JUPYTER"] = "/usr/local/bin/jupyter"
Pkg.add("IJulia")

Now IJulia appears as a normal kernel in your normal jupyter setup.

There is a package called Weave.jl, inspired by R’s knitr but compatible with Jupyter, which could probably be used to fashion a working academic paper if you used the right pandoc hacks.

Pkg.add("Weave")

API, FFIs

See the API list

C

Sort of easy, but there is a tedious need to spell out the call signature at call time. Survivable.
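
For instance, calling into the C standard library (assuming a libc that the loader can find under that name; the library string may differ per platform):

# The return type and the argument-type tuple are declared right at the call site.
t = ccall((:clock, "libc"), Int32, ())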

R

XRJulia:

This package provides an interface from R to Julia, based on the XR structure, as implemented in the XR package, in this repository.

Rjulia:

rjulia provides an interface between R and Julia. It allows a user to run a script in Julia from R, and maps objects between the two languages.

Python

PyCall.jl invokes Python. By default it installs yet another conda Python via Conda.jl, and defaults to the elderly Python 2.7. This is irritating for various reasons, such as when you wish to install large Python libraries.

Here is one workaround to use an existing Python install:

ENV["PYTHON"] = /usr/local/bin/python3
Pkg.build("PyCall")

Here is how you use Conda, but with python 3:

ENV["CONDA_JL_VERSION"] = "3"
Pkg.build("Conda")

Here is how you use an existing environment:

conda create -n conda_jl python
export CONDA_JL_HOME="/path/to/miniconda/envs/conda_jl"
julia -e 'Pkg.build("Conda")'
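
Once PyCall builds, invoking Python is pleasantly terse; this is its documented @pyimport interface:

using PyCall
@pyimport math
math.sin(math.pi / 4)   # evaluated by the Python interpreter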

Toolkits

Data loading/saving/exchange

The names are nearly all self-explanatory.

Debugging and coding

Debugger, Gallium.jl:

This is the julia debugger. Please note that the 0.6 version of this package currently does not support breakpointing, C/C++ debugging or native code inspection. These features are being rebuilt, but were never particularly reliable in prior versions of this package and a cause of instability for the more mature features. In exchange, this package features a significantly more robust pure julia debug prompt, provided by ASTInterpreter2. Please file interpreter issues against that package.

Linter: Lint.jl (which also has an Atom linter plugin).

Signal processing, esp audio

Numerical tours of signal processing in Julia

JuliaAudio processes audio. They recommend PortAudio.jl as a realtime soundcard interface, which looks fairly simple. See rkat’s example of how this works. They have nice abstractions like SampledSignals. This stuff is used at, e.g., Dolby for algorithm prototyping.

Images.jl processes images.

Statistics

DataFrames are provided by DataFrames.jl, and also by DataTables.jl. The two are subtly incompatible in completely boring ways, which you can hopefully ignore soon; for now, use IterableTables.jl to translate where needed. Blech. One can access DataFrames (and DataTables, SQL databases, and streaming data sources) using Query.jl. You can load many of the standard R datasets using RDatasets:

using RDatasets
iris = dataset("datasets", "iris")
neuro = dataset("boot", "neuro")

Another notable product of the JuliaStats organisation is Distributions.jl, a probability-distribution toolkit providing densities, sampling and so on.
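
A minimal taste of the Distributions.jl interface:

using Distributions

d = Normal(0.0, 1.0)   # a standard normal
pdf(d, 0.5)            # density at 0.5
rand(d, 10)            # ten samples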

Turing.jl does posterior inference.

Turing.jl is a Julia library for (universal) probabilistic programming.

Possibly it is a competitor to Klara.jl, the JuliaStats MCMC framework.

Lasso and other sparse regressions are available in Lasso.jl, which reimplements the lasso algorithm in pure Julia, and in GLMNET.jl, which wraps the classic Friedman FORTRAN code for the same. There is also (functionality unattested) an orthogonal matching pursuit package, OMP.jl, but that algorithm is simple enough to bang out yourself in an afternoon, so no stress if it doesn’t work.

Plotting

The Julia wiki includes worked examples using various engines.

Gadfly

The aspirational ggplot clone is Gadfly. It’s elegant and well integrated into the statistical DataFrame ecosystem, but it’s missing some stuff and has some weird gotchas.

One of them is that the plots it produces by default are staggeringly gigantic: they will fill your hard drive and your github repositories with gigabytes of interactive gunk that you may never notice exists, probably don’t want to use, and definitely don’t want to keep. You avoid this by not just calling plot, but explicitly drawing to a bitmap backend such as PNG:

First you will need to install the Cairo and Fontconfig bonus libraries:

Pkg.add("Gadfly")
Pkg.add("Cairo")
Pkg.add("Fontconfig")

Now you can force PNG output:

draw(
  PNG(150, 150),
  plot(data,  x=:gamma, y=:p, Geom.point, Geom.line)
)

For me this results in a 40-fold saving in storage, and stops my github account from instantly going over quota.

Gadfly is based on a clever declarative vector graphics system called Compose.jl, which is independently useful.

Plots.jl

Plots.jl wraps many other plotting packages as backends, although notably not Gadfly. The author explains this as being because Gadfly does not support 3D or interactivity; since I generally want neither of those things, especially for publication, this is a contraindication for the compatibility of our priorities. I have enough grief trying to persuade my department that this internet thing is anything other than a PDF download engine.

Plots claims convenience for fast plots of big things because it wraps GR.jl, which in turn wraps GR, a cross-platform visualisation framework:

GR is a universal framework for cross-platform visualization applications. It offers developers a compact, portable and consistent graphics library for their programs. Applications range from publication quality 2D graphs to the representation of complex 3D scenes.[…]

GR is essentially based on an implementation of a Graphical Kernel System (GKS) and OpenGL. […] GR is characterized by its high interoperability and can be used with modern web technologies and mobile devices. The GR framework is especially suitable for real-time environments.

Vega

Vega.jl binds to vega, a JavaScript ggplot-alike. There is a “higher level” binding also called VegaLite.jl.

It’s unclear to me how either of these works with print graphics, and they both lag behind the latest version of the underlying Vega library, so I’m nervous of them.

Others

Winston.jl has a brutalist simple approach but seems to go.

Differentiating, optimisation

Laplacians.jl by Dan Spielman et al is an advanced matrix factorisation toolkit.

The deep learning toolkits are, for the moment, lacking features. Perhaps they’ll get there?

tensorflow.jl. (Surely one misses the benefit of Julia by using tensorflow, since there are two different array-processing infrastructures to pass between, and a very different approach to JIT versus pre-compiled execution. Or is it a match made in heaven?)

Flux.jl sounds like a reimplementation of tensorflow-style differentiable programming inside Julia, which strikes me as the way you’d actually do this right, given the end-to-end-optimised design philosophy of julia.

Flux is a library for machine learning. It comes “batteries-included” with many useful tools built in, but also lets you use the full power of the Julia language where you need it. The whole stack is implemented in clean Julia code (right down to the GPU kernels) and any part can be tweaked to your liking.

It’s missing many features of tensorflow, but has nice dynamism, like pytorch. Its GPU support is real and elegant, but more proof-of-concept than production-ready, depending as it does on CuArrays being able to represent all the operations you need.

Alternatively, Mocha.jl is a belt-and-braces deep learning thing with a library of pre-defined layers. Swap out some of Flux’s buzzwords for older ones, skip some newer ones, and there you are. However, it doesn’t appear to be nearly as flexible: it is missing state filters, recurrent nets, and many other models that are made less holy by their distance from the platonic ideal of deep learning, the convnet, and therefore it is not much use to me, despite promising clever things about using CUDA etc. YMMV.

Knet.jl is another deep learning library that aims to show how easily deep learning frameworks can be implemented. Their RNN support looks a little limp, and they use the less-popular autodiff frameworks, so I won’t be using ’em, but the point is well taken.

The JuliaDiff project produces ForwardDiff.jl and ReverseDiff.jl, which do what you would expect, namely autodiff. See also XGrad.
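
For instance, ForwardDiff’s documented gradient interface:

using ForwardDiff

f(x) = sum(abs2, x)                        # a toy scalar-valued function
ForwardDiff.gradient(f, [1.0, 2.0, 3.0])   # => [2.0, 4.0, 6.0]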

JuMP supports many types of optimisation, including over non-continuous domains, and is part of the JuliaOpt family of confusingly diverse optimizers, which invoke various sub-families of optimizers. The famous NLopt solvers comprise one such class, and they can additionally be invoked separately.

Unlike NLopt and the JuMP family, Optim (part of JuliaNLSolvers) solves optimisation problems purely inside Julia. It can automatically invoke ForwardDiff, and it mostly assumes unconstrained problems.
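
A minimal sketch, borrowing the Rosenbrock example from the Optim docs (by default gradients come from finite differences; the docs describe an autodiff option that switches to ForwardDiff):

using Optim

rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
result = optimize(rosenbrock, zeros(2), BFGS())
Optim.minimizer(result)   # ≈ [1.0, 1.0]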

Approximating

ApproxFun.jl does Chebyshev and Fourier interpolation.

UIs and servers

HttpServer.jl does basic HTTP protocol serving; this is made modular and composable by Mux.jl. Fancy caching, templating etc. come from Genie.jl.
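
A hello-world along the lines of the HttpServer.jl README (take it as a sketch; this API has been shifting):

using HttpServer

http = HttpHandler() do req::Request, res::Response
    Response("hello world")
end

server = Server(http)
run(server, 8000)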

Escher.jl goes further, rendering HTML UI widgets etc.

Various other options are listed in aviks’ stackoverflow answer.