The Living Thing / Notebooks

Julia, the programming language

The hippest way to get your IEEE754 on. Hngh.

Usefulness: 🔧 🔧 🔧
Novelty: 💡
Uncertainty: 🤪 🤪
Incompleteness: 🚧 🚧

Julia: a JIT-compiled language with a particular focus on high-performance scientific computation.

A Mental Model for Julia: Talking to a Scientist

—Chris Rackauckas, UCI Data Science Initiative Intro to Julia.

Why

tl;dr Not a magic bullet, a handy arrow for the quiver.

Some of Julia’s community makes ambitious claims about Julia being the fastest and bestest thing ever.

Unsurprisingly, Julia is no panacea. It is well designed for numerical computation. In my non-rigorous experiments it seems to do better than other scripting+compilation options, such as Cython, on certain tasks. In particular, if you have dynamically defined code in the inner loop it does very well — say, you are doing a Monte Carlo simulation, but don’t know the user’s desired density ahead of time. This is more or less what you’d expect from doing compilation as late as possible rather than shipping a compiled library, but it (at least for my use case) involves less messing around with compilation tool-chains, platform libraries, ABIs, makefiles etc.

Julia has its own idiosyncratic frictions. The community process can be problematic (see also giving up on Julia). Library support is patchy, especially at this early stage. It doesn’t run on iOS. In fact it uses lots of memory, so it is maybe not ideal for embedded controllers. Although my colleague Rowan assures me he runs serious Julia code on Raspberry Pis all the time, so maybe I need to tighten up my algorithms.

That said, the idea of a science-users-first JIT language is timely, and Julia is that. Python, for example, has clunky legacy issues in the numeric code and a patchy API, and is ill-designed for JIT compilation, despite various projects that attempt to facilitate it. MATLAB is expensive, and nasty for non-numerics, not to mention old and crufty, and code seems to break between MATLAB versions at least as often as it does between Julia versions. Lua has some good science libraries and could likely have filled this niche, but for reasons that are perhaps sociological as much as technical it doesn’t have the hipness or critical mass of Julia.

The language

Intros

In order of increasing depth

Here’s something that wasn’t obvious to me: What are Symbols?

Exceptions, Errors

What are the Exception types?

Pitfalls, gotchas

Chris Rackauckas mentions 7 Julia gotchas.

Here are some more.

Implementing methods for custom types is clunky

You want to implement a standard interface on your type so you can, e.g. iterate over it, which commonly looks like this:

for element in iterable
    # body
end

or equivalently

iter_result = iterate(iterable)
while iter_result !== nothing
    (element, state) = iter_result
    # body
    iter_result = iterate(iterable, state)
end

Here is a simple example of that: a range iterator which yields every nth element, up to some number of elements, could look like

julia> struct EveryNth
           n::Int
           start::Int
           length::Int
       end
julia> function Base.iterate(iter::EveryNth, state=(iter.start, 0))
           element, count = state

           if count >= iter.length
               return nothing
           end

           return (element, (element + iter.n, count + 1))
       end
julia> Base.length(iter::EveryNth) = iter.length
julia> Base.eltype(iter::EveryNth) = Int

(If you are lucky you might be able to inherit from AbstractArray.)
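With those definitions in place, the standard generic machinery just works on the new type. A quick sketch (assuming the EveryNth definitions above are in scope):

```julia
# Assuming the EveryNth struct and its Base.iterate/length/eltype
# methods from above have been defined:
for i in EveryNth(2, 0, 5)
    print(i, " ")           # prints: 0 2 4 6 8
end

collect(EveryNth(2, 0, 5))  # → [0, 2, 4, 6, 8]
```

Because length and eltype are defined, collect can preallocate an Int vector of the right size rather than growing one element by element.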

It’s weird for me that this requires injecting your methods into another namespace; in this case, Base. That might feel gross, and it does lead to some surprising behaviour and some fairly mind-bending namespace-resolution rules for methods. Importing one package can magically change the behaviour of another. Extending Base methods for your own types, as above, is fine and idiomatic; the dangerous version — extending methods you don’t own on types you don’t own — is called, in Julia argot, “type piracy”. Either way, the extension is clearly marked when you write a package, but not when you use the package. Anyway, it works fine, and I can’t imagine how to handle the multiple-dispatch thing better, so deal with it.

The type system is logical, although it’s not obvious if you are used to classical OOP. (Not a criticism.)

It allocates for and copies array slices per default

You are using a chunk of an existing array and don’t want to copy? Consider using views for slices, they say, which means not using slice notation but rather the view function, or the @views macro. Both of these are ugly in different ways, so I cross my fingers and hope the compiler can optimise away some of this nonsense, but I am not going to check that for now.
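A small demonstration of the difference (a sketch; the array contents are arbitrary):

```julia
A = collect(1:10)

B = A[2:4]          # slice notation allocates a copy
B[1] = 99
A[2]                # → 2; the original is untouched

V = @view A[2:4]    # a view shares memory with A
V[1] = 99
A[2]                # → 99; writing through the view mutated A
```

The copy is safer; the view is cheaper. Which one you want depends on whether anything downstream mutates the slice.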

Custom containers are scruffy

If you want fast numerics but you are not sure of your float precision, you should not use AbstractFloat as a struct field or container element type, since abstractly-typed storage is slow (concrete types work fine). EDIT: You don’t generally need to worry about this in method definitions, as Chris Rackauckas points out.

If you do need well-typed containers, the idiomatic way to do this is using parametric types and parametric methods and so-called orthogonal design.

AFAICT a parametric declaration is the goods if, e.g. you want your method to specialise for arrays of Float32, or of Float64, or of Float16, or of Decimal, but, e.g., not an array which mixes types. But it should also work on dense arrays, or views of dense arrays, or sparse arrays thereof, right? Then the correct type would look like one of these:

RealMatrixType = Union{AbstractMatrix{F},SparseMatrixCSC{F}} where {F<:Real}
RealVectorType = Union{AbstractVector{F},SparseVector{F}} where {F<:Real}

EDIT: Don’t bother with this kind of function declaration; the compiler will work it out so long as the arguments’ types are sensible. In fact, it’s counterproductive, since you might be unnecessarily restrictive. The rule is: let the compiler work out the argument types in function definitions, but you should choose the types in variable definitions (i.e. when you are calling said functions).
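A minimal sketch of that rule (the function name here is hypothetical): leave the signature generic, and let the call site pick the concrete types:

```julia
# Generic definition: no type annotation needed for performance.
# The compiler specialises per concrete argument type anyway.
total(xs) = sum(x -> x * x, xs)

x64 = [1.0, 2.0, 3.0]    # Float64 chosen at the call site
x32 = Float32[1, 2, 3]   # Float32 — a separate specialised method instance

total(x64)  # → 14.0
total(x32)  # → 14.0f0
```

Each call compiles its own specialised instance, so you get the speed of the parametric version without having written any type parameters.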

Argument syntax is only OK

Keyword arguments exist but do not participate in method dispatch. Keyword arguments are second-class citizens and might make things slow or stupid if you need to specialise your code based on them. So design your functions around that. This is usually OK with careful thought, but small lapses lead to many irritating helper functions to handle default arguments.
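A toy illustration (the function name is hypothetical) of keyword arguments staying out of dispatch:

```julia
g(x; scale = 1) = scale * x

g(2)             # → 2
g(2; scale = 3)  # → 6

# Despite the keyword argument there is only one method:
# you cannot add a second definition that dispatches on `scale`.
length(methods(g))  # → 1
```

If you need behaviour to vary by the type of an option, make it a positional argument (or a Val, as below) so dispatch can see it.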

Containers of containers need careful thinking about parametric types

It’s hard to work out what went wrong when you hear Type definition: expected UnionAll, got TypeVar.

Traits and dispatch

Dispatching is not always obvious.

I always forget the keyword here: Value Types, which allow one to choose a method based on the value, rather than the type, of a thing. Their usage was not obvious (to me) from the manual, but it is explained beautifully by Tim Holy. There are dangers.
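A minimal Val sketch (function names hypothetical): wrapping a value in Val lifts it into the type domain, so dispatch can see it:

```julia
endpoint(::Val{:first}, xs) = xs[1]
endpoint(::Val{:last}, xs)  = xs[end]

# Wrap the symbol in Val to dispatch on its value:
endpoint(which::Symbol, xs) = endpoint(Val(which), xs)

endpoint(:first, [10, 20, 30])  # → 10
endpoint(:last,  [10, 20, 30])  # → 30
```

The danger Tim Holy warns about: if the wrapped value is not known at compile time, the compiler cannot infer which method will be called, and you pay for dynamic dispatch.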

Chris Rackauckas explains Holy traits, which are a Julia-idiomatic duck-typing method, explained in greater depth by Mauro3 and Lyndon White.
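The pattern, in a minimal hypothetical sketch: a trait function maps types to trait values, and methods dispatch on the trait rather than on the original type:

```julia
# Trait values
struct Iterable end
struct NotIterable end

# Trait function: declare which types have the trait
isiterable(::Type{<:AbstractArray}) = Iterable()
isiterable(::Type{<:Number}) = NotIterable()

# Dispatch on the trait, not the original type
describe(x::T) where {T} = describe(isiterable(T), x)
describe(::Iterable, x) = "iterable"
describe(::NotIterable, x) = "scalar"

describe([1, 2, 3])  # → "iterable"
describe(1.5)        # → "scalar"
```

Because the trait function is evaluated on types, the compiler resolves the indirection at compile time; the trait dispatch costs nothing at runtime.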

🚧

Profiling

See Julia debugging, profiling and accelerating.

Installation

See installing Julia.

Workflow

See Julia IDEs and workflow.

API, FFIs

See APIs and IO.

Pretty printing, formatting

Strings, care and feeding

There are many formatting libraries, because everyone seems to dislike the built-in options, which have various limitations.

Many things seem to be based on Formatting.jl, which apes Python string interpolation, and this is IMO a baseline good idea for formatting numbers. There is a friendly-ish fork, Format.jl, which has a slightly different philosophy:

This package offers Python-style general formatting and c-style numerical formatting (for speed).

Rich display

Julia has an elaborate display system for types, as illustrated by Simon Danisch and Henry Shurkus.

tl;dr To display an object, invoke

display(instance)

Say you defined MyType. To ensure MyType displays sanely, define

Base.show(io::IO, instance::MyType)

e.g.

function Base.show(io::IO, inst::MyType)
    print(io, "MyType(length=$(inst.length))")  # assuming MyType has a length field
end
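Putting that together in a self-contained sketch (assuming, hypothetically, that MyType carries a length field):

```julia
struct MyType
    length::Int
end

# Extend Base.show so display, print and repr all render MyType sanely
function Base.show(io::IO, inst::MyType)
    print(io, "MyType(length=$(inst.length))")
end

repr(MyType(5))  # → "MyType(length=5)"
```

repr, print and the REPL display all route through this two-argument show method, so one definition covers the common cases.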

Matti Pastell’s useful reproducible document rendering system Weave.jl supports magical table displaying for Latex/HTML. TexTables.jl is a specialised table renderer for scientific table output. More specialised yet, RegressionTables does statistical tables with baked-in analysis and a side-order of formatting. (Bonus: its launch announcement is a tour of the statistical ecosystem.)

Mustache.jl also has some traction, being a templating syntax used by some other things.

Latexify marks up certain Julia objects nicely for mathematical TeX display.

latexify("x/y") |> print
$\frac{x}{y}$

Debugging and coding

See Julia debugging, profiling and accelerating.

Iteration/Indexing

Julia arrays are 1-indexed, right? No. They have arbitrary indices. Some care is needed to support the (recently introduced) arbitrary indexing. So best practice is not to count from 1:length(A).

If you just need to visit each element, iterate using eachindex.
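For example (a sketch; the function name is hypothetical), eachindex yields valid, efficiently-ordered indices whatever the array’s indexing scheme:

```julia
# Sum every element without assuming indices run 1:length(A)
function sumall(A)
    s = zero(eltype(A))
    for i in eachindex(A)   # layout-aware index iteration
        s += A[i]
    end
    s
end

sumall([10 20; 30 40])  # → 100
```

The loop lives in a function rather than at top level, which is also the idiomatic way to get the compiler to specialise it.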

If you want to iterate by axes… instead of

function mycopy!(dest::AbstractVector, src::AbstractVector)
    length(dest) == length(src) || throw(
        DimensionMismatch("vectors must match"))
    # OK, now we’re safe to use @inbounds, right? (not anymore!)
    for i = 1:length(src)
        @inbounds dest[i] = src[i]
    end
    dest
end

you do

function mycopy!(dest::AbstractVector, src::AbstractVector)
    axes(dest) == axes(src) || throw(
        DimensionMismatch("vectors must match"))
    for i in LinearIndices(src)
        @inbounds dest[i] = src[i]
    end
    dest
end

Generic multidimensional algorithms are not necessarily intuitive but are possible. There are several methods.

Broadcasting is often simplest and fastest but requires some thought.

Dot-fusion/broadcasting

There are a couple of issues here: dot syntax in Julia provides a means both to compactly express vectorised operations and to do smart broadcasting and operation fusing (which compiles chained elementwise functions down to a single vectorised kernel, collapsing intermediate operations). The details are non-obvious, but you can reportedly often ignore them and just use the @. macro to magically invoke it. In most cases it works transparently on your own functions too.
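A small sketch of the two spellings (array contents arbitrary):

```julia
x = [0.0, 0.5, 1.0]

# Explicit dots: every call and operator is elementwise, and chained
# dots fuse into a single loop with no intermediate arrays.
y1 = sin.(2 .* x) .+ 1

# @. adds all the dots for you:
y2 = @. sin(2 * x) + 1

y1 == y2  # → true
```

Both forms fuse; @. just saves you from forgetting a dot, which would silently allocate an intermediate or throw a method error.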

Note the manual says:

In Julia, vectorized functions are not required for performance, and indeed it is often beneficial to write your own loops (see Performance Tips), but they can still be convenient.

This is directly contradicted shortly after by the same manual, and also by the ephemeral Julia discussion on the net. As some recent packages note, you probably do want to use this syntax for vector operations, because it works on GPUs, whereas basic iteration syntax might fail to parallelise on GPUs.

In practice some thought is required to get this working; one might need reshape to get the dimensions lined up, or more complicated indexing. Pankaj Mishra’s note includes an example of how you would do complicated indexing and mutation in parallel over sparse matrices using getindex and setindex!.

This is not a huge problem; vectorised syntax is not so opaque.

But what if I want to do custom broadcasting because, e.g. I can do some operation over a grid faster using an FFT, so the FFT must be taken when I broadcast? How do I define custom broadcasting for this kind of case for some type in julia?

I’ll try to work this out from the moving parts in the ApproxFun system. 🚧

Indexing systems

For some reason there seems to be shame attached to using macros for this, but there is a fairly simple and transparent macro system called Base.Cartesian; see Mike Innes’ explanation.

If you want to avoid macros but you can’t work out how to get your broadcasts to line up, there is the CartesianIndices system, which is compact, general but not especially clear. Here is the “surprisingly simple” (sic) one-pole filter implementation

function expfiltdim(x, dim::Integer, α)
    s = similar(x)
    Rpre = CartesianIndices(size(x)[1:dim-1])
    Rpost = CartesianIndices(size(x)[dim+1:end])
    _expfilt!(s, x, α, Rpre, size(x, dim), Rpost)
end

function _expfilt!(s, x, α, Rpre, n, Rpost)
    for Ipost in Rpost
        # Initialize the first value along the filtered dimension
        for Ipre in Rpre
            s[Ipre, 1, Ipost] = x[Ipre, 1, Ipost]
        end
        # Handle all other entries
        for i = 2:n
            for Ipre in Rpre
                s[Ipre, i, Ipost] = α*x[Ipre, i, Ipost] +
                    (1-α)*s[Ipre, i-1, Ipost]
            end
        end
    end
    s
end

🚧

You can avoid any of these for a couple of common cases. EllipsisNotation provides compact intuitive syntactic sugar if you wish to get at the first or last axis.

A[..,1] = [2 1 4 5
           2 2 3 6]

A[..,2] = [3 2 6 5
           3 2 6 6]

A[:,:,1] == [2 1 4 5
             2 2 3 6] #true

For iterating over one index of an arbitrary array, mapslices is convenient, or for extracting a slice along an arbitrary index, selectdim.

There is also AxisAlgorithms, which does batch matrix multiplications and inversions for a couple of handy cases.

Arrays are column-major

The first index should change fastest. That is, walk over the last index in your outer loop, and apply operations to slices comprising the other indices.

Multidimensional arrays in Julia are stored in column-major order: arrays are stacked one column at a time, which can be verified using the vec function or the syntax A[:]. In the manual’s benchmark, copy_col_row is much faster than copy_row_col because it follows the rule of thumb that the first element to appear in a slice expression should be coupled with the inner-most loop; likewise copy_cols respects the column-based memory layout of the Matrix by filling it one column at a time.
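A compact sketch of both facts (the function name is hypothetical):

```julia
A = reshape(1:6, 2, 3)   # 2×3: the columns are [1,2], [3,4], [5,6]

vec(A) == [1, 2, 3, 4, 5, 6]   # → true: memory order stacks the columns

# So make the *first* index the inner loop:
function colmajor_sum(A)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)   # i varies fastest, matching memory
        s += A[i, j]
    end
    s
end

colmajor_sum(A)  # → 21
```

Swapping the loop order still gives the right answer, but strides through memory and loses cache locality on large arrays.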

Einstein summation

Einstein summation is a handy way of thinking about lots of tensor operations which is not as esoteric as it sounds, including, e.g., batch matrix multiplication. TensorOperations.jl does some optimised Einstein summations. Einsum.jl reputedly does more general Einstein summations, but with possibly less optimisation.

Structured matrices

Chris says

One of the best scientific computing packages is BandedMatrices.jl. It lets you directly define banded matrices and then overloads things like * and \ to use the right parts of BLAS. Compared to just using a sparse matrix (the standard MATLAB/Python way), this is SCREAMING FAST (the QR factorization difference is pretty big). Banded matrices show up everywhere in scientific computing (pretty much every PDE has a banded matrix operator). If it’s not a banded matrix, then you probably have a block-banded matrix, and thank god that (???) also wrote BlockBandedMatrices.jl which will utilize the same speed tricks but expanded to block banded matrices (so, discretizations of systems of PDEs).

Data loading/saving/exchange

See Julia debugging, profiling and accelerating.

Signal processing

See Julia debugging, profiling and accelerating.

Plotting

Due to the explosion of options, segmented off into a separate Julia plotting page.

Gradient wrangling

See julia_ml.

Approximating and interpolating

ApproxFun.jl does Chebyshev and Fourier approximations of given functions. This is not, at least primarily, a tool for data analysis, but for solving eigenfunction problems and the like using computational Hilbert-space methods, for functions which are otherwise difficult. In particular, we must be able to evaluate the target function at arbitrary points to construct the interpolant, rather than at, say, provided sample points. The useful companion package FastTransforms converts between different basis-function representations. Interpolations.jl does arbitrary-order spline interpolations of mathematical functions, but also of data. This enables some very clever tricks, e.g. approximate random sampling of tricky distributions.

Caching

See Julia debugging, profiling and accelerating.

Units

People sometimes mention that Julia can handle physical units in a syntactically pleasant fashion, but do not actually explain it. It’s not clear how, since all the documentation assumes I already know. From what I can tell, I first need to actually install Unitful, which includes some useful units, and in particular Quantity types.

Then I can use units in the following fashion:

julia> using Unitful
julia> const s = u"s"
julia> const minute = u"minute"
julia> 5.3s + 2.0minute
125.3 s

To know how this works, and also how I can invent my own units, I read Erik Engheim or Lyndon White, who explain it in depth.

See SampledSignals for a concrete example of how to do things with units, such as the following method definitions to handle time-to-index conversion.

toindex(buf::SampleBuf, t::Number) = t
toindex(buf::SampleBuf, t::Unitful.Time) = inframes(Int, t, samplerate(buf)) + 1
toindex(buf::SampleBuf, t::Quantity) = throw(Unitful.DimensionError(t, s))

Macros and reflection

Destructuring assignment

Not to be confused with splatting, which is the f(args...) calling syntax. Basic tuple destructuring ((a, b) = (1, 2)) is built in, but very fancy versions are provided by the macro package Destructure.jl.

What is this module

Took me a while to work out that the current module is called @__MODULE__.

So if I want to look up a function by symbol, for example,

function makewindow(name::Symbol, args...; kwargs...)
    winfunc = getfield(@__MODULE__, name)
    makewindow(winfunc, args...; kwargs...)
end
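A toy check of that pattern (with a hypothetical window function, and a different name to avoid clobbering the definition above):

```julia
# A hypothetical window function living in the current module:
boxcar(n) = ones(n)

function getwindow(name::Symbol, args...; kwargs...)
    winfunc = getfield(@__MODULE__, name)  # resolve the symbol to a function
    winfunc(args...; kwargs...)
end

getwindow(:boxcar, 3)  # → [1.0, 1.0, 1.0]
```

Note that @__MODULE__ is expanded where the code is defined, so this resolves names in the module containing getwindow, not in the caller’s module.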

UIs and servers

See Julia debugging, profiling and accelerating.