Julia

The hippest way to get your IEEE754 on

Julia: A JIT-compiled language that aims for high-performance scientific computation.

It makes ambitious claims about being the fastest and best thing ever. The community process is problematic (see also), however, and I prefer the proven method of using Python and optimizing the performance-sensitive code with one of the many tools for doing that.

That said, the idea of a science-users-first JIT language is timely, and Julia is that. Python has clunky legacy issues in the numeric code and a patchy API. Matlab is expensive and nasty for non-numerics. Lua has some good science libraries and could likely have filled this niche but, AFAICT for sociological reasons, has not acquired the hipness or critical mass of Julia.

And there are some things specific to Julia which are serious selling points, aside from the language-feature one-upmanship.

The language

Intros

In order of increasing depth

  1. Julia by example is all you need to get going, if you have other programming language experience.

  2. Bogumił Kamiński, The Julia Express

  3. Introducing Julia has the unfortunate leaden prose style of most wikibooks, but for sure will get you educated.

  4. Chris Rackauckas' notes for the UCI Data Science Initiative. Top quote:

    A Mental Model for Julia: Talking to a Scientist

    • When you’re talking, everything looks general. However, you really mean very specific details determined by context.
    • You can quickly dig deep into a subject, assuming many rules, theories, and terminology.
    • Nothing is hidden: if you ever want to hear about every little detail, you can ask.
    • They will get mad (and throw errors at you) if you begin to be loose with the specific details.

    See also his 7 Julia gotchas

  5. Official documentation is totally fine but arse-backwards, as official docs tend to be.

APIs

See the API list

Calling C from Julia

Sort of easy, but there is a tedious need to define the call signature at call time.
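To make the "signature at call time" complaint concrete, here is a minimal sketch that calls strlen from the C standard library. I am assuming a Linux or macOS system where strlen is visible in the running process; c_strlen is just a hypothetical wrapper name. Note how the return type and the tuple of argument types have to be spelled out right in the call.

```julia
# ccall(function symbol, return type, tuple of argument types, arguments...)
# The C signature `size_t strlen(const char *)` is restated at the call site.
c_strlen(s::AbstractString) = ccall(:strlen, Csize_t, (Cstring,), s)

n = c_strlen("hello")
println(n)
```

Wrapping the ccall in a small Julia function, as here, at least confines the signature to one place.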

R

XRJulia:

This package provides an interface from R to Julia, based on the XR structure, as implemented in the XR package, in this repository.

Rjulia:

rjulia provides an interface between R and Julia. It allows a user to run a script in Julia from R, and maps objects between the two languages.

Python

PyCall.jl invokes Python.

Toolkits

Data loading/saving/exchange

The names are nearly all self-explanatory.

Debugging and coding

Debugger, Gallium.jl:

This is the julia debugger. Please note that the 0.6 version of this package currently does not support breakpointing, C/C++ debugging or native code inspection. These features are being rebuilt, but were never particularly reliable in prior versions of this package and a cause of instability for the more mature features. In exchange, this package features a significantly more robust pure julia debug prompt, provided by ASTInterpreter2. Please file interpreter issues against that package.

Linter, Lint.jl (also has an Atom linter plugin).

Statistics

DataFrames are provided by DataFrames.jl, and also DataTables.jl. The two are subtly incompatible in completely boring ways which you can hopefully ignore soon. For now, use IterableTables.jl to translate where needed. Blech. One can access dataframes (and DataTables, SQL databases and streaming data sources) using Query.jl. You can load a lot of the standard R datasets using [RDatasets](https://github.com/johnmyleswhite/RDatasets.jl):

using RDatasets
iris = dataset("datasets", "iris")
neuro = dataset("boot", "neuro")

Another notable product of JuliaStats organisation is Distributions.jl, a probability distribution toolkit providing densities, sampling etc.

Turing.jl does posterior inference.

Turing.jl is a Julia library for (universal) probabilistic programming. Current features include:

  • Universal probabilistic programming with an intuitive modelling interface
  • Hamiltonian Monte Carlo (HMC) sampling for differentiable posterior distributions
  • Particle MCMC sampling for complex posterior distributions involving discrete variables and stochastic control flows
  • Gibbs sampling that combines particle MCMC and HMC

Possibly it is a competitor of Klara.jl, the JuliaStats MCMC package.

The aspirational ggplot clone is Gadfly.

Differentiating, optimisation

Laplacians.jl by Dan Spielman et al is an advanced matrix factorisation toolkit.

The deep learning toolkits are, for the moment, lacking features. Perhaps they’ll get there?

TensorFlow.jl. (Surely one misses the benefit of Julia by using tensorflow, since there are two different array-processing infrastructures to pass between?)

Flux.jl sounds like a reimplementation of tensorflow-style differentiable programming inside Julia, which seems to me to be the way you’d actually do this right, given the end-to-end-optimised design philosophy of julia.

Flux is a library for machine learning. It comes “batteries-included” with many useful tools built in, but also lets you use the full power of the Julia language where you need it. The whole stack is implemented in clean Julia code (right down to the GPU kernels) and any part can be tweaked to your liking.

However, it’s missing many features of tensorflow.

The JuliaDiff project produces ForwardDiff.jl and ReverseDiff.jl, which do what you would expect, namely forward-mode and reverse-mode autodiff respectively.
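For the curious, forward-mode autodiff is easy to sketch in plain Julia with dual numbers. To be clear, this is a toy illustration of the idea and not how ForwardDiff.jl is actually implemented; the Dual type and the derivative helper are hypothetical names of mine, and only +, * and sin are supported.

```julia
# A dual number carries a value and the derivative of that value.
struct Dual{T<:Real}
    val::T
    der::T
end

# Sum rule and product rule.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
# Multiplying by a plain constant scales both parts.
Base.:*(a::Real, b::Dual) = Dual(a * b.val, a * b.der)
# Chain rule for sin.
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

# Seed the input with derivative 1 and read off the derivative of the output.
derivative(f, x::Real) = f(Dual(x, one(x))).der

f(x) = 3 * x * x + sin(x)
println(derivative(f, 2.0))   # analytically this is 6x + cos(x) at x = 2
```

The point is that the unmodified f runs on the new number type thanks to multiple dispatch, which is the trick the real packages build on.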

Approximating

ApproxFun.jl does Chebyshev and Fourier interpolations.

IDEs/workbooks

There is a reasonable IDE called Juno, built on Atom. There is Jupyter integration through IJulia.

Both of these have their own annoyances. For example, Juno is single-window only, so you can’t use multiple monitors, and thus you end up squinting at tiny windows of code hidden between all the outputs. Also, Atom’s panes just aren’t well-designed for this use-case, so it’s full of a million tiny frictions.

If you install Juno as an app, but you also already use Jupyter, there is an additional annoyance because it hijacks your atom install in a confusing way and mangles your various package preferences. I recommend installing it from within atom via the uber-juno package.

Possibly you can bypass this using homebrew? I didn’t try. But maybe give this a burl:

brew cask install juno

IJulia also does its own overzealous installs per default, profligately installing another copy of jupyter, which you then have to update etc separately. Boring. You can bypass this by commanding it to use the perfectly good jupyter you already have installed:

using Pkg  # needed on Julia 0.7 and later, where Pkg is no longer loaded by default
ENV["JUPYTER"] = "/usr/local/bin/jupyter"
Pkg.add("IJulia")

Now IJulia appears as a normal kernel in your normal jupyter setup.

There is a package called Weave.jl which is inspired by R’s knitr but compatible with Jupyter, which could probably be used to fashion a working academic paper out of this.

Pkg.add("Weave")

UIs and servers

HttpServer does basic http protocol serving; this is made modular and composable by Mux.jl. Fancy caching and templating etc come from Genie.jl.

Escher.jl goes further, rendering HTML UI widgets etc.

Various other options are listed in aviks’ stackoverflow answer.

Gotchas

Implementing methods on custom types requires apparent injection into other namespaces

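So you want to implement a standard interface on your type so you can, e.g., index into it or iterate over it. You do this by adding methods to functions that Base owns, which is why the function names below are qualified with Base.:

julia> struct Squares
           count::Int
       end

julia> function Base.getindex(S::Squares, i::Int)
           1 <= i <= S.count || throw(BoundsError(S, i))
           return i*i
       end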

julia> Squares(100)[23]
529
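Iteration works the same way. Here is a minimal sketch, assuming Julia 1.0 or later, where the iteration interface is a single Base.iterate function (on 0.6 it was the start/next/done trio instead). SquareSeq is a hypothetical type name chosen so that it does not clash with the example above.

```julia
struct SquareSeq
    count::Int
end

# `state` is the next index to visit; returning `nothing` ends the iteration.
function Base.iterate(S::SquareSeq, state::Int=1)
    state > S.count && return nothing
    return (state * state, state + 1)
end

# Optional, but lets collect and comprehensions preallocate a typed result.
Base.length(S::SquareSeq) = S.count
Base.eltype(::Type{SquareSeq}) = Int

squares = [x for x in SquareSeq(4)]
total = sum(SquareSeq(3))
println(squares, " ", total)
```

Again everything is a method added to a function living in Base, not a method defined "on" the type.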

If you are lucky you might be able to inherit from AbstractArray:

julia> struct SquaresVector <: AbstractArray{Int, 1}
           count::Int
       end

julia> Base.size(S::SquaresVector) = (S.count,)

julia> Base.IndexStyle(::Type{<:SquaresVector}) = IndexLinear()

julia> Base.getindex(S::SquaresVector, i::Int) = i*i
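The payoff of subtyping AbstractArray is that generic array code starts working on your type without further effort. A small sketch of that, assuming Julia 1.0 or later, with a hypothetical SquaresVec type mirroring the definitions above:

```julia
struct SquaresVec <: AbstractVector{Int}
    count::Int
end

Base.size(S::SquaresVec) = (S.count,)
Base.IndexStyle(::Type{<:SquaresVec}) = IndexLinear()
Base.getindex(S::SquaresVec, i::Int) = i * i

s = SquaresVec(4)

# None of these were defined by hand; they fall out of the AbstractArray fallbacks.
last_elem = s[end]
total = sum(s)
shifted = s .+ 1
middle = s[2:3]
println(last_elem, " ", total, " ", shifted, " ", middle)
```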

The type system is reasonably logical; it’s just not obvious if you are used to classical OOP.

It’s unstable and hangs all the time

Yep.

Especially when it’s doing things that are supposed to be julia specialties, such as JIT-compiling dynamic inner functions. Use the recommended julia command:

killall -9 julia

FWIW this problem has mostly occurred for me using the JunoPro Intel MKL builds. Vanilla Juno is fine.

Workflow sucks

Yes.

You are using Julia because it is dynamic and because it is fast, but if you try to use code in a dynamic fashion, in the global namespace, it is no longer fast. Many complicated interactions ensue with the module system, and the recommended workarounds keep changing.
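A minimal sketch of the global-namespace problem, using only Base. The function names are hypothetical. Both functions compute the same sum; the first reads a non-constant global, whose type the compiler cannot assume, and the second takes the array as an argument, so it gets compiled for the concrete type. Time them yourself with @time (after a warm-up call) to see the gap on your machine; I make no promises about the exact numbers.

```julia
data = rand(1000)   # a non-constant global: its type could change at any time

# Reads the global directly, so every access has to be dynamically typed.
function sum_global()
    total = 0.0
    for x in data
        total += x
    end
    return total
end

# Same loop, but the array arrives as an argument with a known concrete type.
function sum_arg(xs)
    total = 0.0
    for x in xs
        total += x
    end
    return total
end

println(sum_global() == sum_arg(data))
```

Declaring the global as const, or wrapping everything in functions or a module, is the usual way out.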

Julia>=0.6 workflow tips:

Put code under development in a temporary module. Create a file, say Tmp.jl, and include within it

module Tmp

<your definitions here>

end

Put your test code in another file. Create another file, say tst.jl, which begins with import Tmp and includes tests for the contents of Tmp. The value of using import versus using is that you can call reload("Tmp") instead of having to restart the REPL when your definitions change.

[…]Explore ideas at the julia command prompt.[…] Occasionally restart the REPL, issuing

reload("Tmp")
include("tst.jl")

That’s fine. What works more easily for me is running, in the Juno console,

@include("tmp.jl")

or in jupyter,

include("tmp.jl")

It allocates and copies arrays per default

Consider using views for slices, they say, which means not using slice notation but rather the view function, or this handy macro.

@views
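A small sketch of the difference, using only Base (view and @views have been available since Julia 0.6): slice notation copies, while view and @views give you a window onto the original array.

```julia
A = collect(1:10)

c = A[2:4]          # slice notation allocates a fresh array (a copy)
v = view(A, 2:4)    # a view shares memory with A

A[2] = 100          # visible through the view, not through the copy

@views w = A[2:4]   # @views turns the slices in this expression into views
w[2] = -3           # writes through to A[3]

println(c[1], " ", v[1], " ", A[3])
```

The flip side is that writing into a view mutates the parent array, which may or may not be what you want.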

Argument syntax is only OK

Keyword arguments exist but do not participate in method dispatch. Basically, keyword arguments are second-class citizens and might make things slow or stupid if you need to specialise your code.
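One common workaround, sketched here with hypothetical function names, is to have the keyword-accepting function immediately forward to an inner function that takes the same values positionally, so that dispatch can specialise on them.

```julia
# Dispatch only looks at positional arguments, so the real work lives here.
_scale(x, factor::Integer) = "integer path"
_scale(x, factor::AbstractFloat) = "float path"

# The public entry point accepts a keyword and forwards it positionally.
scale(x; factor=2) = _scale(x, factor)

println(scale(1.0))
println(scale(1.0; factor=2.0))
```

This is boilerplate, but it keeps the keyword interface for callers while still letting the type of factor select a method.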

AbstractFloat types poison the efficiency of Arrays (or other composite types)

AFAICT if you want fast numerics but you are not sure of your float precision, you should not use AbstractFloat for your argument type definitions (although it works fine for simple types). This is unfortunate, since there are good reasons to mix 32-bit and 64-bit floats without casting (e.g. GPUs), and writing the same identical code twice seems awkward. Presumably one can work around this using parametric types and parametric methods and so-called orthogonal design, or typeof, without a combinatorial explosion of nearly identical functions, but I haven’t worked it out yet. Regardless, an unnecessary difficulty in a common use-case.
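For what it is worth, here is my best guess at what the parametric-method workaround looks like, as a sketch assuming Julia 1.0 or later (isconcretetype was isleaftype on 0.6). The function name axpy is hypothetical. The trouble is an array whose element type is the abstract AbstractFloat; a method parameterised on T<:AbstractFloat is written once and gets compiled separately for each concrete float type.

```julia
# An array with an abstract element type: elements are stored boxed.
loose = AbstractFloat[1.0, 2.0f0]
println(isconcretetype(eltype(loose)))

# Type parameters are invariant, so this annotation would not even match
# an ordinary Float32 array.
println(Vector{Float32} <: Vector{AbstractFloat})
println(Vector{Float32} <: Vector{<:AbstractFloat})

# One definition, specialised per concrete T at compile time.
function axpy(a::T, x::Vector{T}, y::Vector{T}) where {T<:AbstractFloat}
    return a .* x .+ y
end

r32 = axpy(2f0, Float32[1, 2], Float32[3, 4])
r64 = axpy(2.0, [1.0, 2.0], [3.0, 4.0])
println(eltype(r32), " ", eltype(r64))
```

Requiring the same T everywhere rules out accidental mixing of precisions; loosening that is possible with several type parameters, at the cost of more signatures to think about.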