The Living Thing / Notebooks :

Tensorflow

the framework to use for deep learning if you groupthink like Google

A C++/Python neural network toolkit by Google. I am using it for solving general machine-learning problems, and frequently enough that I need notes.

Graph construction is more explicit than in Theano, which I find easier to understand, although it means you lose Theano's near-Python syntax.
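
For instance, a minimal sketch of what I mean by explicitness, using nothing beyond the core TF 1.x API (the tensor names are made up): building the graph and executing it are separate steps.

import tensorflow as tf

# These lines only build symbolic graph nodes; nothing is computed yet.
a = tf.placeholder(tf.float32, name='a')
b = tf.placeholder(tf.float32, name='b')
c = a * b + 1.0

# Computation only happens when a Session runs the graph.
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))  # 7.0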

Tensorflow also claims to compile to smartphones etc, although that looks buggy ATM.

Tutorials

See also the Keras tutorials below.

Debugging

Joonwook Choi recommends:

Basic ways:

  • Not in mid-training? Explicitly fetch the tensors you care about, and print them (or do whatever you want) using Session.run()
  • Tensorboard: histogram and image summaries
  • tf.Print(input, data, message=None, first_n=None, summarize=None, name=None)
  • tf.Assert(condition, data, summarize=None, name=None) (see the sketch after these lists)

Advanced ways:

  • Interpose any Python codelet in the computation graph
  • A step-by-step debugger
  • tfdbg: the TensorFlow debugger
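
A minimal sketch of the tf.Print and tf.Assert items above, assuming the TF 1.x API (the tensors here are invented for illustration):

import tensorflow as tf

x = tf.random_normal([3, 3])

# tf.Print passes x through unchanged, but logs the listed tensors
# (to stderr) every time this node is evaluated.
x = tf.Print(x, [tf.reduce_mean(x), tf.reduce_max(x)],
             message='x stats: ', summarize=10)

# tf.Assert yields an op that fails loudly if the condition is false;
# attach it as a control dependency so it actually gets run.
check = tf.Assert(tf.reduce_all(tf.is_finite(x)), [x], summarize=9)
with tf.control_dependencies([check]):
    y = tf.reduce_sum(x)

with tf.Session() as sess:
    print(sess.run(y))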

Tensorboard

Tensorboard is the de facto standard debugging tool. It’s not immediately intuitive; I recommend reading Li Yin’s explanation.

Minimally,

tensorboard --logdir=path/to/log-directory

or, more usually,

tensorboard --logdir=name1:/path/to/logs/1,name2:/path/to/logs/2 --host=localhost

or, lazily,

tensorboard --logdir=$(ls -dm *.logs |tr -d ' \n\r') --host=localhost

In practice, that last one does not always work well for me; Tensorboard really wants you to specify your folder names explicitly, so I script it:

#!/usr/bin/env python3
"""Launch tensorboard with one named run per *.logs directory here."""
from pathlib import Path
from subprocess import run

p = Path('./')

# Build "name1:path1,name2:path2", stripping the ".logs" suffix
# from each directory name to get the run name.
logdirstring = '--logdir=' + ','.join([
    str(d)[:-5] + ":" + str(d)
    for d in p.glob('*.logs')
])

proc = run(
    [
        'tensorboard',
        logdirstring,
        '--host=localhost'
    ]
)
  • Projector visualises embeddings:

    TensorBoard has a built-in visualizer, called the Embedding Projector, for interactive visualization and analysis of high-dimensional data like embeddings. It is meant to be useful for developers and researchers alike. It reads from the checkpoint files where you save your tensorflow variables. Although it’s most useful for embeddings, it will load any 2D tensor, potentially including your training weights.
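
As a reminder of where those logs come from in the first place, here is a minimal sketch using the TF 1.x tf.summary API; the toy model, tensor names, and the example.logs directory are all invented, with the directory name following the *.logs convention used above.

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10], name='x')
w = tf.get_variable('w', shape=[10, 1])
loss = tf.reduce_mean(tf.square(tf.matmul(x, w)))

# These show up in TensorBoard's Scalars and Histograms tabs.
tf.summary.scalar('loss', loss)
tf.summary.histogram('w', w)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('example.logs', sess.graph)
    for step in range(100):
        summary, _ = sess.run(
            [merged, loss],
            feed_dict={x: np.random.randn(32, 10).astype(np.float32)})
        writer.add_summary(summary, step)
    writer.close()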

Getting data in

This is a depressingly complex topic; it will likely take more lines of code than your actual learning algorithm.

For example, things break differently depending on which of the various input mechanisms you use. These interact in ways that seem irritating, but are probably to do with enabling very large-scale data-reading workflows, so that you might accidentally solve a problem for Google and they can get your solution for cheap.

Here’s a walk-through of some of the details, and here are the manual pages for feeding and queueing.
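
For reference, the simplest of those options, plain placeholder feeding, looks something like this (a toy sketch; the tensors are invented):

import numpy as np
import tensorflow as tf

# A placeholder is a graph-side promise that data will be fed at run time.
x = tf.placeholder(tf.float32, shape=[None, 4], name='x')
y = tf.reduce_mean(x)

with tf.Session() as sess:
    batch = np.random.randn(8, 4).astype(np.float32)
    # feed_dict supplies one batch per call; queues/readers are the
    # heavier-weight alternative for large-scale pipelines.
    print(sess.run(y, feed_dict={x: batch}))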

My experience is that this stuff is so horribly messy that you should just build different graphs for the estimation and deployment phases of your model, and implement each according to convenience.

I’m not yet sure how to easily transmit the estimated parameters between graphs in these two separate phases… I’ll make notes about that when I come to it.
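
For what it’s worth, the obvious candidate is tf.train.Saver, which restores variables by name and so should work across two separately constructed graphs as long as the variable names match. A tentative sketch (the variable names and checkpoint path are made up):

import tensorflow as tf

# Estimation graph: fit (here, just assign) parameters, save them by name.
train_graph = tf.Graph()
with train_graph.as_default():
    w = tf.get_variable('w', initializer=tf.zeros([3]))
    fit = tf.assign(w, [1.0, 2.0, 3.0])
    saver = tf.train.Saver()
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(fit)
        saver.save(sess, '/tmp/model.ckpt')

# Deployment graph: build a different graph, restore the same names.
deploy_graph = tf.Graph()
with deploy_graph.as_default():
    w = tf.get_variable('w', shape=[3])
    predict = 2.0 * w
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, '/tmp/model.ckpt')
        print(sess.run(predict))  # [2. 4. 6.]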

(Non-recurrent) convolutions

For the SAME padding, the output height and width are computed as:

out_height = ceil(float(in_height) / float(strides[1]))
out_width  = ceil(float(in_width) / float(strides[2]))

For the VALID padding, the output height and width are computed as:

out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width  = ceil(float(in_width - filter_width + 1) / float(strides[2]))
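
A worked numeric sanity check of those formulas, in plain Python (the helper function name is mine):

import math

def conv_output_size(in_size, filter_size, stride, padding):
    # Direct transcription of the SAME / VALID formulas above.
    if padding == 'SAME':
        return math.ceil(in_size / stride)
    if padding == 'VALID':
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError(padding)

# A 28x28 input with a 5x5 filter and stride 2:
print(conv_output_size(28, 5, 2, 'SAME'))   # 14
print(conv_output_size(28, 5, 2, 'VALID'))  # 12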

Tensorflow’s 4d tensor packing for images?

TensorFlow supports NHWC (default) and NCHW (cuDNN default). The best practice is to build models that work with both NCHW and NHWC as it is common to train using NCHW on GPU, and then do inference with NHWC on CPU.

NCHW is, to be clear, (batch, channels, height, width).

Theano, by contrast, is AFAICT always NCHW.
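
A sketch of shuffling between the two layouts with tf.transpose (many ops also take a data_format argument instead; the tensors here are invented):

import tensorflow as tf

# A batch of RGB images in NHWC: (batch, height, width, channels).
images_nhwc = tf.placeholder(tf.float32, [None, 32, 32, 3])

# Permute to NCHW: (batch, channels, height, width), and back again.
images_nchw = tf.transpose(images_nhwc, [0, 3, 1, 2])
back_to_nhwc = tf.transpose(images_nchw, [0, 2, 3, 1])

print(images_nchw.get_shape())   # (?, 3, 32, 32)
print(back_to_nhwc.get_shape())  # (?, 32, 32, 3)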

Recurrent networks

The documentation for these is abysmal.

To write: How to create standard linear filters in Tensorflow.

Official documentation

The Tensorflow RNN documentation, as bad as it is, is not even easy to find, being scattered across several non-obvious locations without consistent crosslinks.

To make it actually make sense without unwarranted time-wasting and guessing, you will need to read other material as well.

Keras

You probably want to use a higher-level interface unless your needs are extraordinarily esoteric or you like reinventing wheels. Keras is a good choice, since it removes a lot of boilerplate and makes even writing new boilerplate easier.
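
As a sketch of the sort of boilerplate reduction I mean, here is a complete little model in standalone Keras (the layer sizes and the X_train/y_train names are invented):

from keras.models import Sequential
from keras.layers import Dense

# Model definition, loss, and optimiser in a handful of lines; the
# equivalent raw-TensorFlow graph plus training loop is far longer.
model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# model.fit(X_train, y_train, epochs=5, batch_size=32)  # given some data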

Useful libraries

recurrentshop makes it easier to manage recurrent topologies using keras.

Go faster for free

Stack Overflow recommends a tweak on the classic source install:

./configure
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-1.0.0-py2-none-any.whl

or, with CUDA and CPU vector instructions enabled:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip3 install /tmp/tensorflow_pkg/tensorflow-1.0.0-cp35-cp35m-macosx_10_6_intel.whl

Getting models out

Doing it in the cloud because you don’t have NVIDIA sponsorship

See practical cloud computing, which has a couple of sections on that.

Misc HOWTOs

Gradient tricks

See the notes on computing a Hessian matrix, and the discussion of Jacobians in TF, including some fancy examples by jjough:

here’s mine — works for high-dimensional Jacobians (numerator and denominator have >1 dimension), undefined batch sizes, and tensors that are not statically known.

Remember to use an interactive session (or pass a session explicitly), otherwise tf.get_default_session() will not be able to find the session.

import tensorflow as tf

def tf_jacobian(tensor2, tensor1, feed_dict, sess=None):
    """
    Computes the tensor d(tensor2)/d(tensor1) recursively.
    :param tensor2: numerator of Jacobian
    :param tensor1: denominator of Jacobian
    :param feed_dict: input data (need this if tensors are not statically known)
    :param sess: session used to evaluate shapes; falls back to the default
        session (e.g. a tf.InteractiveSession) if not given
    :return: a tensor of dimension (dim_tensor2 x dim_tensor1)
    """
    sess = sess or tf.get_default_session()
    # can't do tensor.get_shape() because it doesn't work for undefined batch size
    shape = list(sess.run(tf.shape(tensor2), feed_dict))
    if shape:
        # split tensor2 along its first dimension and recurse on each slice
        # int trick from https://github.com/tensorflow/tensorflow/issues/7754
        tensor2_split = tf.split(axis=0, num_or_size_splits=int(shape[0]), value=tensor2)
        grad_split = [
            tf_jacobian(tf.squeeze(M, axis=[0]), tensor1, feed_dict, sess)
            for M in tensor2_split
        ]
        return tf.stack(grad_split)
    else:
        # tensor2 is a scalar: take its gradient directly
        grad = tf.gradients(tensor2, tensor1)
        if grad[0] is not None:
            return tf.squeeze(grad, axis=[0])
        else:
            # replace any undefined gradients with zeros
            return tf.zeros_like(tensor1)

And here’s one for batched tensors:

def batch_tf_jacobian(tensor2, tensor1, feed_dict, sess=None):
    """
    Computes the matrix d(tensor2)/d(tensor1) recursively.
    Tensorflow doesn't really have its own Jacobian operator
    (tf.gradients sums over all dims of tensor2).

    :param tensor2: numerator of Jacobian, first dimension is batch
    :param tensor1: denominator of Jacobian, first dimension is batch
    :param feed_dict: input data (need this if tensors are not statically known)
    :param sess: session used to evaluate shapes; falls back to the default session
    :return: batch Jacobian tensor
    """
    sess = sess or tf.get_default_session()
    shape2 = list(sess.run(tf.shape(tensor2), feed_dict))
    shape1 = list(sess.run(tf.shape(tensor1), feed_dict))

    jacobian = tf_jacobian(tensor2, tensor1, feed_dict, sess)
    batch_size = shape2[0]

    # keep only the matching-batch-index blocks, one per batch element
    batch_jacobian = [
        tf.slice(
            jacobian,
            [i] + [0] * (len(shape2) - 1) + [i] + [0] * (len(shape1) - 1),
            [1] + [-1] * (len(shape2) - 1) + [1] + [-1] * (len(shape1) - 1))
        for i in range(batch_size)
    ]
    batch_jacobian = [
        tf.squeeze(tensor, axis=[0, len(shape2)])
        for tensor in batch_jacobian
    ]
    return tf.stack(batch_jacobian)
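
A hypothetical usage sketch (the placeholder and function are invented; note the interactive session, per the warning above):

import numpy as np
import tensorflow as tf

sess = tf.InteractiveSession()  # installs itself as the default session

x = tf.placeholder(tf.float32, [None, 3])
y = tf.reduce_sum(x ** 2, axis=1)     # shape (batch,)

feed = {x: np.random.randn(4, 3).astype(np.float32)}

full = tf_jacobian(y, x, feed, sess)          # every d y_i / d x_j block
batch = batch_tf_jacobian(y, x, feed, sess)   # only the matching i == j blocks

print(sess.run(full, feed).shape)   # (4, 4, 3)
print(sess.run(batch, feed).shape)  # (4, 3)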