A C++/Python neural network toolkit by Google. I use it for general machine-learning problems, frequently enough that I have notes.
Abstractions

Keras supports tensorflow and Theano as backends, for comfort and convenience. See below for some notes.

tensorflow-slim eases some boring bits.

sonnet is DeepMind’s tensorflow library; it shares with keras layer-like abstractions and some helpers to make recurrent neural nets bearable.
There are some other frontends, which seem a bit less useful to my mind:

tflearn wraps the tensorflow machine in a scikit-learn-like API (although the implementation is not very enlightening, nor the syntax especially clear).

estimator is a tensorflow generic estimator class. Relationship to other wrappers is not clear to me, but finding out would be tedious, so I will never know.
I’m not convinced these latter options actually solve any problems. They seem to make the easy bits not easier but different, and the hard bits no easier.
Tutorials
See also the keras tutorials below.
Debugging
Google’s own Tensorflow without a phd.
Basic ways:
Not mid-training? Explicitly fetch, and print (or do whatever you want), using Session.run()
Tensorboard Histogram and Image Summary (see next section)
tf.Print(input, data, message=None, first_n=None, summarize=None, name=None)
tf.Assert(condition, data, summarize=None, name=None)
Advanced ways:
Interpose any python codelet in the computation graph
A step-by-step debugger
tfdbg: The TensorFlow debugger
Tensorboard
Tensorboard is the de facto standard debugging tool. It’s not immediately intuitive; I recommend reading Li Yin’s explanation.
Minimally,
tensorboard --logdir=path/to/logdirectory
or, more usually,
tensorboard --logdir=name1:/path/to/logs/1,name2:/path/to/logs/2 --host=localhost
or, lazily, (bash)
tensorboard --logdir=$(ls -dm *.logs | tr -d ' \n\r') --host=localhost
(fish)
tensorboard --logdir=(string join , (for f in *.logs; echo (basename $f .logs):$f; end)) --host=localhost
In fact, that sometimes does not work so well for me. Tensorboard reeeeally wants you to explicitly specify your folder names.
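#!/usr/bin/env python3
# Launch tensorboard with every *.logs directory here as a named run.
from pathlib import Path
from subprocess import run

p = Path('./')
# Name each run after its directory, minus the 5-character ".logs" suffix.
logdirstring = '--logdir=' + ','.join([
    str(d)[:-5] + ":" + str(d)
    for d in p.glob('*.logs')
])
proc = run(
    [
        'tensorboard',
        logdirstring,
        '--host=localhost'
    ]
)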
Projector visualises embeddings:
TensorBoard has a built-in visualizer, called the Embedding Projector, for interactive visualization and analysis of high-dimensional data like embeddings. It is meant to be useful for developers and researchers alike. It reads from the checkpoint files where you save your tensorflow variables. Although it’s most useful for embeddings, it will load any 2D tensor, potentially including your training weights.
Getting data in
This is a depressingly complex topic; it likely takes more lines of code than building your actual learning algorithm.
For example, things break differently if

you are inputting data of variable dimensions via python, which requires a “feed”, which requires keeping references to a placeholder Op around, and ALWAYS resubmitting the data every time you run an op, even if the data is not required for the current Op, or
you are inputting a Variable (which may also be a feed, just to mess with you, and which claims to also support variable dimensions, although that never works for me) via C++.
These interact in various different ways that seem irritating, but are probably to do with enabling very large scale data reading workflows, so that you might accidentally solve a problem for Google and they can get your solution for cheap.
Here’s a walkthrough of some of the details. And here are the manual pages for feeding and queueing.
My experience is that this stuff is so horribly messy that you should just build different graphs for the estimation and deployment phases of your model and implement each according to convenience. This, of course, is asking for trouble with errors.
I’m not yet sure how to easily transmit the estimated parameters between graphs in these two separate phases… I’ll make notes about THAT when I come to it.
(Non-recurrent) convolutional networks

The Theano guide to convolutions is superior if you want to work out the actual dimensions your tensors should have. It also gives an intelligible account of how you invert convolutions for decoding.

The Tensorflow convolution guide is more lackadaisical, but it does get us there:
For the `SAME` padding, the output height and width are computed as:
```python
out_height = ceil(float(in_height) / float(strides[1]))
out_width = ceil(float(in_width) / float(strides[2]))
```
For the `VALID` padding, the output height and width are computed as:
```python
out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))
```
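To check tensor dimensions before building a graph, the two padding rules can be wrapped in a small pure-Python helper. This is a minimal sketch, not part of the Tensorflow API: the function name `conv_output_size` and its argument names are my own, and it handles one spatial dimension at a time.

```python
import math


def conv_output_size(in_size, filter_size, stride, padding):
    """Output size along one spatial dimension, following the two rules above.

    `conv_output_size` is a hypothetical helper name, not a Tensorflow call.
    """
    if padding == "SAME":
        # SAME pads the input so that only the stride shrinks the output.
        return math.ceil(in_size / stride)
    if padding == "VALID":
        # VALID uses no padding, so the filter must fit inside the input.
        return math.ceil((in_size - filter_size + 1) / stride)
    raise ValueError(f"unknown padding: {padding!r}")


# A 28-pixel-wide input, a 5-pixel-wide filter and a stride of 2.
same_width = conv_output_size(28, 5, 2, "SAME")
valid_width = conv_output_size(28, 5, 2, "VALID")
print(same_width, valid_width)  # 14 12
```

Call it once for height (with `strides[1]`) and once for width (with `strides[2]`).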
[Tensorflow's 4d tensor packing for images](https://www.tensorflow.org/performance/performance_guide#use_nchw_image_data_format)?
> TensorFlow supports `NHWC` (default) and `NCHW` (cuDNN default).
> The best practice is to build models
> that work with both `NCHW` and `NHWC`
> as it is common to train using NCHW on GPU, and then do inference with NHWC on CPU.
`NCHW` is, to be clear, `(batch, channels, height, width)`.
Theano, by contrast, is AFAICT always `NCHW`.
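When moving a model between the two layouts, the bookkeeping is just an axis permutation. Here is a small sketch in plain Python that derives it from the layout strings; `layout_perm` and `convert_shape` are hypothetical helper names of mine, not Tensorflow functions. The resulting permutation is the kind of list you would pass to a transpose op.

```python
def layout_perm(src="NHWC", dst="NCHW"):
    """Axis permutation that reorders a tensor from layout `src` to `dst`."""
    if sorted(src) != sorted(dst):
        raise ValueError("layouts must name the same axes")
    return [src.index(axis) for axis in dst]


def convert_shape(shape, src="NHWC", dst="NCHW"):
    """Reorder a shape tuple from layout `src` to layout `dst`."""
    if len(shape) != len(src):
        raise ValueError("shape must have one entry per layout axis")
    return tuple(shape[i] for i in layout_perm(src, dst))


# A batch of 32 RGB images, 224 pixels high and wide.
print(layout_perm("NHWC", "NCHW"))            # [0, 3, 1, 2]
print(convert_shape((32, 224, 224, 3)))       # (32, 3, 224, 224)
print(layout_perm("NCHW", "NHWC"))            # [0, 2, 3, 1]
```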
## Recurrent/fancy networks
The documentation for these is abysmal.
To write: How to create standard
[linear filters]({filename}signal_processing.md) in Tensorflow.
For now, my recommendation is to simply use [keras](http://keras.io/), which makes this easier
inside tensorflow, or [pytorch]({filename}pytorch.md), which makes it easier overall.
[tensorflow fold](https://github.com/tensorflow/fold) is a library which ingests structured data and simulates
[pytorch]({filename}pytorch.md)-style dynamic graphs dependent upon its structure.
### Official documentation
The Tensorflow RNN documentation, as bad as it is, is not even easy to find,
being scattered across several non-obvious locations without consistent
cross-links.
* [Overview docs](https://www.tensorflow.org/api_docs/python/rnn_cell/rnn_cells_for_use_with_tensorflow_s_core_rnn_methods)
* [Other docs of confusing relation to the prior docs](https://www.tensorflow.org/api_docs/python/rnn_cell/)
* [tutorial docs](https://www.tensorflow.org/tutorials/recurrent/).
* stateful minibatch training requires the catchy [SequenceQueueingStateSaver](https://www.tensorflow.org/api_docs/python/tf/contrib/training/SequenceQueueingStateSaver)
To make it actually make sense without unwarranted time wasting and guessing,
you will then need to read other stuff.
### Community guides
* `seq2seq` models with GRUs : [Fun with Recurrent Neural Nets](https://esciencegroup.com/2016/03/04/funwithrecurrentneuralnetsonemorediveintocntkandtensorflow/).
* [Variable sequence length HOWTO](https://gist.github.com/evanthebouncy/8e16148687e807a46e3f).
* [Where do the RNN weights come from](http://stackoverflow.com/a/40850664)?
[Magic](https://stackoverflow.com/questions/38692531/explanationofgrucellintensorflow/38694775#38694775).
* Denny Britz's blog posts
    * [RNNs in Tensorflow, a practical guide and undocumented features](http://www.wildml.com/2016/08/rnnsintensorflowapracticalguideandundocumentedfeatures/).
    * He also gives a good explanation of [vanishing gradients](http://www.wildml.com/2015/10/recurrentneuralnetworkstutorialpart3backpropagationthroughtimeandvanishinggradients/).
* Danijar Hafner
    * [Introduction to Recurrent Networks in TensorFlow](https://danijar.com/introductiontorecurrentnetworksintensorflow/)
    * [Variable sequence lengths HOWTO](https://danijar.com/variablesequencelengthsintensorflow/)
* Philippe Remy, [Stateful LSTM in Keras](https://philipperemy.github.io/kerasstatefullstm/)
* Ben Bolte, [Deep Language Modeling for Question Answering using Keras](http://ben.bolte.cc/blog/2016/keraslanguagemodeling.html)
* pro tip: [SequenceQueueingStateSaver](https://www.tensorflow.org/api_docs/python/tf/contrib/training/SequenceQueueingStateSaver) makes things easy.
## Keras: The recommended way of using tensorflow
You probably want to start with a higher-level API such as `keras`
unless your needs are extraordinarily esoteric or
you like reinventing wheels.
Keras is a good choice, since it removes a lot of boilerplate,
and makes even writing new boilerplate easier.
It adds only a few minor restrictions to your abilities,
but by creating a consistent API, has become something of a standard for early
access to complex new algorithms you would never have time to reimplement
yourself.
I would use it if I were you for anything involving standard neural networks,
especially any kind of recurrent network.
If you want to optimise a *generic*, non-deep neural model,
you might find the naked tensorflow API has less friction.
* [Easing pain via Keras](https://blog.keras.io/kerasasasimplifiedinterfacetotensorflowtutorial.html).
* Jason Brownlee's [HOWTO guide](http://machinelearningmastery.com/timeseriespredictionlstmrecurrentneuralnetworkspythonkeras/).
* [Recurrent neural networks' gradients are truncated to the sequence length](https://github.com/fchollet/keras/issues/3669),
which might not be obvious. In effect, the sequence length is the
truncated backpropagation through time (TBPTT) parameter.
* [recurrentshop](https://github.com/datalogai/recurrentshop) makes it easier to manage recurrent topologies using [keras](http://keras.io/).
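To make the truncation point in the list above concrete: if gradients only flow within one training sequence, then the way you chop a long series into sequences fixes how far back the network can learn dependencies. A minimal sketch in plain Python, assuming non-overlapping windows; `tbptt_windows` is a hypothetical helper of mine, not a keras function.

```python
def tbptt_windows(series, seq_len):
    """Split a long series into consecutive, non-overlapping windows.

    Each window is one training sequence, so gradients reach back at most
    `seq_len` steps. Trailing items that do not fill a window are dropped.
    """
    if seq_len < 1:
        raise ValueError("seq_len must be at least 1")
    n_windows = len(series) // seq_len
    return [series[i * seq_len:(i + 1) * seq_len] for i in range(n_windows)]


windows = tbptt_windows(list(range(10)), seq_len=4)
print(windows)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Choosing a larger `seq_len` lengthens the gradient horizon at the cost of memory and time per step.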
## Getting models out
* For a local app:
Hamed MP, [Exporting trained TensorFlow models to C++ the RIGHT way!](https://medium.com/@hamedmp/exportingtrainedtensorflowmodelstoctherightwaycf24b609d183#.2085hqqg6)
* For serving it online, Tensorflow `serving` is the preferred method.
See [the Serving documentation](https://tensorflow.github.io/serving/architecture_overview).
* For a mobile app, the [HBO joke hotdog app HOWTO](https://hackernoon.com/howhbossiliconvalleybuiltnothotdogwithmobiletensorflowkerasreactnativeef03260747f3) gives a wonderful explanation.
## Training in the cloud because you don't have NVIDIA sponsorship
See [practical cloud computing]({filename}practical_cloud_computing.md),
which has a couple of sections on that.
## Extending
Tensorflow [allows binary extensions](https://www.tensorflow.org/extend/adding_an_op#compile_the_op_using_your_system_compiler_tensorflow_binary_installation)
but the documentation doesn't really explain how they integrate with normal python builds.
Here is [an example from Uber](https://github.com/uber/horovod/blob/master/setup.py).
## Misc HOWTOs
### Nightly builds
[http://ci.tensorflow.org/view/Nightly/](http://ci.tensorflow.org/view/Nightly/)
(or build your own)
### Dynamic graphs
[Pytorch]({filename}pytorch.md) has JIT graphs and they are super hip, so now tensorflow has a
[dynamic graph mode](https://medium.com/@yaroslavvb/tensorflowmeetspytorchwitheagermode714cce161e6c),
called `Eager`.
### GPU selection
[setGPU](https://github.com/bamos/setGPU)
sets `CUDA_VISIBLE_DEVICES` to the least loaded GPU.
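The idea is simple enough to sketch by hand. The environment variable that CUDA reads to restrict device visibility is `CUDA_VISIBLE_DEVICES`, and it has to be set before the GPU library is initialised, which in practice means before importing tensorflow. The sketch below is not setGPU's implementation: `least_loaded_gpu` and `select_gpu` are hypothetical names, and the per-GPU memory figures are passed in by hand rather than queried from the driver.

```python
import os


def least_loaded_gpu(memory_used_mib):
    """Index of the GPU with the least memory in use.

    `memory_used_mib` is an assumed input: one number per GPU, in index order.
    """
    if not memory_used_mib:
        raise ValueError("no GPUs reported")
    return min(range(len(memory_used_mib)), key=lambda i: memory_used_mib[i])


def select_gpu(memory_used_mib):
    """Expose only the least loaded GPU to libraries imported afterwards."""
    index = least_loaded_gpu(memory_used_mib)
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return index


# Made-up usage figures for three GPUs; the second one is nearly idle.
chosen = select_gpu([10240, 312, 8000])
print(chosen, os.environ["CUDA_VISIBLE_DEVICES"])  # 1 1
```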
### Silencing tensorflow
```bash
TF_CPP_MIN_LOG_LEVEL=1 primusrun python run_job.py biquad_fast
```
Hessians and higher order optimisation
Basic Newton method optimisation example. Very basic example that also shows how to create a diagonal hessian.
Slightly outdated, Hessian matrix. There is a discussion on Jacobians in TF, including, e.g. some fancy examples by jjough:
here’s mine: works for high-dimensional Jacobians (numerator and denominator have >1 dimension), undefined batch sizes, and tensors that are not statically known.
Remember to use an interactive session, otherwise tf.get_default_session() will not be able to find the session.
```python
def tf_jacobian(tensor2, tensor1, feed_dict, sess = tf.get_default_session()):
    """
    Computes the tensor d(tensor2)/d(tensor1) recursively.
    :param tensor2: numerator of Jacobian
    :param tensor1: denominator of Jacobian
    :param feed_dict: input data (need this if tensors are not statically known)
    :return: a tensor of dimension (dim_tensor2 x dim_tensor1)
    """
    # can't do tensor.get_shape() because it doesn't work for undefined batch size
    shape = list(sess.run(tf.shape(tensor2), feed_dict))
    if shape:
        # split tensor2 along first dimension and recur
        # int trick from https://github.com/tensorflow/tensorflow/issues/7754
        tensor2_split = tf.split(axis = 0, num_or_size_splits = int(shape[0]), value = tensor2)
        grad_split = [tf_jacobian(tf.squeeze(M, squeeze_dims = 0), tensor1, feed_dict) for M in tensor2_split]
        return tf.stack(grad_split)
    else:
        # calculate gradient of scalar
        grad = tf.gradients(tensor2, tensor1)
        if grad[0] != None:
            return tf.squeeze(grad, squeeze_dims = [0])
        else:
            # replace any undefined gradients with zeros
            return tf.zeros_like(tensor1)
```
And here’s one for batched tensors:
```python
def batch_tf_jacobian(tensor2, tensor1, feed_dict, sess = tf.get_default_session()):
    """
    Computes the matrix d(tensor2)/d(tensor1) recursively.
    Tensorflow doesn't really have its own Jacobian operator
    (tf.gradients sums over all dims of tensor2).

    :param tensor2: numerator of Jacobian, first dimension is batch
    :param tensor1: denominator of Jacobian, first dimension is batch
    :param feed_dict: input data (need this if tensors are not statically known)
    :return: batch Jacobian tensor
    """
    shape2 = list(sess.run(tf.shape(tensor2), feed_dict))
    shape1 = list(sess.run(tf.shape(tensor1), feed_dict))
    jacobian = tf_jacobian(tensor2, tensor1, feed_dict)
    batch_size = shape2[0]
    # for each example i, keep only the block pairing output i with input i
    batch_jacobian = [tf.slice(jacobian,
                               [i] + [0]*(len(shape2)-1) + [i] + [0]*(len(shape1)-1),
                               [1] + [-1]*(len(shape2)-1) + [1] + [-1]*(len(shape1)-1))
                      for i in range(batch_size)]
    batch_jacobian = [tf.squeeze(tensor, squeeze_dims = (0, len(shape2))) for tensor in batch_jacobian]
    batch_jacobian = tf.stack(batch_jacobian)
    return batch_jacobian
```
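Whichever Jacobian routine you end up with, it is worth checking it against finite differences on a small function. The sketch below is a different technique from the graph-based code above: a central-difference approximation in plain Python with no Tensorflow involved. The names `numerical_jacobian` and `f`, and the step size `eps`, are my own choices for illustration.

```python
def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x.

    f maps a list of floats to a list of floats;
    the result satisfies jac[i][j] ~ d f_i / d x_j.
    """
    n_out = len(f(x))
    jac = [[0.0] * len(x) for _ in range(n_out)]
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def f(x):
    # Toy function with a known Jacobian: [[x1, x0], [2*x0, 1]].
    return [x[0] * x[1], x[0] ** 2 + x[1]]


J = numerical_jacobian(f, [2.0, 3.0])
analytic = [[3.0, 2.0], [4.0, 1.0]]
max_error = max(abs(J[i][j] - analytic[i][j]) for i in range(2) for j in range(2))
print(max_error < 1e-4)  # True
```

Evaluate your graph's Jacobian at the same point and compare entry by entry; large disagreements usually indicate a shape or axis-ordering mistake.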
Manage tensorflow environments
Optimisation tricks
Using traditional/experimental optimisers rather than SGD-type ones.
Simplify distributed training using Horovod.