
Here’s how I would do art with machine learning if I had to

I’ve a weakness for ideas that give me plausible deniability for making generative art while doing my maths homework.

So do you

This page is more chaotic than the already-chaotic median, sorry. Good luck making sense of it.

See also analysis/resynthesis.


Machine learning generally

See gesture recognition. Oh, and also Google’s AMI channel, and ml4artists, which has some sweet machine-learning-for-artists topic guides.

Neural networks in particular

Many neural networks are generative in the sense that even if you train ‘em to classify things, they can also predict new members of the class. E.g. run the model forwards and it recognizes melodies; run it “backwards” and it composes melodies. Or rather, you maybe trained them to generate examples in the course of training them to detect examples.

There are many definitional and practical wrinkles here, and this quality is not unique to artificial neural networks, but it is a great convenience, and the gods of machine learning have blessed us with much infrastructure to exploit this feature, because it is very close to actual profitable algorithms. Upshot: There is now a lot of computation and grad student labour directed at producing neural networks which as a byproduct can produce faces, chairs, film dialogue, symphonies and so on.
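To make the “run it backwards” idea a bit more concrete, here is a minimal sketch, not anyone's actual method. It assumes a toy logistic classifier whose weights (`WEIGHTS`, `BIAS`) are made up for illustration, and it “generates” a class member by gradient ascent on the input rather than on the weights. The `decay` term is likewise a hypothetical regulariser that stops the input running off to infinity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical "already trained" weights for a 4-feature logistic classifier.
WEIGHTS = [2.0, -1.0, 0.5, -3.0]
BIAS = -0.5

def classify(x):
    """Forward pass: probability that x belongs to the class."""
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)

def generate(steps=200, lr=0.1, decay=0.01):
    """'Backward' use: hold the weights fixed and climb the gradient in x."""
    x = [0.0] * len(WEIGHTS)
    for _ in range(steps):
        p = classify(x)
        # d/dx log p(x) = (1 - p) * w; the decay term keeps x bounded.
        x = [xi + lr * ((1.0 - p) * w - decay * xi)
             for xi, w in zip(x, WEIGHTS)]
    return x

x_star = generate()
print("score of blank input:    ", classify([0.0] * len(WEIGHTS)))
print("score of generated input:", classify(x_star))
```

Real systems do this with deep networks and far fancier regularisers, but the division of labour is the same: the classifier supplies the gradient, and the input is the thing being optimised.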


Some as-yet-unfiled neural-artwork links I should think about.

  • So simple it’s cute, CPPNs are probably what Jonathan McCabe has been producing for years.
  • iGAN: Interactive Image Generation via Generative Adversarial Networks
  • interpolating style transfer.
  • neurogram is a compact, semi-untrained neural network image-synthesis-in-the-browser project.
  • Adversarial generation is a cool hack if you hate boring stuff like labelling data sets, e.g. chair generation.
  • Autoencoding beyond pixels using a learned similarity metric (LSLW15) (code). The clever hack here is “generative adversarial networks”.
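For a sense of how little machinery the CPPN idea (compositional pattern-producing networks) from the list above needs, here is a rough sketch. It assumes a small, completely untrained network with random Gaussian weights that maps each pixel's coordinates (x, y, and distance from the centre) to a grey level. The layer sizes, `scale`, and the ASCII preview are all arbitrary choices of mine, not taken from any of the linked projects.

```python
import math
import random

def make_cppn(hidden=8, layers=3, seed=0):
    """Random weights only; nothing here is trained."""
    rng = random.Random(seed)
    sizes = [3] + [hidden] * layers + [1]
    return [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def cppn_pixel(weights, x, y):
    """Map one coordinate to an intensity in [0, 1]."""
    act = [x, y, math.hypot(x, y)]
    for layer in weights[:-1]:
        act = [math.tanh(sum(w * a for w, a in zip(row, act))) for row in layer]
    out = sum(w * a for w, a in zip(weights[-1][0], act))
    return 0.5 * (1.0 + math.tanh(out))

def render(weights, size=32, scale=3.0):
    """Evaluate the network on a size x size grid covering [-scale, scale]^2."""
    coords = [scale * (2.0 * i / (size - 1) - 1.0) for i in range(size)]
    return [[cppn_pixel(weights, x, y) for x in coords] for y in coords]

image = render(make_cppn())

# Crude ASCII preview; every second row roughly fixes the aspect ratio.
ramp = " .:-=+*#%@"
for row in image[::2]:
    print("".join(ramp[min(int(v * len(ramp)), len(ramp) - 1)] for v in row))
```

Because the image is a function of continuous coordinates, you can re-render at any resolution by changing `size`, and a different `seed` gives a different pattern.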

Variational inference (Hint07, WiBi05, Giro01, MnGr14) looks exciting here, particularly in an autoencoder setting. (KiWe13)
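For orientation, the objective that the variational autoencoder setting revolves around is the evidence lower bound. The notation below is my assumption, not taken from the cited papers verbatim: an encoder q_φ(z|x), a decoder p_θ(x|z) and a prior p(z) over the latent code z.

```latex
\log p_\theta(x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  \;-\;
  \mathrm{KL}\bigl(q_\phi(z \mid x) \,\big\|\, p(z)\bigr)
```

The first term rewards reconstructions that look like the data; the second keeps the codes close to the prior, which is what lets you sample new z and decode them into new examples.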

Text synthesis

Visual synthesis

@bhautikj style transfer “Drumpf”

See those classic images from Google’s tripped-out image recognition systems, or Gatys, Ecker and Bethge’s deep art. Neural networks do a passable undergraduate Monet.

Here’s Frank Liu’s implementation of style transfer in pycaffe.
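The core trick in Gatys-style transfer is to summarise “style” by the correlations between feature channels (a Gram matrix), which throws away where things are in the image. A stripped-down sketch follows, assuming the features are already extracted and given as plain lists. The tiny feature values and the normalisation constants are illustrative assumptions, not those of the paper or of any linked implementation.

```python
def gram_matrix(features):
    """features: C channels, each a flat list of N activations -> C x C matrix."""
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]

def style_loss(feats_a, feats_b):
    """Mean squared difference between the two Gram matrices."""
    ga, gb = gram_matrix(feats_a), gram_matrix(feats_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)

# Hypothetical 2-channel feature maps over 4 spatial positions.
style = [[1.0, 2.0, 3.0, 4.0],
         [0.0, 1.0, 0.0, 1.0]]

# Same activations, positions shuffled identically in every channel.
perm = [2, 0, 3, 1]
shuffled = [[channel[k] for k in perm] for channel in style]

# Same first channel, but the second channel fires at different positions.
other = [[1.0, 2.0, 3.0, 4.0],
         [1.0, 0.0, 1.0, 0.0]]

print("loss vs. shuffled copy:", style_loss(style, shuffled))
print("loss vs. other image:  ", style_loss(style, other))
```

The shuffled copy scores as the same style even though the “image” has been rearranged, which is roughly why the method transfers texture and brushwork rather than composition.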

Alex Graves, Generating Sequences With Recurrent Neural Networks, generates handwriting. Relatedly, sketch-rnn is reaaaally cute.

Deep dreaming approaches are entertaining (NSFW). Here’s a more pedestrian and slightly more informative version of that.

The following have some lovely visual explanations of visual and other neural networks:

  • Experiments in Handwriting with a Neural Network
  • Deconvolution and Checkerboard Artifacts
  • How to Use t-SNE Effectively
  • Attention and Augmented Recurrent Neural Networks
  • hardmaru presents an amazing introduction to running sophisticated neural networks in the browser, targeted at artists, which goes over the handwriting post in a non-technical way.

Composing music

Seems like it should be easy, until you think about it.

Related: Arpeggiate by numbers.

Pixelrnn turns out to be good at music. Dadabots have successfully weaponised samplernn, and it’s cute.

Google has weighed in like a gorilla on the metallophone to do MIDI composition with TensorFlow as part of their Magenta project. Their NIPS 2016 demo won the best demo prize.

Daniel Johnson has a convolutional and recurrent architecture for taking into account multiple types of dependency in music, which he calls a biaxial neural network. Zhe Li, Composing Music With Recurrent Neural Networks.

Ji-Sung Kim’s deepjazz project is minimal, but does interesting jazz improvisations. Part of the genius here is choosing totally chaotic music to try to ape, so you can ape it chaotically. (Code)

Boulanger-Lewandowski has code and data for BoBV12’s recurrent neural network composition using Python/Theano. Christian Walder leads a project which shares some roots with that (Wald16a, Wald16b). Bob Sturm’s FolkRNN does a related thing, but ingeniously redefines the problem by focussing on folk tune notation.

A tutorial on generating music using Restricted Boltzmann Machines for the conditional random field density, and an RNN for the time dependence after BoBV12.
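To give a flavour of the RBM half of that recipe, here is a small sketch of block Gibbs sampling in a restricted Boltzmann machine whose visible units stand for “which notes are on right now”. Everything numeric is a made-up assumption (random untrained weights, 12 pitches, 6 hidden units). In the RNN-RBM of BoBV12 the two bias vectors would be supplied afresh at each time step by the recurrent network; here they are simply held fixed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sample_bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]

def gibbs_step(v, W, b_v, b_h, rng):
    """One block Gibbs sweep: sample hidden given visible, then visible given hidden."""
    n_v, n_h = len(b_v), len(b_h)
    p_h = [sigmoid(b_h[j] + sum(v[i] * W[i][j] for i in range(n_v)))
           for j in range(n_h)]
    h = sample_bernoulli(p_h, rng)
    p_v = [sigmoid(b_v[i] + sum(W[i][j] * h[j] for j in range(n_h)))
           for i in range(n_v)]
    return sample_bernoulli(p_v, rng)

rng = random.Random(0)
n_notes, n_hidden = 12, 6          # hypothetical: one octave of pitch classes
W = [[rng.gauss(0.0, 0.5) for _ in range(n_hidden)] for _ in range(n_notes)]
b_v = [-1.0] * n_notes             # negative bias: most notes off by default
b_h = [0.0] * n_hidden

chord = [0] * n_notes
for _ in range(20):                # a few sweeps to wander away from silence
    chord = gibbs_step(chord, W, b_v, b_h, rng)
print(chord)
```

With untrained weights the “chords” are noise, of course; the point is only the shape of the sampler that the RNN would be steering.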

Bob Sturm did a nice one.

TBD: Google’s latest demo in this area was popular.

Audio synthesis

See also analysis/resynthesis.

Matt Vitelli on music generation from MP3s (source)

Soundtracking audio from video.

Alex Graves on RNN predictive synthesis.

Andy Sarroff, Musical Audio Synthesis Using Autoencoding Neural Nets. (code)

Style transfer for audio is crying out to be done, but I’ve only seen more traditional techniques. (UPDATE: It’s happening these days, do some googling)

@bhautikj style transfer experiment “Drumpf”

Style transfer will be familiar to anyone who has ever taken hallucinogens or watched movies made by those who have, but you can’t usually put hallucinogens or film nights on the departmental budget so we have to make do with gigantic computing clusters.


References

Boulanger-Lewandowski, N., Bengio, Y., & Vincent, P. (2012) Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription. In 29th International Conference on Machine Learning.
Bown, O., & Lexer, S. (2006) Continuous-Time Recurrent Neural Networks for Generative and Interactive Musical Performance. In F. Rothlauf, J. Branke, S. Cagnoni, E. Costa, C. Cotta, R. Drechsler, … H. Takagi (Eds.), Applications of Evolutionary Computing (pp. 652–663). Springer Berlin Heidelberg.
Champandard, A. J. (2016) Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks. ArXiv:1603.01768 [Cs].
Denton, E., Chintala, S., Szlam, A., & Fergus, R. (2015) Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. ArXiv:1506.05751 [Cs].
Dieleman, S., & Schrauwen, B. (2014) End-to-end learning for music audio. In 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6964–6968). IEEE.
Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2016) Density estimation using Real NVP. ArXiv:1605.08803 [Cs, Stat].
Dosovitskiy, A., Springenberg, J. T., Tatarchenko, M., & Brox, T. (2014) Learning to Generate Chairs, Tables and Cars with Convolutional Networks. ArXiv:1411.5928 [Cs].
Dumoulin, V., Shlens, J., & Kudlur, M. (2016) A Learned Representation For Artistic Style. ArXiv:1610.07629 [Cs].
Gatys, L. A., Ecker, A. S., & Bethge, M. (2015) A Neural Algorithm of Artistic Style. ArXiv:1508.06576 [Cs, q-Bio].
Girolami, M. (2001) A Variational Method for Learning Sparse and Overcomplete Representations. Neural Computation, 13(11), 2517–2532.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014) Generative Adversarial Networks. ArXiv:1406.2661 [Cs, Stat].
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014) Explaining and Harnessing Adversarial Examples. ArXiv:1412.6572 [Cs, Stat].
Gregor, K., Danihelka, I., Graves, A., Rezende, D. J., & Wierstra, D. (2015) DRAW: A Recurrent Neural Network For Image Generation. ArXiv:1502.04623 [Cs].
Gregor, K., & LeCun, Y. (2010) Learning fast approximations of sparse coding. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 399–406).
Gregor, K., & LeCun, Y. (2011) Efficient Learning of Sparse Invariant Representations. ArXiv:1105.5307 [Cs].
Grosse, R., Salakhutdinov, R. R., Freeman, W. T., & Tenenbaum, J. B. (2012) Exploiting compositionality to explore a large space of model structures. In Proceedings of the Conference on Uncertainty in Artificial Intelligence.
Hadsell, R., Chopra, S., & LeCun, Y. (2006) Dimensionality Reduction by Learning an Invariant Mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Vol. 2, pp. 1735–1742).
He, K., Wang, Y., & Hopcroft, J. (2016) A Powerful Generative Model Using Random Weights for the Deep Image Representation. In Advances in Neural Information Processing Systems.
Hinton, G. E. (2007) Learning multiple layers of representation. Trends in Cognitive Sciences, 11(10), 428–434.
Hinton, G. E., & Salakhutdinov, R. R. (2006) Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507.
Jetchev, N., Bergmann, U., & Vollgraf, R. (2016) Texture Synthesis with Spatial Generative Adversarial Networks. In Advances in Neural Information Processing Systems 29.
Jing, Y., Yang, Y., Feng, Z., Ye, J., & Song, M. (2017) Neural Style Transfer: A Review. ArXiv:1705.04058 [Cs].
Johnson, J., Alahi, A., & Fei-Fei, L. (2016) Perceptual Losses for Real-Time Style Transfer and Super-Resolution. ArXiv:1603.08155 [Cs].
Kalchbrenner, N., Danihelka, I., & Graves, A. (2015) Grid Long Short-Term Memory. ArXiv:1507.01526 [Cs].
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017) Progressive Growing of GANs for Improved Quality, Stability, and Variation. ArXiv:1710.10196 [Cs, Stat].
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016) Improving Variational Inference with Inverse Autoregressive Flow. ArXiv:1606.04934 [Cs, Stat].
Kingma, D. P., & Welling, M. (2013) Auto-Encoding Variational Bayes. ArXiv:1312.6114 [Cs, Stat].
Larsen, A. B. L., Sønderby, S. K., Larochelle, H., & Winther, O. (2015) Autoencoding beyond pixels using a learned similarity metric. ArXiv:1512.09300 [Cs, Stat].
Lazaridou, A., Nguyen, D. T., Bernardi, R., & Baroni, M. (2015) Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation. ArXiv:1506.03500 [Cs].
Li, Y., Wang, N., Liu, J., & Hou, X. (2017) Demystifying Neural Style Transfer. ArXiv:1701.01036 [Cs].
Luo, Y., Chen, Z., Hershey, J. R., Roux, J. L., & Mesgarani, N. (2016) Deep Clustering and Conventional Networks for Music Separation: Stronger Together. ArXiv:1611.06265 [Cs, Stat].
Malmi, E., Takala, P., Toivonen, H., Raiko, T., & Gionis, A. (2016) DopeLearning: A Computational Approach to Rap Lyrics Generation. ArXiv:1505.04771 [Cs], 195–204.
Mnih, A., & Gregor, K. (2014) Neural Variational Inference and Learning in Belief Networks. In Proceedings of The 31st International Conference on Machine Learning.
Neil, D., Pfeiffer, M., & Liu, S.-C. (2016) Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3882–3890). Curran Associates, Inc.
Owens, A., Isola, P., McDermott, J., Torralba, A., Adelson, E. H., & Freeman, W. T. (2015) Visually Indicated Sounds. ArXiv:1512.08512 [Cs].
Sarroff, A. M., & Casey, M. (2014) Musical audio synthesis using autoencoding neural nets. Ann Arbor, MI: Michigan Publishing, University of Michigan Library.
Sigtia, S., Benetos, E., Boulanger-Lewandowski, N., Weyde, T., Garcez, A. S. d’Avila, & Dixon, S. (2015) A hybrid recurrent neural network for music transcription. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 2061–2065). IEEE.
Smith, E. C., & Lewicki, M. S. (2006) Efficient auditory coding. Nature, 439(7079), 978–982.
Sun, Z., Liu, J., Zhang, Z., Chen, J., Huo, Z., Lee, C. H., & Zhang, X. (2016) Composing Music with Grammar Argumented Neural Networks and Note-Level Encoding. ArXiv:1611.05416 [Cs].
Theis, L., & Bethge, M. (2015) Generative Image Modeling Using Spatial LSTMs. ArXiv:1506.03478 [Cs, Stat].
Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2016) Instance Normalization: The Missing Ingredient for Fast Stylization. ArXiv:1607.08022 [Cs].
Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2017) Improved Texture Networks: Maximizing Quality and Diversity in Feed-forward Stylization and Texture Synthesis. ArXiv:1701.02096 [Cs].
van den Oord, A. (2016) Wavenet: A Generative Model for Raw Audio.
van den Oord, A., Kalchbrenner, N., & Kavukcuoglu, K. (2016) Pixel Recurrent Neural Networks. ArXiv:1601.06759 [Cs].
van den Oord, A., Kalchbrenner, N., Vinyals, O., Espeholt, L., Graves, A., & Kavukcuoglu, K. (2016) Conditional Image Generation with PixelCNN Decoders. ArXiv:1606.05328 [Cs].
Walder, C. (2016a) Modelling Symbolic Music: Beyond the Piano Roll. ArXiv:1606.01368 [Cs].
Walder, C. (2016b) Symbolic Music Data Version 10. ArXiv:1606.02542 [Cs].
Winn, J. M., & Bishop, C. M. (2005) Variational message passing. In Journal of Machine Learning Research (pp. 661–694).
Wu, Q., Shen, C., Hengel, A. van den, Liu, L., & Dick, A. (2015) What value high level concepts in vision to language problems? ArXiv:1506.01144 [Cs].
Wyse, L. (2017) Audio Spectrogram Representations for Processing with Convolutional Neural Networks. In Proceedings of the First International Conference on Deep Learning and Music, Anchorage, US, May, 2017 (arXiv:1706.08675v1 [cs.NE]).
Yu, D., & Deng, L. (2011) Deep Learning and Its Applications to Signal and Information Processing [Exploratory DSP]. IEEE Signal Processing Magazine, 28(1), 145–154.
Yu, H., & Varshney, L. R. (2017) Towards deep interpretability (MUS-ROVER II): learning hierarchical representations of tonal music. In Proceedings of International Conference on Learning Representations (ICLR) 2017.
Zhu, J.-Y., Krähenbühl, P., Shechtman, E., & Efros, A. A. (2016) Generative Visual Manipulation on the Natural Image Manifold. ArXiv:1609.03552 [Cs].