The Living Thing / Notebooks : Garbled highlights from NIPS 2016

Full paper listing.

Favoured Demos

Memo Akten, Tom White, Google Magenta team.

Variational Inference: Foundations and Modern Methods

David Blei, Shakir Mohamed and Rajesh Ranganath gave a really good tutorial on variational inference; my favourite, and a model of clarity.

Keywords of interest:

Theory and Algorithms for Forecasting Non-stationary Time Series

Vitaly Kuznetsov, Mehryar Mohri

At last! Learning theory for time series!

Generative Adversarial Models

Ian Goodfellow

Isola et al. (2016) Image-to-Image Translation looks good for new-wave super-resolution. But how would you find the estimation error of such a method for a given statistic?

MetaGrad: Multiple Learning Rates in Online Learning

Tim van Erven, Wouter M Koolen

Learn the correct learning rate by simultaneously trying many.

Question: Why is this online-specific?
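A toy sketch of the flavour of the idea, not the actual MetaGrad update: run one online gradient-descent copy per candidate learning rate and combine them by exponential weighting on the losses each copy incurs. Everything here (the squared-loss problem, the weighting scheme) is my own fill-in.

```python
import numpy as np

# One gradient-descent "slave" per candidate learning rate, combined by a
# crude exponentially weighted "master". Not the MetaGrad update rule itself.
etas = [2.0 ** -k for k in range(1, 8)]    # candidate learning rates
w = np.zeros((len(etas), 2))               # one iterate per candidate
log_weights = np.zeros(len(etas))          # master's (log-domain) weights

rng = np.random.default_rng(0)
for t in range(1000):
    x = rng.normal(size=2)
    y = x @ np.array([1.0, -2.0]) + 0.1 * rng.normal()
    p = np.exp(log_weights - log_weights.max())
    p /= p.sum()
    w_master = p @ w                       # master's prediction: mixture of slaves
    for i, eta in enumerate(etas):
        err = w[i] @ x - y
        log_weights[i] -= 0.5 * err ** 2   # penalise each slave by its own loss
        w[i] -= eta * err * x              # slave's own gradient step
```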

Structured Orthogonal Random Features

I forget who presented YSCH16.

We present an intriguing discovery related to Random Fourier Features: replacing multiplication by a random Gaussian matrix with multiplication by a properly scaled random orthogonal matrix significantly decreases kernel approximation error. We call this technique Orthogonal Random Features (ORF), and provide theoretical and empirical justification for its effectiveness. Motivated by the discovery, we further propose Structured Orthogonal Random Features (SORF), which uses a class of structured discrete orthogonal matrices to speed up the computation. The method reduces the time cost from \(\mathcal{O}(d^2)\) to \(\mathcal{O}(d \log d)\), where \(d\) is the data dimensionality, with almost no compromise in kernel approximation quality compared to ORF.

Leads naturally to question: How to manage other types of correlation. How about time series?
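A minimal numpy sketch of the ORF construction as I read the abstract (square case only; the chi-distributed row rescaling and the bandwidth `sigma` are the standard random-features choices, filled in by me rather than lifted from the paper):

```python
import numpy as np

def orf_features(X, sigma=1.0, seed=0):
    """Orthogonal random Fourier features for the Gaussian kernel (sketch).

    Rows of W come from a random orthogonal matrix (QR of a Gaussian),
    rescaled by chi-distributed lengths so each row keeps the marginal
    distribution of a plain Gaussian random-features matrix.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    G = rng.normal(size=(d, d))
    Q, _ = np.linalg.qr(G)                        # (approximately) Haar orthogonal
    lengths = np.sqrt(rng.chisquare(df=d, size=d))
    W = (lengths[:, None] * Q) / sigma
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(d)

# Phi @ Phi.T then approximates the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)).
X = np.random.default_rng(1).normal(size=(5, 16))
Phi = orf_features(X)
```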

Universal Correspondence Network

I forgot who presented CGSC16, which integrates geometric transforms into CNNs in a reasonably natural way:

We present a deep learning framework for accurate visual correspondences and demonstrate its effectiveness for both geometric and semantic matching, spanning across rigid motions to intra-class shape or appearance variations. In contrast to previous CNN-based approaches that optimize a surrogate patch similarity objective, we use deep metric learning to directly learn a feature space that preserves either geometric or semantic similarity.

Cries out for a musical implementation.

Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks

Tim Salimans presents the simplest paper at NIPS, SaKi16:

We present weight normalization: a reparameterization of the weight vectors in a neural network that decouples the length of those weight vectors from their direction. By reparameterizing the weights in this way we improve the conditioning of the optimization problem and we speed up convergence of stochastic gradient descent. Our reparameterization is inspired by batch normalization but does not introduce any dependencies between the examples in a minibatch. This means that our method can also be applied successfully to recurrent models such as LSTMs and to noise-sensitive applications such as deep reinforcement learning or generative models, for which batch normalization is less well suited. Although our method is much simpler, it still provides much of the speed-up of full batch normalization. In addition, the computational overhead of our method is lower, permitting more optimization steps to be taken in the same amount of time.

An elaborate motivation for a conceptually and practically simple (a couple of lines of code) alternative to batch normalisation.
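The couple of lines in question, roughly: reparameterize each weight vector as \(w = g\, v / \|v\|\) and optimise \((v, g)\) instead of \(w\). A hand-rolled sketch for a single linear unit (in a framework with autodiff you only write the forward line; the backward pass below just spells out the gradient identities):

```python
import numpy as np

def forward(v, g, x):
    """Weight-normalised linear unit: w = g * v / ||v||."""
    w = g * v / np.linalg.norm(v)
    return w @ x

def backward(v, g, x, grad_y):
    """Push dL/dy back through the reparameterization."""
    norm = np.linalg.norm(v)
    direction = v / norm
    grad_w = grad_y * x                                    # dL/dw for a linear unit
    grad_g = grad_w @ direction                            # dL/dg
    grad_v = (g / norm) * (grad_w - grad_g * direction)    # dL/dv
    return grad_v, grad_g
```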

Relevant sparse codes with variational information bottleneck

Matthew Chalk presents ChMT16.

In many applications, it is desirable to extract only the relevant aspects of data. A principled way to do this is the information bottleneck (IB) method, where one seeks a code that maximises information about a relevance variable, Y, while constraining the information encoded about the original data, X. Unfortunately however, the IB method is computationally demanding when data are high-dimensional and/or non-gaussian. Here we propose an approximate variational scheme for maximising a lower bound on the IB objective, analogous to variational EM. Using this method, we derive an IB algorithm to recover features that are both relevant and sparse. Finally, we demonstrate how kernelised versions of the algorithm can be used to address a broad range of problems with non-linear relation between X and Y.

This one is a cool demo machine.
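For reference, the Lagrangian form of the objective being lower-bounded, in the usual information-bottleneck notation (the trade-off parameter \(\beta\) is the standard formulation; the paper may parameterize the constraint differently):

\[
\max_{p(t \mid x)} \; I(T; Y) - \beta\, I(T; X),
\]

where \(T\) is the code, \(Y\) the relevance variable and \(X\) the original data.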

Dense Associative Memory for Pattern Recognition

Dmitry Krotov presents KrHo16:

We propose a model of associative memory having an unusual mathematical structure. Contrary to the standard case, which works well only in the limit when the number of stored memories is much smaller than the number of neurons, our model stores and reliably retrieves many more patterns than the number of neurons in the network. We propose a simple duality between this dense associative memory and neural networks commonly used in models of deep learning. On the associative memory side of this duality, a family of models that smoothly interpolates between two limiting cases can be constructed. One limit is referred to as the feature-matching mode of pattern recognition, and the other one as the prototype regime. On the deep learning side of the duality, this family corresponds to neural networks with one hidden layer and various activation functions, which transmit the activities of the visible neurons to the hidden layer. This family of activation functions includes logistics, rectified linear units, and rectified polynomials of higher degrees. The proposed duality makes it possible to apply energy-based intuition from associative memory to analyze computational properties of neural networks with unusual activation functions - the higher rectified polynomials which until now have not been used for training neural networks. The utility of the dense memories is illustrated for two test cases: the logical gate XOR and the recognition of handwritten digits from the MNIST data set.
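The core object, as I understand it, is an energy over binary units \(\sigma\) with stored patterns \(\xi^{\mu}\) and a rectified-polynomial interaction (my paraphrase of the paper's setup, not its exact notation):

\[
E(\sigma) = -\sum_{\mu} F\!\left(\xi^{\mu} \cdot \sigma\right),
\qquad
F(x) =
\begin{cases}
x^{n} & x > 0 \\
0 & x \le 0,
\end{cases}
\]

with small \(n\) behaving like the classical Hopfield energy and larger \(n\) pushing towards the prototype regime with super-linear storage capacity.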

Density estimation using Real NVP

Laurent Dinh explains DiSB16:

Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning. Specifically, designing models with tractable learning, sampling, inference and evaluation is crucial in solving this task. We extend the space of such models using real-valued non-volume preserving (real NVP) transformations, a set of powerful invertible and learnable transformations, resulting in an unsupervised learning algorithm with exact log-likelihood computation, exact sampling, exact inference of latent variables, and an interpretable latent space. We demonstrate its ability to model natural images on four datasets through sampling, log-likelihood evaluation and latent variable manipulations.
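The building block is the affine coupling layer: half the coordinates pass through unchanged and parameterize an elementwise affine map of the other half, so the Jacobian is triangular and the inverse is closed-form. A bare-bones numpy sketch (the scale and shift networks `s_net` and `t_net` are placeholders for whatever function approximators you like; assumes an even-dimensional input):

```python
import numpy as np

def coupling_forward(x, s_net, t_net):
    """One affine coupling layer; returns the output and its log |det Jacobian|."""
    x1, x2 = np.split(x, 2)
    s, t = s_net(x1), t_net(x1)
    y2 = x2 * np.exp(s) + t
    log_det = s.sum()                  # triangular Jacobian, so the log-det is cheap
    return np.concatenate([x1, y2]), log_det

def coupling_inverse(y, s_net, t_net):
    """Exact inverse, needed for sampling."""
    y1, y2 = np.split(y, 2)
    s, t = s_net(y1), t_net(y1)
    x2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, x2])
```

Stacking such layers (permuting which half passes through) gives exact log-likelihoods and exact sampling, which is the point of the paper.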

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Xi Chen presents CCDH16:

This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.

Usable parameterizations of GAN by structuring the latent space.
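Concretely, the usual minimax game gets a variational mutual-information term bolted on, roughly:

\[
\min_{G, Q}\,\max_{D}\; V_{\mathrm{GAN}}(D, G) \;-\; \lambda\, L_I(G, Q),
\qquad
L_I(G, Q) = \mathbb{E}_{c \sim p(c),\, x \sim G(z, c)}\!\big[\log Q(c \mid x)\big] + H(c),
\]

where \(c\) are the structured latent codes and \(Q\) is an auxiliary network approximating the posterior over codes; \(L_I\) is the tractable lower bound on the mutual information described in the abstract.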

Parameter Learning for Log-supermodular Distributions

Tatiana Shpakova presents ShBa16.

Hack of note:

In order to minimize the expectation[…], we propose to use the projected stochastic gradient method, not on the data as usually done, but on our own internal randomization.

Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates

LiLR16:

Non-negative matrix factorization is a popular tool for decomposing data into feature and weight matrices under non-negativity constraints. It enjoys practical success but is poorly understood theoretically. This paper proposes an algorithm that alternates between decoding the weights and updating the features, and shows that assuming a generative model of the data, it provably recovers the ground truth under fairly mild conditions. In particular, its only essential requirement on features is linear independence. Furthermore, the algorithm uses ReLU to exploit the non-negativity for decoding the weights, and thus can tolerate adversarial noise that can potentially be as large as the signal, and can tolerate unbiased noise much larger than the signal. The analysis relies on a carefully designed coupling between two potential functions, which we believe is of independent interest.
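A hedged sketch of the alternating shape of the scheme as I read the abstract (the ReLU'd pseudo-inverse decode and the least-squares feature refresh below are my own simplification, not the paper's exact update rules):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def alternating_nmf(Y, r, n_iters=50, seed=0):
    """Decompose Y ~ A @ W with non-negative features A and weights W by
    alternating a ReLU'd decode of W with an update of A."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    A = relu(rng.normal(size=(n, r)))            # crude initialization
    for _ in range(n_iters):
        W = relu(np.linalg.pinv(A) @ Y)          # decode weights; ReLU enforces non-negativity
        A = relu(Y @ np.linalg.pinv(W))          # refresh features against decoded weights
    return A, W
```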

Time series workshop

Time series workshop home page

Mehryar Mohri, Yan Liu, Andrew Nobel, Inderjit Dhillon, Stephen Roberts

Mehryar Mohri presented his online-learning time series analysis using mixtures of experts through empirical discrepancy. He had me up until the model selection phase, when I got lost in a recursive argument. Will come back to this.

Yan Liu - FDA approaches, Hawkes models, clustering of time series. Large section on subspace clustering, which I guess I need to comprehend at some point. Time is special because it reflects the arrow of entropy. Also it can give us a notion of real causality.

Andrew B. Nobel - importance of mis-specification in time series models, w.r.t. compounding of the problem over time and the increased difficulty of validating assumptions. Time is special because it compounds error. P.S. why not more focus on algorithm failure cases? The NIPS conference dynamic doesn't encourage falsification.

Mohri: time is special because the i.i.d. setting is a special case thereof. "Prediction" really is about future states in these models. (How do you do inference of "true models" in his formalism?)

Other guy: why not use DNNs to construct features? How can the feature construction of DNNs be plugged into Bayesian models? BTW, Bayesian nonparametrics are still state of the art for general time series.

High dimensional learning with structure

High dimensional learning with structure page.

Richard Samworth, Po-Ling Loh, Sahand Negahban, Mark Schmidt, Kai-Wei Chang, Allen Yang, Chinmay Hegde, Rene Vidal, Guillaume Obozinski, Lorenzo Rosasco

Several applications necessitate learning a very large number of parameters from small amounts of data, which can lead to overfitting, statistically unreliable answers, and large training/prediction costs. A common and effective method to avoid the above mentioned issues is to restrict the parameter-space using specific structural constraints such as sparsity or low rank. However, such simple constraints do not fully exploit the richer structure which is available in several applications and is present in the form of correlations, side information or higher order structure. Designing new structural constraints requires close collaboration between domain experts and machine learning practitioners. Similarly, developing efficient and principled algorithms to learn with such constraints requires further collaborations between experts in diverse areas such as statistics, optimization, approximation algorithms etc. This interplay has given rise to a vibrant research area.

The main objective of this workshop is to consolidate current ideas from diverse areas such as machine learning, signal processing, theoretical computer science, optimization and statistics, clarify the frontiers in this area, discuss important applications and open problems, and foster new collaborations.

Chinmay Hegde:

We consider the demixing problem of two (or more) high-dimensional vectors from nonlinear observations when the number of such observations is far less than the ambient dimension of the underlying vectors. Specifically, we demonstrate an algorithm that stably estimate the underlying components under general structured sparsity assumptions on these components. Specifically, we show that for certain types of structured superposition models, our method provably recovers the components given merely n = O(s) samples where s denotes the number of nonzero entries in the underlying components. Moreover, our method achieves a fast (linear) convergence rate, and also exhibits fast (near-linear) per-iteration complexity for certain types of structured models. We also provide a range of simulations to illustrate the performance of the proposed algorithm.

This ends up being sparse recovery for given bases (e.g. Dirac deltas plus a Fourier basis). The interesting problem is recovering the correct decomposition with insufficient incoherence (they have a form for this).

Rene Vidal: "Deep learning is nonlinear tensor factorization." Various results on tensor factorization, regularized with various norms. They have proofs, for a generalized class of matrix factorisations, that "sufficiently wide" factorizations have no bad local minima. Conclusion: increase the size of the factorization during the optimisation procedure.

Guillaume Obozinski: hierarchical sparsity penalties for DAG inference.

Makoto Yamada, Koh Takeuchi, Tomoharu Iwata, John Shawe-Taylor, Samuel Kaski. Localized Lasso for High-Dimensional Regression

Doug Eck

Presents Magenta.

Computing with spikes workshop

computing with spikes home page.

Bayesian Deep Learning workshop

Bayesian Deep Learning workshop homepage.

NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop

NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop

Adaptive and Scalable Nonparametric Methods in Machine Learning

Looked solidly amazing, but I was caught up elsewhere:

Adaptive and Scalable Nonparametric Methods in Machine Learning

Brains and Bits: Neuroscience Meets Machine Learning

Especially curious about:

Max Welling: Making Deep Learning Efficient Through Sparsification.

Spatiotemporal forecasting

homepage of NIPS workshop on ML for Spatiotemporal Forecasting.

Constructive machine learning

Ruslan Salakhutdinov

On Multiplicative Integration with Recurrent Neural Networks, by Yuhuai Wu, Saizheng Zhang, Ying Zhang, Yoshua Bengio, Ruslan R. Salakhutdinov

Constructive machine learning

Most interesting papers

AlHa16
Allen-Zhu, Z., & Hazan, E. (2016) Optimal Black-Box Reductions Between Optimization Objectives. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1606–1614). Curran Associates, Inc.
BHML16
Ba, J., Hinton, G. E., Mnih, V., Leibo, J. Z., & Ionescu, C. (2016) Using Fast Weights to Attend to the Recent Past. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 4331–4339). Curran Associates, Inc.
BhNS16
Bhojanapalli, S., Neyshabur, B., & Srebro, N. (2016) Global Optimality of Local Search for Low Rank Matrix Recovery. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3873–3881). Curran Associates, Inc.
ChMT16
Chalk, M., Marre, O., & Tkacik, G. (2016) Relevant sparse codes with variational information bottleneck. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1957–1965). Curran Associates, Inc.
CCDH16
Chen, X., Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016) InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2172–2180). Curran Associates, Inc.
CGSC16
Choy, C. B., Gwak, J., Savarese, S., & Chandraker, M. (2016) Universal Correspondence Network. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2406–2414). Curran Associates, Inc.
DaMY16
David, O., Moran, S., & Yehudayoff, A. (2016) Supervised learning through the lens of compression. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2784–2792). Curran Associates, Inc.
DiSB16
Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2016) Density estimation using Real NVP. arXiv:1605.08803 [cs, stat].
ElST16
Ellis, K., Solar-Lezama, A., & Tenenbaum, J. (2016) Sampling for Bayesian Program Learning. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1289–1297). Curran Associates, Inc.
FiGL16
Finn, C., Goodfellow, I., & Levine, S. (2016) Unsupervised Learning for Physical Interaction through Video Prediction. In D. D. Lee, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances In Neural Information Processing Systems 29 (pp. 64–72). Curran Associates, Inc.
FFCE16
Flamary, R., Févotte, C., Courty, N., & Emiya, V. (2016) Optimal spectral transportation with application to music transcription. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 703–711). Curran Associates, Inc.
FSPW16
Fraccaro, M., Sønderby, S. K., Paquet, U., & Winther, O. (2016) Sequential Neural Models with Stochastic Layers. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2199–2207). Curran Associates, Inc.
GeLM16
Ge, R., Lee, J. D., & Ma, T. (2016) Matrix Completion has No Spurious Local Minimum. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2973–2981). Curran Associates, Inc.
GCPB16
Genevay, A., Cuturi, M., Peyré, G., & Bach, F. (2016) Stochastic Optimization for Large-scale Optimal Transport. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3432–3440). Curran Associates, Inc.
GMDL16
Gruslys, A., Munos, R., Danihelka, I., Lanctot, M., & Graves, A. (2016) Memory-Efficient Backpropagation Through Time. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 4125–4133). Curran Associates, Inc.
HALA16
Haarnoja, T., Ajay, A., Levine, S., & Abbeel, P. (2016) Backprop KF: Learning Discriminative Deterministic State Estimators. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 4376–4384). Curran Associates, Inc.
HaMa16
Hazan, E., & Ma, T. (2016) A Non-generative Framework and Convex Relaxations for Unsupervised Learning. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3306–3314). Curran Associates, Inc.
HoSi16
Horel, T., & Singer, Y. (2016) Maximization of Approximately Submodular Functions. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3045–3053). Curran Associates, Inc.
JDTG16
Jia, X., De Brabandere, B., Tuytelaars, T., & Gool, L. V.(2016) Dynamic Filter Networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 667–675). Curran Associates, Inc.
KaSF16
Kanagawa, M., Sriperumbudur, B. K., & Fukumizu, K. (2016) Convergence guarantees for kernel-based quadrature rules in misspecified settings. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3288–3296). Curran Associates, Inc.
KSJC16
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Chen, X., Sutskever, I., & Welling, M. (2016) Improving Variational Autoencoders with Inverse Autoregressive Flow. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 4736–4744). Curran Associates, Inc.
KrHo16
Krotov, D., & Hopfield, J. J.(2016) Dense Associative Memory for Pattern Recognition. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1172–1180). Curran Associates, Inc.
LDXS16
Li, Yongbo, Dong, W., Xie, X., Shi, G., Li, X., & Xu, D. (2016) Learning Parametric Sparse Models for Image Super-Resolution. In D. D. Lee, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances In Neural Information Processing Systems 29 (pp. 4664–4672). Curran Associates, Inc.
LiLR16
Li, Yuanzhi, Liang, Y., & Risteski, A. (2016) Recovery Guarantee of Non-negative Matrix Factorization via Alternating Updates. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 4988–4996). Curran Associates, Inc.
LiWD16
Lindgren, E., Wu, S., & Dimakis, A. G.(2016) Leveraging Sparsity for Efficient Submodular Data Summarization. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3414–3422). Curran Associates, Inc.
LACL16
Luo, H., Agarwal, A., Cesa-Bianchi, N., & Langford, J. (2016) Efficient Second Order Online Learning by Sketching. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 902–910). Curran Associates, Inc.
NePL16
Neil, D., Pfeiffer, M., & Liu, S.-C. (2016) Phased LSTM: Accelerating Recurrent Network Training for Long or Event-based Sequences. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3882–3890). Curran Associates, Inc.
OHJN16
Ostrovsky, D., Harchaoui, Z., Juditsky, A., & Nemirovski, A. S.(2016) Structure-Blind Signal Recovery. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 4817–4825). Curran Associates, Inc.
PLRS16
Poole, B., Lahiri, S., Raghu, M., Sohl-Dickstein, J., & Ganguli, S. (2016) Exponential expressivity in deep neural networks through transient chaos. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3360–3368). Curran Associates, Inc.
RHDH16
Rae, J., Hunt, J. J., Danihelka, I., Harley, T., Senior, A. W., Wayne, G., … Lillicrap, T. (2016) Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3621–3629). Curran Associates, Inc.
RTHG16
Ritchie, D., Thomas, A., Hanrahan, P., & Goodman, N. (2016) Neurally-Guided Procedural Models: Amortized Inference for Procedural Graphics Programs using Neural Networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 622–630). Curran Associates, Inc.
SaWT16
Sadhanala, V., Wang, Y.-X., & Tibshirani, R. J.(2016) Total Variation Classes Beyond 1d: Minimax Rates, and the Limitations of Linear Smoothers. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3513–3521). Curran Associates, Inc.
SGZC16
Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X., & Chen, X. (2016) Improved Techniques for Training GANs. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2226–2234). Curran Associates, Inc.
SaKi16
Salimans, T., & Kingma, D. P.(2016) Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 901–901). Curran Associates, Inc.
ScWZ16
Schein, A., Wallach, H., & Zhou, M. (2016) Poisson-Gamma dynamical systems. In Advances In Neural Information Processing Systems (pp. 5006–5014).
ScAB16
Scieur, D., d'Aspremont, A., & Bach, F. (2016) Regularized Nonlinear Acceleration. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 712–720). Curran Associates, Inc.
ShBa16
Shpakova, T., & Bach, F. (2016) Parameter Learning for Log-supermodular Distributions. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3234–3242). Curran Associates, Inc.
SiHF16
Singh, S., Hoiem, D., & Forsyth, D. (2016) Swapout: Learning an ensemble of deep architectures. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 28–36). Curran Associates, Inc.
SiDu16
Sinha, A., & Duchi, J. C.(2016) Learning Kernels with Random Features. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1298–1306). Curran Associates, Inc.
ErKo16
van Erven, T., & Koolen, W. M.(2016) MetaGrad: Multiple Learning Rates in Online Learning. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3666–3674). Curran Associates, Inc.
WWCW16
Wang, J., Wang, W., Chen, xiongtao, Wang, R., & Gao, W. (2016) Deep Alternative Neural Network: Exploring Contexts as Early as Possible for Action Recognition. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 811–819). Curran Associates, Inc.
WaDL16
Wang, X., Dunson, D. B., & Leng, C. (2016) DECOrrelated feature space partitioning for distributed sparse regression. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 802–810). Curran Associates, Inc.
WXYT16
Wang, Y., Xu, C., You, S., Tao, D., & Xu, C. (2016) CNNpack: Packing Convolutional Neural Networks in the Frequency Domain. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 253–261). Curran Associates, Inc.
WWWC16
Wen, W., Wu, C., Wang, Y., Chen, Y., & Li, H. (2016) Learning Structured Sparsity in Deep Neural Networks. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2074–2082). Curran Associates, Inc.
XWGW16
Xin, B., Wang, Y., Gao, W., Wipf, D., & Wang, B. (2016) Maximal Sparsity with Deep Networks?. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 4340–4348). Curran Associates, Inc.
YSCH16
Yu, F. X., Suresh, A. T., Choromanski, K. M., Holtmann-Rice, D. N., & Kumar, S. (2016) Orthogonal Random Features. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 1975–1983). Curran Associates, Inc.
YuRD16
Yu, H.-F., Rao, N., & Dhillon, I. S.(2016) Temporal Regularized Matrix Factorization for High-dimensional Time Series Prediction. In D. D. Lee, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances In Neural Information Processing Systems 29 (pp. 847–855). Curran Associates, Inc.
YLZL16
Yuan, X., Li, P., Zhang, T., Liu, Q., & Liu, G. (2016) Learning Additive Exponential Family Graphical Models via \(\ell_{2,1}\)-norm Regularized M-Estimation. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 4367–4375). Curran Associates, Inc.
ZLLG16
Zhang, M., Lin, P., Lin, P., Guo, T., Wang, Y., Wang, Y., & Chen, F. (2016) Infinite Hidden Semi-Markov Modulated Interaction Point Process. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 3900–3908). Curran Associates, Inc.
ZhLi16
Zhang, H., & Liang, Y. (2016) Reshaped Wirtinger Flow for Solving Quadratic System of Equations. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 29 (pp. 2622–2630). Curran Associates, Inc.