
Semi/weakly-supervised learning

On extracting nutrition from bullshit

Usefulness: 🔧
Novelty: 💡
Uncertainty: 🤪 🤪 🤪
Incompleteness: 🚧 🚧 🚧

I’m not yet sure what this is, but I’ve seen these words invoked in machine learning problems with a partially-observed model, where you hope to learn the parameters of the label generation process and the observation process simultaneously. Say I have a bunch of crowd-sourced labels for my data and I wish to use them to train a classifier, but I suspect that my crowd is a little unreliable. Then I do “weakly supervised” learning: I infer both the true labels and the crowd whimsy process, as a kind of hierarchical model of informative sampling. Or I might assume no explicit model for the crowd whimsy, but simply that similar data should not be too differently labelled, a.k.a. label propagation (Zhu and Ghahramani 2002; Zhou et al. 2003), which spreads the known labels across a similarity graph over the data to infer the missing ones.
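To make the label propagation idea concrete, here is a minimal sketch in the spirit of Zhu and Ghahramani (2002), not a faithful reproduction of any paper's algorithm. The function name `label_propagation`, the dense RBF affinity, the bandwidth `sigma` and the fixed iteration count are all my assumptions for illustration; a real problem would want a sparse graph and a convergence check.

```python
import numpy as np

def label_propagation(X, y, sigma=1.0, n_iter=200):
    """Propagate labels over an RBF similarity graph.

    y holds a class index for each labelled point and -1 for unlabelled ones.
    Returns hard predictions and the soft label matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_classes = int(y.max()) + 1
    labelled = y >= 0

    # Dense RBF affinity between all pairs of points (assumed similarity).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Row-normalise so each row is a distribution over neighbours.
    T = W / W.sum(axis=1, keepdims=True)

    # One-hot rows for labelled points, uniform rows for unlabelled ones.
    F = np.full((len(y), n_classes), 1.0 / n_classes)
    F[labelled] = np.eye(n_classes)[y[labelled]]
    clamp = F[labelled].copy()

    for _ in range(n_iter):
        F = T @ F             # each point averages its neighbours' labels
        F[labelled] = clamp   # known labels are held fixed
    return F.argmax(axis=1), F

# Two well-separated clusters with one labelled point each.
X = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
y = [0, -1, -1, 1, -1, -1]
pred, F = label_propagation(X, y)
```

Clamping the labelled rows after every step is what stops the known labels from washing out; everything else is just repeated neighbourhood averaging.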

Other methods?

Here’s one practical thingy:

snorkel:

Snorkel is a system for rapidly creating, modeling, and managing training data, currently focused on accelerating the development of structured or “dark” data extraction applications for domains in which large labeled training sets are not available or easy to obtain.

Today’s state-of-the-art machine learning models require massive labeled training sets – which usually do not exist for real-world applications. Instead, Snorkel is based around the new data programming paradigm, in which the developer focuses on writing a set of labeling functions, which are just scripts that programmatically label data. The resulting labels are noisy, but Snorkel automatically models this process—learning, essentially, which labeling functions are more accurate than others—and then uses this to train an end model (for example, a deep neural network in TensorFlow).

Surprisingly, by modeling a noisy training set creation process in this way, we can take potentially low-quality labeling functions from the user, and use these to train high-quality end models. We see Snorkel as providing a general framework for many weak supervision techniques, and as defining a new programming model for weakly-supervised machine learning systems.
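To see what “learning which labeling functions are more accurate than others” could mean in the simplest case, here is a toy sketch. It does not use Snorkel's API and is not Snorkel's actual label model. The labeling functions, the spam task, and `fit_label_model` are hypothetical, and the EM procedure assumes binary labels, equally likely classes, votes that are conditionally independent given the true label, and functions that start out better than chance.

```python
import numpy as np

ABSTAIN, NEG, POS = 0, -1, 1

# Hypothetical labeling functions for a toy spam task (POS = spam).
def lf_mentions_prize(text):
    return POS if "prize" in text.lower() else ABSTAIN

def lf_has_url(text):
    return POS if "http" in text.lower() else ABSTAIN

def lf_short_greeting(text):
    return NEG if text.lower().startswith(("hi", "hello")) else ABSTAIN

def apply_lfs(lfs, texts):
    """Label matrix: one row per text, one column per labeling function."""
    return np.array([[lf(t) for lf in lfs] for t in texts])

def fit_label_model(L, n_iter=50, init_acc=0.7):
    """EM estimate of per-function accuracy and of P(y = POS) per row.

    Assumes conditionally independent votes, a uniform class prior,
    and init_acc > 0.5 to break the label-flipping symmetry.
    """
    L = np.asarray(L, dtype=float)
    n_votes = np.maximum((L != ABSTAIN).sum(axis=0), 1)  # avoid 0 division
    acc = np.full(L.shape[1], init_acc)
    for _ in range(n_iter):
        # E-step: posterior for the true label from accuracy-weighted votes.
        log_odds = L @ np.log(acc / (1.0 - acc))
        p_pos = 1.0 / (1.0 + np.exp(-log_odds))
        # M-step: expected share of each function's votes that were right.
        correct = (p_pos[:, None] * (L == POS)
                   + (1.0 - p_pos)[:, None] * (L == NEG))
        acc = np.clip(correct.sum(axis=0) / n_votes, 0.05, 0.95)
    return acc, p_pos

texts = [
    "Hi there, lunch tomorrow?",
    "Claim your PRIZE at http://example.com",
    "Hello! You won a prize",
    "Meeting notes: http://example.org/notes",
]
lfs = [lf_mentions_prize, lf_has_url, lf_short_greeting]
L = apply_lfs(lfs, texts)
acc, p_spam = fit_label_model(L)
```

The soft labels `p_spam` are what you would then hand to the end model as training targets. With four messages the estimates mean nothing, of course; the point is only the shape of the pipeline.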

Refs

Bach, Stephen H., Bryan He, Alexander Ratner, and Christopher Ré. 2017. “Learning the Structure of Generative Models Without Labeled Data.” In Proceedings of the 34th International Conference on Machine Learning. International Conference on Machine Learning, Sydney, Australia. http://arxiv.org/abs/1703.00854.

Delalleau, Olivier, Yoshua Bengio, and Nicolas Le Roux. 2005. “Efficient Nonparametric Function Induction in Semi-Supervised Learning.” In Proc. Artificial Intelligence and Statistics. Citeseer. http://www.iro.umontreal.ca/~lisa/bib/pub_subject/unsupervised/pointeurs/semisup_aistats2005.pdf.

Fonseca, Eduardo, Manoj Plakal, Daniel P. W. Ellis, Frederic Font, Xavier Favory, and Xavier Serra. 2019. “Learning Sound Event Classifiers from Web Audio with Noisy Labels,” January. http://arxiv.org/abs/1901.01189.

Jung, Alexander, Alfred O. Hero III, Alexandru Mara, and Saeed Jahromi. 2016. “Semi-Supervised Learning via Sparse Label Propagation,” December. http://arxiv.org/abs/1612.01414.

Karpathy, Andrej, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, and Li Fei-Fei. 2014. “Large-Scale Video Classification with Convolutional Neural Networks.” In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 1725–32. CVPR ’14. Washington, DC, USA: IEEE Computer Society. https://doi.org/10.1109/CVPR.2014.223.

Kumar, Anurag, and Bhiksha Raj. 2016. “Audio Event Detection Using Weakly Labeled Data.” In Proceedings of the 2016 ACM on Multimedia Conference, 1038–47. MM ’16. New York, NY, USA: ACM. https://doi.org/10.1145/2964284.2964310.

———. 2017. “Deep CNN Framework for Audio Event Recognition Using Weakly Labeled Web Data,” July. http://arxiv.org/abs/1707.02530.

Li, Z., and J. Tang. 2015. “Weakly Supervised Deep Metric Learning for Community-Contributed Image Retrieval.” IEEE Transactions on Multimedia 17 (11): 1989–99. https://doi.org/10.1109/TMM.2015.2477035.

Misra, Ishan, C. Lawrence Zitnick, Margaret Mitchell, and Ross Girshick. 2015. “Seeing Through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels.” In Proceedings of CVPR. http://arxiv.org/abs/1512.06974.

Papandreou, George, Liang-Chieh Chen, Kevin Murphy, and Alan L. Yuille. n.d. “Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation.” Accessed July 18, 2017. http://www.cs.jhu.edu/~ayuille/Pubs15/PapandreouChen_WeaklySemiSupervised_v2%20(1).pdf.

Ratner, Alexander, Stephen H. Bach, Henry Ehrenberg, Jason Fries, Sen Wu, and Christopher Ré. 2017. “Snorkel: Rapid Training Data Creation with Weak Supervision.” Proceedings of the VLDB Endowment 11 (3): 269–82. https://doi.org/10.14778/3157794.3157797.

Ratner, Alexander J, Christopher M De Sa, Sen Wu, Daniel Selsam, and Christopher Ré. 2016. “Data Programming: Creating Large Training Sets, Quickly.” In Advances in Neural Information Processing Systems 29, edited by D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, 3567–75. Curran Associates, Inc. http://papers.nips.cc/paper/6523-data-programming-creating-large-training-sets-quickly.pdf.

Varma, Paroma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, and Christopher Ré. 2017. “Inferring Generative Model Structure with Static Analysis.” In Advances in Neural Information Processing Systems. http://arxiv.org/abs/1709.02477.

Wu, F., Z. Wang, Z. Zhang, Y. Yang, J. Luo, W. Zhu, and Y. Zhuang. 2015. “Weakly Semi-Supervised Deep Learning for Multi-Label Image Annotation.” IEEE Transactions on Big Data 1 (3): 109–22. https://doi.org/10.1109/TBDATA.2015.2497270.

Zhou, Dengyong, Olivier Bousquet, Thomas Navin Lal, Jason Weston, and Bernhard Schölkopf. 2003. “Learning with Local and Global Consistency.” In Proceedings of the 16th International Conference on Neural Information Processing Systems, 321–28. NIPS’03. Cambridge, MA, USA: MIT Press. http://papers.nips.cc/paper/2506-learning-with-local-and-global-consistency.pdf.

Zhu, Xiaojin, and Zoubin Ghahramani. 2002. “Learning from Labeled and Unlabeled Data with Label Propagation.” http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf.