Natural language processing

Automatic processing of words and sentences and such

January 11, 2018 — September 16, 2021

grammar

language

machine learning

NLP

stringology

Computational language translation, parsing, search, generation and understanding.

A mare’s nest of intersecting computational, philosophical and mathematical challenges (e.g. semantics, grammatical inference, language complexity, learning theory) that humans seem to handle subconsciously, and which we therefore hope to train machines to handle too. It is also a problem of great commercial value, so we can probably muster the resources to tackle it. The interesting thing right now is the NLP explosion: if anything has a good chance of producing artificial general intelligence, it might be neural NLP, where certain architectures (especially highly evolved attention mechanisms) are producing eerily good results (Brown et al. 2020).

1 What is Natural Language Processing?

Sebastian Ruder, Recent history of NLP, a.k.a. “how natural language processing turned into a deep learning thing too”
See also Sebastian’s newsletter
Peter Norvig on Chomsky and statistical versus explanatory models of natural language syntax. Full of sick burns.
I guess the famous Stochastic Parrots paper (Bender et al. 2021) is a new kind of rejoinder, with a particular focus on transformers.

2 Biological basis of language

See biology of language.

3 Software

See NLP software.

4 References

Angluin. 1988. “Identifying Languages from Stochastic Examples.” No. YALEU/DCS/RR-614.

Arisoy, Sainath, Kingsbury, et al. 2012. “Deep Neural Network Language Models.” In Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-Gram Model? On the Future of Language Modeling for HLT. WLM ’12.

Autebert, Berstel, and Boasson. 1997. “Context-Free Languages and Pushdown Automata.” In Handbook of Formal Languages, Vol. 1.

Baeza-Yates, and Ribeiro-Neto. 1999. Modern Information Retrieval.

Bail. 2016. “Combining Natural Language Processing and Network Analysis to Examine How Advocacy Organizations Stimulate Conversation on Social Media.” Proceedings of the National Academy of Sciences.

Bender, Gebru, McMillan-Major, et al. 2021. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency.

Bengio, Ducharme, Vincent, et al. 2003. “A Neural Probabilistic Language Model.” Journal of Machine Learning Research.

Berstel, and Boasson. 1990. “Transductions and Context-Free Languages.” In Handbook of Theoretical Computer Science, Vol. A: Algorithms and Complexity.

Blazek, and Lin. 2020. “A Neural Network Model of Perception and Reasoning.” arXiv:2002.11319 [Cs, q-Bio].

Bolhuis, Tattersall, Chomsky, et al. 2014. “How Could Language Have Evolved?” PLoS Biology.

Booth, and Thompson. 1973. “Applying Probability Measures to Abstract Languages.” IEEE Transactions on Computers.

Bottou. 2011. “From Machine Learning to Machine Reasoning.” arXiv:1102.1808 [Cs].

Brown, Mann, Ryder, et al. 2020. “Language Models Are Few-Shot Learners.” arXiv:2005.14165 [Cs].

Casacuberta, and de la Higuera. 2000. “Computational Complexity of Problems on Probabilistic Grammars and Transducers.” In Grammatical Inference: Algorithms and Applications.

Charniak. 1996. Statistical Language Learning.

Chater, and Manning. 2006. “Probabilistic Models of Language Processing and Acquisition.” Trends in Cognitive Sciences.

Cho, van Merrienboer, Gulcehre, et al. 2014. “Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation.” In EMNLP 2014.

Clark, Alexander, and Eyraud. 2005. “Identification in the Limit of Substitutable Context-Free Languages.” In Algorithmic Learning Theory. Lecture Notes in Computer Science.

Clark, Alexander, Florêncio, and Watkins. 2006. “Languages as Hyperplanes: Grammatical Inference with String Kernels.” In Machine Learning: ECML 2006. Lecture Notes in Computer Science 4212.

Clark, Alexander, Florêncio, Watkins, et al. 2006. “Planar Languages and Learnability.” In Grammatical Inference: Algorithms and Applications. Lecture Notes in Computer Science 4201.

Clark, Peter, Tafjord, and Richardson. 2020. “Transformers as Soft Reasoners over Language.” In IJCAI 2020.

Collins, and Duffy. 2002. “Convolution Kernels for Natural Language.” In Advances in Neural Information Processing Systems 14.

Gold. 1967. “Language Identification in the Limit.” Information and Control.

Gonzalez, and Thomason. 1978. Syntactic Pattern Recognition: An Introduction.

Grefenstette, Hermann, Suleyman, et al. 2015. “Learning to Transduce with Unbounded Memory.” arXiv:1506.02516 [Cs].

Greibach. 1966. “The Unsolvability of the Recognition of Linear Context-Free Languages.” Journal of the ACM.

Hopcroft, and Ullman. 1979. Introduction to Automata Theory, Languages and Computation.

Khalifa, Barros, and Togelius. 2019. “DeepTingle.”

Kontorovich, Leonid, Cortes, and Mohri. 2006. “Learning Linearly Separable Languages.” In Algorithmic Learning Theory. Lecture Notes in Computer Science 4264.

Kontorovich, Leonid (Aryeh), Cortes, and Mohri. 2008. “Kernel Methods for Learning Languages.” Theoretical Computer Science, Algorithmic Learning Theory.

Lafferty, McCallum, and Pereira. 2001. “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.” In Proceedings of the Eighteenth International Conference on Machine Learning. ICML ’01.

Lamb, Garcez, Gori, et al. 2020. “Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective.” In IJCAI 2020.

Lipton, Berkowitz, and Elkan. 2015. “A Critical Review of Recurrent Neural Networks for Sequence Learning.” arXiv:1506.00019 [Cs].

Manning. 2002. “Probabilistic Syntax.” In Probabilistic Linguistics.

Manning, Raghavan, and Schütze. 2008. Introduction to Information Retrieval.

Manning, and Schütze. 1999. Foundations of Statistical Natural Language Processing.

Mikolov, Tomáš, Karafiát, Burget, et al. 2010. “Recurrent Neural Network Based Language Model.” In Eleventh Annual Conference of the International Speech Communication Association.

Mikolov, Tomas, Le, and Sutskever. 2013. “Exploiting Similarities Among Languages for Machine Translation.” arXiv:1309.4168 [Cs].

Mitra, and Craswell. 2017. “Neural Models for Information Retrieval.” arXiv:1705.01509 [Cs].

Mohri, Pereira, and Riley. 1996. “Weighted Automata in Text and Speech Processing.” In Proceedings of the 12th Biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended Finite State Models of Language.

———. 2002. “Weighted Finite-State Transducers in Speech Recognition.” Computer Speech & Language.

O’Donnell, Tenenbaum, and Goodman. 2009. “Fragment Grammars: Exploring Computation and Reuse in Language.”

Pennington, Socher, and Manning. 2014. “GloVe: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014).

Petersson, Folia, and Hagoort. 2012. “What Artificial Grammar Learning Reveals about the Neurobiology of Syntax.” Brain and Language, The Neurobiology of Syntax.

Pillutla, Liu, Thickstun, et al. 2022. “MAUVE Scores for Generative Models: Theory and Practice.”

Qi, Zhang, Zhang, et al. 2020. “Stanza: A Python Natural Language Processing Toolkit for Many Human Languages.” arXiv:2003.07082 [Cs].

Salakhutdinov. 2015. “Learning Deep Generative Models.” Annual Review of Statistics and Its Application.

Schlag, and Schmidhuber. 2019. “Learning to Reason with Third-Order Tensor Products.” arXiv:1811.12143 [Cs, Stat].

Solan, Horn, Ruppin, et al. 2005. “Unsupervised Learning of Natural Languages.” Proceedings of the National Academy of Sciences of the United States of America.

Sutton, McCallum, and Rohanimanesh. 2007. “Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data.” Journal of Machine Learning Research.

van Rijsbergen. 1979. Information Retrieval.

Wetherell. 1980. “Probabilistic Languages: A Review and Some Open Questions.” ACM Computing Surveys.

Wolff. 2000. “Syntax, Parsing and Production of Natural Language in a Framework of Information Compression by Multiple Alignment, Unification and Search.” Journal of Universal Computer Science.