A Formal Perspective on Language Modeling
Language models—especially the large ones—are all the rage. And, for what will surely be one of only a few times in history, my field, natural language processing, is the center of world attention. Indeed, there is nearly a daily stream of articles in the popular press on the most recent advances in language modeling technology. In contrast to most of these articles (and most other talks on the topic), this tutorial-style presentation is not about forward progress in the area. Instead, I am going to take a step back and ask simple questions about the nature of language modeling itself. We will start with the most basic of questions: From a mathematical perspective, what is a language model? Next, the talk will turn philosophical. With all the talk of artificial general intelligence, what can theory of computation bring to bear on the computational power of language models? The talk will conclude with a statement of several recent theorems proven by my research group, the highlight of which is that no Transformer-based language model is Turing complete and, thus, we should be careful about labeling such language models, e.g., GPT-4, as general-purpose reasoners.
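For readers who would like a concrete anchor for the first question above, the standard formal answer (a textbook framing rather than anything specific to this talk; the notation \(\Sigma\), \(p\), and \(\boldsymbol{y}\) is ours) is that a language model over an alphabet \(\Sigma\) is simply a probability distribution over the set \(\Sigma^{*}\) of all finite strings:

\[
p \colon \Sigma^{*} \to [0, 1], \qquad \sum_{\boldsymbol{y} \in \Sigma^{*}} p(\boldsymbol{y}) = 1 .
\]

An autoregressive model, e.g., a Transformer, represents this distribution one symbol at a time,

\[
p(\boldsymbol{y}) \;=\; p(\textsc{eos} \mid \boldsymbol{y}) \prod_{t=1}^{|\boldsymbol{y}|} p(y_t \mid \boldsymbol{y}_{<t}),
\]

where \(\textsc{eos}\) is a distinguished end-of-string symbol. Whether such a factorized model in fact places all of its probability mass on finite strings is one of the subtleties that a formal treatment of language modeling has to take seriously.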
Speaker
Ryan Cotterell is a leading researcher in computational linguistics and natural language processing. Ryan has a bachelor’s degree and a PhD in Computer Science from Johns Hopkins University. He is currently a tenure-track assistant professor at ETH Zürich in the Department of Computer Science, where he is a member of the Institute for Machine Learning. He was previously a lecturer at Cambridge University and has done research stints at Google AI and Facebook AI Research.