Let's get started. Welcome to CS224U. Our topic is Natural Language Understanding. And if you can understand the words that I am saying right now, you are already pretty good at natural language understanding. Uh, this might even be an easy class for you. But you probably have a device in your pocket that's not very good at natural language understanding.
And that's the problem that we want to tackle in this class. Uh, but first let me introduce myself. My name is Bill MacCartney. I'm an Adjunct Professor in the Computer Science Department. Adjunct Professor basically means part-time professor. I've been teaching this class for eight years now, but my full-time work is in industry. Um, currently, I'm at Apple, where I help to lead part of the AI and Machine Learning organization. I was previously a research scientist at Google, and my focus at Google was natural language understanding for search and for the Google Assistant. Uh, let me also introduce my co-instructor, Chris Potts.
You wanna say anything? Chelsea will probably get me later. So [OVERLAPPING]. Okay. Lots of familiar faces, it's great to see. So Chris and I are gonna share the teaching responsibilities. Uh, today you'll mainly hear from me, and on Wednesday, Chris will take the stage. So the first question I wanna look at is why study natural language understanding? What are our goals in studying natural language understanding? And a few possibilities come to mind. One is that it may yield insights into human cognition. Uh, human language is phenomenally complex and understanding natural language can be surprisingly difficult. If we can figure out how to enable machines to understand language, it might help us to understand how humans understand language. So this amounts to building a model of a phenomenon in order to understand it better. Uh, another goal might be to build conversational agents that can provide assistance or, uh, entertainment or companionship. So for example, you might wanna build a natural language interface to an airline reservation system. Or you might wanna build a toy that can have a conversation with a child.
Or you might wanna build a virtual companion like in the movie Her. Uh, another possible goal is that solving NLU would mean solving a major subproblem of Artificial Intelligence. Uh, after all, you can't really claim that a system has achieved human level intelligence if it can't understand language. So NLU is a necessary step on a path to strong AI. So those are a few possible motivations. I'm curious to hear if you guys have other ideas.
Are there other motivations that have brought you to this class? Why are you interested in studying NLU? [NOISE] Yep. I'm mainly interested in using it to help with historical research. So applying it to bodies of [inaudible]. That's terrific, there's been an explosion in the last few years of computational social science. So using computational methods to gain better insights into topics that are traditionally not part of computer science, but part of social science. So it's a scientific goal, but it's not aimed at understanding how humans understand language. It's using NLU as a tool to study something else.
Any other ideas? Yep. I'm here because I feel like, uh, NLP is not satisfying enough. [NOISE] I find like machine translation understands is like, it like, like the computer doesn't understand language. It's just like, even with Internet work. So I was just wondering how we could push that further and see if we use [inaudible] like make it better. Yeah. Many of the, many of the goals that NLP has aimed at over the last couple of decades are comparatively modest goals. So not aimed at actually understanding language in its fullness, but instead aimed at lower level tasks like, um, tagging tasks, or syntactic analysis, or even translation which is a comparatively harder and higher-level task, but still falls short of full understanding [NOISE]. Any other thoughts before I go on? Yeah. Helping older people with technology through natural language understanding. Yeah. Can you say a little more about that? Uh, actually as people get older their memory fades, they start to hit other issues.
[NOISE] Talking becomes harder, so having some form of companion. Someone who can understand how to communicate with you, to help better your condition, something like that. Yeah. Um, so maybe a combination of assistance and companionship. I was just going to say you might need to repeat the questions, so it can pick it up. Okay. [inaudible]. Yeah. So that comment was about, uh, providing assistance and companionship to older people, uh, to help them with their lives and, uh, help them not feel so alone. Great. Okay. Let me keep going. Um, I think I skipped a slide. Oh, uh, I was gonna make an observation first, which is that, um, some of the goals we've talked about are primarily scientific goals, and others are primarily technological goals. So maybe Number 1 is a scientific goal. Number 2, I would call it a technological goal, trying to create a new piece of technology. Number 3 may be ambiguous between the two. But this split between, uh, scientific goals and technological goals is one that's been a part of NLU for a long time. And in fact you see it in this quote from James Allen, who's the author of one of the earliest texts on natural language understanding.
He says, "There can be two underlying motivations for building a computational theory. The technological goal is simply to build better computers and any solution that works would be acceptable. The cognitive goal is to build a computational analog of the human-language-processing system. Such a theory will be acceptable only after it had been verified by experiment." So the question is when we build a natural language understanding system, do we care whether it understands language in the same way as a human, that is using similar techniques? If we're cognitive scientists, then of course we do care. Uh, if we're technologists then we probably don't. We asked you to read this short paper by Hector Levesque for today, and he makes the same distinction between scientific goals and technological goals. He points out that in some cases, you can achieve the technological goals of Artificial Intelligence through what he calls cheap tricks. And that pejorative term makes it clear that what he cares about are the scientific goals, not the technological goals.
Uh, in this course, we're gonna approach the topic of natural language understanding primarily from the technologist's perspective. But I think it's useful to keep this other perspective in mind. Another important question for us is, what counts as understanding? If we're building a natural language understanding system, how do we know whether we've succeeded? How do we know whether our system truly understands? Um, over the years, many different criteria have been proposed. One possibility is to say that I understand a statement if I can specify its truth conditions. So if I'm gonna claim to understand a statement like, uh, every amino acid contains nitrogen, then I need to be able to state what has to be true in order for that to be so. So this seems like a plausible criterion for understanding, any reservations about that? Any- anything wrong with that as a criterion for understanding? Yes. [inaudible]. Yeah there's opinions. There's lots of other kinds of statements that don't really have truth conditions too. Like if I say a great example of an opinion, if I say like, hey did you see Saturday Night Live this weekend? It was the funniest thing I'd ever seen.
Like how would I- that's meaningful. You understand it. You understand my meaning but how would I specify the truth conditions for- that's the funniest thing I ever saw. It's hard to imagine. Any other thoughts about this as a criterion of understanding? Yes? This might be going off the course but like you may argue that, um, an AI would never understand certain emotional or human condition statements such as nurtured.
Yeah. If I say, ta- I take a statement like, um, love is the most important thing in life. What are the truth conditions for that? I wouldn't have the foggiest idea how to specify the truth conditions for that. And yet, it's perfectly meaningful. And we can talk about understanding what that means. Another possible criterion is to say that we understand a statement if we can calculate its entailments. This is closely related to the first one, but we're shifting our attention from the conditions of the statement to its consequences. Later in the quarter, we're gonna look at the task of natural language inference which is one way of operationalizing that criterion. Another possibility is to say that, to understand a statement is to respond appropriately. To take appropriate action based on the statement.
So this might be relevant, for example, if you're playing a text-based adventure game, uh, the kinda game where you wanna be able to type commands like "go North," or "take gold," or "stab dwarf repeatedly." Or, you know, if you're talking to your iPhone and you just wanna be able to say, "Set an alarm for 9:00 PM." Does it do the right thing? Does it take the right action? Another possible criterion for understanding is that, uh, to understand a statement is to be able to translate it into another language. This seems like a high bar for understanding. Maybe not a necessary condition, but perhaps a sufficient condition. What do you guys think? If you're building a natural language understanding system, how will you evaluate whether it understands? How will you judge whether it actually works? This question of evaluation is gonna be one that we're gonna look at a lot more later this quarter. Other thoughts here? Yeah? This isn't necessarily a task for like a computational system but from a human perspective, I think an important way to understand this statement is to empathize with the person making the statement based on what the statement means.
Yes. So your suggestion is that, uh, an important test of- of understanding is the ability to emp- empathize with the statement. Um, I don't know how we would operationalize that as a- as a measure of computer understanding of a system. But I think it's a good insight. Yeah? I want to follow on that. I agree. Because I think that's, uh, one of the biggest problems for like mental health chatbots. Yeah. Where is the lack of, uh, be able to empathize or respond appropriately, um, based on like nuanced responses, depending who's there. Yeah. And that's like a big- that's a big problem in those. Yeah. So it seems closely related to take appropriate action in light of it or respond appropriately. Um, if you had a chat bot that's trying to, um, uh, maybe almost play the role of therapist, you wanna make sure that your chat bot is, um, responding in a way that shows real understanding and real empathy. Uh, and right now, that's a- that's a pretty high bar.
A challenge that- that seems pretty- pretty, um, far out of reach at this point. Yes? Um, maybe a way to think of understanding is to be able to create a graph that has like concepts as- as nodes and then draws and relations in between them. To see like how, uh, to sort of like not just like- like climb the tree or anything, but just understand I guess like what it means. Um, how these different objects interact with one another, uh, just like as a concept, I don't know if that was a fair, all of it.
Yeah. So to- to- I just- I'm just gonna try to repeat it back. Build a graph of concepts and relationships between them, based on that understanding. Based on the statement that was given? Based on the statement that was given. Yeah. And in fact, one of the topics that we're gonna look at this quarter is the topic of relation extraction, which is precisely about taking a piece of natural language text like a web page, and identifying the entities that are mentioned in the text and the relationships between them. And one of the applications for this is to automatically construct knowledge graphs from text on the web at scale. So that we can learn about, you know, um, uh, the fact that a certain album came from a certain band, and the band originated in a certain city, and it got started in a certain year.
And we can learn all this from some, you know, fan page that somebody wrote or a review or something like that. Okay. So a bunch of good ideas about, uh, criteria for understanding. This question of understanding has been at the heart of philosophical debates, uh, about artificial intelligence for many years. Uh, quick show of hands, how many people in the room are familiar with the Turing test? Okay. That is reassuring. I would've been alarmed if it were not a high proportion. How many people are familiar with, uh, Searle's Chinese Room argument? Okay. So maybe half. [NOISE] Um, so Turing's position is essentially a behaviorist position. He says, that a system is intelligent if its behavior is indistinguishable from that of an intelligent human being.
For our purposes, we might say that a system truly understands if it behaves as if it understands. In other words, behavior is, uh, behavior itself is enough to demonstrate understanding. Searle rebutted by saying, that even if a system behaves as if it understands, it might not truly understand. And he asks you to imagine that you're locked in a room, and you're passed questions in Chinese through a little slot in the door. And you have to respond in Chinese and pass the answers back out through this slot. And the problem is, that you don't speak any Chinese, but you have in the room with you thousands and thousands of books that specify in exhaustive detail an algorithm that you can follow to generate the answers. So you're essentially playing the role of a human emulator of a Chinese natural language understanding system. Now, Searle's argument is that by executing this algorithm, you might succeed in fooling people outside the room that you understand Chinese.
But in fact, you don't understand at all. He says, "Behavior is not proof of understanding." So what do you think about this argument? Do you buy that? Is it plausible? Does it stand up? Thoughts on this? Yeah. Um, it sets a pretty high bar if it's correct because if we continue with like our architectures you always have some tiny little Turing machine. And that's never going to have a state that fully encapsulates like the Chinese language or any other concave system. It's just going to be traversing a big algorithm. Um, so I would go with Turing. Yep. Did you say- did use the phrase tangled Turing system? No, I said I would go with the Turing [inaudible]. Oh, Um, I misunderstood you. Okay. Yes? I also feel like that assumes that like we know what it means to understand as humans. Like what if we get an algorithm in our head that we're just kind of like following. So like, it's just hard to separate like the algorithm from- that- that- we only know what understanding- he hasn't really redefined what understanding is specifically, he sort of left that vague.
That's right. Uh, Turing gives, um, Turing gives this very behaviorist definition of what understanding is. He's basically saying, understanding is just behaving as if you understand. Yeah. And Searle rebuts that, but he doesn't replace it with any other definition of what constitutes understanding. Yes. I feel like as humans, we also extrapolate like going off of that. Uh, if an invariable, uh, algorithm's not that standard of extrapolation. Yeah, as humans, we extrapolate a lot. Actually, can you say a little bit more about that? Yeah. So there's like concepts that are related to something that we don't know. We might guess, or come up with, uh, a definition that might be close to the right definition, and an algorithm's doing the same thing. It's pretty [inaudible]. Yeah. Okay. Yes. So my question is, uh, how do you define human understanding are we trying to supersede human understanding? Like surpass human understanding? Because I don't even know that we understand how humans think at the moment.
So [inaudible]. Yeah, uh, so I think what you're really getting at is whether the way that humans understand is necessarily the goal. Or maybe there can be other kinds of understanding which are not the same as human understanding but are still, um, important and valuable. Um, and that's a nice segue to, uh, this quote from the eminent linguist Noam Chomsky. He says, "The question of whether a computer is playing chess or doing long division or translating Chinese is like the question of whether robots can murder or airplanes can fly.
Blah blah blah blah blah. Uh, these are questions of decision, not fact; decision as to whether to adopt a certain metaphoric extension of common usage." So he basically seems to be saying that, in fact, the- there's no fact of the matter to debate here. It's just a question of how we choose to talk about it. It's just a question of how we choose to use the word understand. And I think more and more people feel comfortable using the word understand to talk about computers. It may not be the same as human understanding. But we're still okay calling it understanding and it's still something which can be valuable and useful and can help us achieve, uh, applications. Okay, I think that's the end of my philosophy for today. You have to indulge me for a little bit of philosophy at the beginning of this class.
As an undergrad I was actually a philosophy major. So I like to wax a little philosophical now and then. We're going to turn now from philosophy to history. Uh, and here's a nutshell history of natural language understanding. So in the early days of computing, uh, NLU was front and center. Complete, precise understanding of human language was seen as a central goal of artificial intelligence. And there was actually quite a bit of optimism, that it would soon be achieved. And there were some impressive early successes which- which led to this optimism. For example, there was a system called SHRDLU, uh, which was created by Terry Winograd who some of you may know.
He's now a professor, he wasn't then, but he is now a professor here at Stanford. Uh, and SHRDLU allowed you to have a conversation about a simplified blocks world. But the systems that were developed at that time tended to be quite narrow in scope and their understanding was very brittle. So if you said something one way, it might understand. But if you said the same thing using slightly different words, it might not understand at all, uh, and would completely fall down. In the 1990s came the so-called statistical revolution in NLP and this brought some important new tools like data and machine learning. Uh, but it also brought a general retreat from the higher level ambitions of natural language understanding and instead the focus shifted to lower level tasks, like part of speech tagging, and syntactic parsing, and things like that.
And then about 10 years ago, we saw this big resurgence of interest in natural language understanding. Um, now leveraging new machine learning techniques and vastly greater computational resources. And starting about five years ago, um, deep learning methods have completely transformed natural language understanding. So starting with, um, neural embedding methods like Word2Vec, which is not actually deep learning but kinda gets bucketed together with them. And then moving on to things like LSTMs, and sequence to sequence models, and many other flavors. And now trained on vastly larger datasets than had ever been available before. Uh, and so that means that this is an incredibly exciting time to be working in natural language understanding. So in academia, there's been this resurgence of interest after many years in the wilderness. And in the commercial world there's a growing sense that natural language understanding is on the verge of breaking through as a mainstream technology.
And over the last few years there's been an explosion of, uh, businesses and services that are based on natural language understanding. So things like Siri and the Google Assistant, and Amazon Alexa, and Microsoft Cortana, and Samsung's Bixby, and many more. Um, and many people see conversational agents as being a key battleground between the tech giants over the coming years. Uh, and by the way that means that Stanford grads with expertise in NLU are in very high demand.
So where is the state of the art today? Well, today's best NLU systems are sometimes impressive but they have severe limitations which quickly become apparent. And I'll show you a bunch of examples of those limitations in a few minutes. Um, and I think what this shows is that NLU is far from a solved problem. There's still a lot of discoveries waiting to be made. And for me as a scientist and a technologist, uh, that means that NLU offers this irresistible combination. On the one hand, there's a lot of interest in NLU, uh, there's great demand for it. It's ready to make a huge impact. But on the other hand, even though we're making rapid progress, like it still doesn't really work yet. And there's still so- there's still a lot of discoveries that are waiting to be made. There's still a lot of gold in the ground. And that's really exciting. Uh, one of the most visible NLU applications today is Siri, the personal assistant on your iPhone.
Um, despite growing competition, it is still by far the most used virtual assistant with half-a-billion active users. Um, when it first appeared in 2011, some even described it as representing a breakthrough in artificial intelligence, and as defining the next generation of interaction design. Even if you think that, uh, those claims are just a little bit inflated, there's no question that Siri has been tremendously impactful. And over the last few years, we've seen Google, and Amazon, and Microsoft, and others develop similar products, and add some degree of natural language understanding not only to phones, but to watches, and TVs, and cars, and now, smart speakers. So how do conversational agents like Siri work? Well, if the input modality is speech, then we start with automatic speech recognition. And as you probably know, uh, the accuracy of speech recognition systems has increased dramatically over the last few years, thanks to the application of neural networks and deep learning methods.
Um, you might think, based on the name, that natural language understanding would include speech recognition. But historically, the two fields have been quite separate, and the term NLU is usually restricted to understanding of written input, that is understanding of text. Uh, of course, in some use cases, the input is already written or typed. So we can skip the automatic speech recognition. On the output side, uh, we often have a text-to-speech module or TTS module. So that's not NLU either. The NLU component sits in the middle, and it's often surprisingly simplistic. We typically start with pre-processing by some standard NLP tools such as part-of-speech tagging, and named entity recognition and resolution. And then the heart of it is an interpreter that generates some kind of meaning representation as output.
So, that's a machine-readable representation of the semantics of the request. And then that meaning representation is passed to a service manager which can make calls to internal and external APIs to execute the request. Now, there's a lot of room for variation here. Um, the meaning representation can be discrete or it can be continuous. Uh, the interpreter can be manually engineered or it can be learned from data or, most commonly, it's some combination of those two, and in this course we'll take a look at a lot of those design choices. Now, there are a lot of interactions that you'd really like to be able to have which are just beyond the capabilities of today's conversational agents. So here's an example dialogue. Uh, you say, "Where's Black Panther playing in Mountain View?" and the agent says, "Black Panther is playing at the Century 16 Theater." "When is it playing there?" "It's playing at 2:00 PM, 5:00 PM, and 8:00 PM." "Okay. I'd like one adult and two children for the first show. How much would that cost?"
This would be incredibly useful, but it's also incredibly hard. Let's take a minute to dissect exactly what makes this hard. And I'm gonna focus just on one aspect of understanding here. I'm gonna focus just on reference resolution. So this is the problem of understanding what the various expressions refer to. In order to do that, we're going to need a bunch of knowledge. We're going to need, first of all, domain knowledge because we need to figure out that "Black Panther" refers to a movie, and "Mountain View" refers to a town. We also need discourse knowledge because we need to figure out that "it" refers to Black Panther, and "there" refers to the Century 16 Theater, and "that" refers to the tickets.
And we're gonna need world knowledge because we need to understand that people don't usually buy adults and children. So the user is talking about buying adult and child tickets. And also, we need to figure out that the first show refers to 2:00 PM, because 2:00 PM is before 5:00 PM and 8:00 PM. So there's a lot going on here in what- at first glance, appears to be a relatively simple dialogue. There's a lot of different kinds of knowledge that we need to bring to bear in order to execute this conversation successfully. So just for fun, let's see how Siri actually does with this dialogue in real life.
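To make the reference resolution steps in this dialogue a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a description of how Siri or any real assistant works: the `DialogueState` class, the `entity_type` and `resolve` functions, and the hard-coded lookup sets are all hypothetical names invented for this example, and the sketch covers only the expressions in this one dialogue.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import List, Optional

# Hypothetical domain knowledge: which strings name movies and which name towns.
MOVIES = {"Black Panther"}
TOWNS = {"Mountain View"}


def entity_type(name: str) -> str:
    """Domain knowledge: classify a mention as a movie, a town, or unknown."""
    if name in MOVIES:
        return "movie"
    if name in TOWNS:
        return "town"
    return "unknown"


@dataclass
class DialogueState:
    """Toy record of what the conversation has mentioned so far."""
    movie: Optional[str] = None
    theater: Optional[str] = None
    showtimes: List[time] = field(default_factory=list)


def resolve(expression: str, state: DialogueState):
    """Map a referring expression to an entity using the dialogue state."""
    expr = expression.lower()
    if expr == "it":              # discourse knowledge: the movie under discussion
        return state.movie
    if expr == "there":           # discourse knowledge: the theater just mentioned
        return state.theater
    if expr == "the first show":  # world knowledge: "first" means the earliest time
        return min(state.showtimes) if state.showtimes else None
    return None                   # anything else is out of scope for this sketch


# State after the agent's first two turns in the example dialogue.
state = DialogueState(
    movie="Black Panther",
    theater="Century 16 Theater",
    showtimes=[time(17, 0), time(14, 0), time(20, 0)],
)
print(entity_type("Black Panther"))      # domain knowledge
print(resolve("it", state))              # discourse knowledge
print(resolve("the first show", state))  # world knowledge
```

Even in this toy form, each kind of knowledge needs its own lookup, and none of it generalizes beyond the exact phrases listed. The sketch also leaves out the step of working out that "one adult and two children" means tickets.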
So these are some screenshots that I captured last year when Black Panther came out, and, uh, I'm gonna try to recreate the interaction that we had on the previous screen, and we'll see how Siri does with this. So I start off by saying, "Where's Black Panther playing in Mountain View?" And Siri says, "Here's Black Panther playing in 9 theaters in Mountain View." Okay. So Siri definitely understood what I wanted. That's great. We're off to a good start. You know, you might quibble that the first result is actually in Saratoga, and the second result is actually in Cupertino, [LAUGHTER] and the third result, I'm not sure where that is but we have to go all the way down to like number 5 or 6 before we get to one that I'm pretty sure is in Mountain View. So, you know, we might quibble about that. But I- all in all, I think it's a pretty promising start that we're off to. Now, I'm gonna try to continue the dialogue from the previous slide.
So I say, "When is it playing there?" You might think my question is kind of stupid because Siri already gave me the show times, and I guess I failed to read them or something. So okay, I'm a bad user. I'm an idiot. [LAUGHTER] But my claim is that Siri ought to be able to do something useful with this. Even if I'm an idiot, I think Siri ought to be able to make some sense of this. A human would be able to make sense of this, right? If you had a human, the human would be like, "Okay, he's a dumb user, but I'm gonna give him the answer anyway." "When is it playing there?" I think you ought to be able to interpret that as referring to maybe the first result that you gave me.
And so maybe give me the show times for the AMC Saratoga. Let's see if that's what Siri has done. And Siri says, "Here are some movies playing at theaters near you." I don't have the feeling that Siri really understood what I was asking for. It didn't give me show times for the first result. Well, I guess it did give me some show times. But I don't feel like Siri really understood me at all.
I feel like this conversation is kind of going off the rails. Let's forge ahead though and see what happens [LAUGHTER] with the hardest part. "Okay. I'd like one adult and two children for the first show. How much would that cost?" Now, again, I'm just following the script from the previous slide, and maybe I'm an idiot, but I think Siri ought to be able to interpret this as referring to- I mean, I said the first show. So I guess maybe that's this one or maybe it's this one or something. I wanna buy some tickets, and I want Siri to understand that I wanna buy some tickets.
Siri says, "Here's what I found on the Web for 'Okay, I'd like one adult and two children for the first show. How much would that cost.' Have a look:" And it gives me some results about how much I should spend on groceries, and when should a child be taken from his parents. [LAUGHTER] Seems a little odd. Um, it's interesting though Siri's fallback strategy when it doesn't understand. I mean, it- it has to do something, right? It can't just, like, raise an exception and print a stack trace.
[LAUGHTER] It- it has to do something. So, its fallback strategy is, it takes the last thing it thinks you said, and it searches the web for it. And, you know, as a fallback strategy, that doesn't seem like the worst idea. Um, I'm really not trying to make fun of Siri, my point is to illustrate just how hard NLU is. Uh, by the way, I tried the very same interaction with the Google Assistant, and it didn't do any better. It was pretty much the same kind of outcome. Uh, besides, I don't need to make fun of Siri because people much funnier than me have already done the job. For example, Stephen Colbert. So this is from when Siri first came out in 2011. I'm not gonna try to play the clip but I put the URL, uh, and I'm gonna post these slides so that you can go check it out later, if you want to.
Um, Colbert explains that he's been playing with his new iPhone 4S all day, and so he didn't get around to writing the show. So he wants Siri to write the show for him because, like, Siri is magical, right? And it'll just work. So he says to Siri, "Write the show." And Siri says, "What would you like to search for?" And he says, "I don't wanna search for anything. I wanna write the show." And Siri says, "Searching the Web for 'search for anything. I want to write the shuffle.'" [LAUGHTER]
So you see this fallback strategy again of searching the web. And again, you get the impression that Siri didn't really understand what he was asking for. Now, in this case, maybe it's forgivable because, you know, his meaning was complex. He wasn't just asking to set an alarm. Um, but it's kind of interesting to see the same fallback strategy. Uh, and then I love this. A few minutes later, he says, um, "For the love of God, give me something- no. For the love of God, the cameras are on, give me something." And Siri says, "What kind of place are you looking for? Camera stores or churches?" [LAUGHTER] So this is very revealing. I think what's happening here is that Siri has like latched on to cameras and God, and then it's like an eager puppy that just wants to go search the Web for you. [LAUGHTER] And then a little bit later, he says, "FU." And Siri says, "I'd blush if I could." [LAUGHTER] So this fallback strategy is very reminiscent of Eliza.
Do you guys know about Eliza? So Eliza is this very famous program from the 1960s. It was one of the first programs that, at least, appeared to be doing natural language understanding. Eliza is basically a chat bot that assumes the role of a psychiatrist, uh, and at first the dialogue seems to be surprisingly natural and life-like, um, and in fact, some of the early users apparently found it so convincing that they asked to be alone in the room with the therapist. But if you're a slightly more discriminating observer, you soon realize that it's just reacting to triggers in what you say.
It's just matching patterns and then transforming the patterns in some way. So if I say, x, then Eliza says, why do you think x? Or does it please you to believe that x? Now, it does demonstrate, uh, at least a little bit of linguistic capability. So for example, it knows how to swap the first and second person when reformulating x. But this is only the shallowest imitation of understanding. It's not the real deal. It's what Hector Levesque would call a cheap trick. Now as you probably know, NLU has become a major priority at Google. It was my focus when I was at Google. And one reason for that is that, more and more search activity is shifting to mobile devices, and more and more queries are spoken rather than typed. And it turns out that when people talk to their phones, they're much more likely to use natural language than when they're typing into a search box.
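As an aside on the Eliza-style pattern matching described above, here is a minimal Python sketch of the idea. It is not Weizenbaum's original script: the two rules, the pronoun table, the fallback line, and the function names `reflect` and `respond` are assumptions made up for this illustration. It only shows how far you can get by matching a trigger, swapping first and second person, and dropping the result into a template.

```python
import re

# Swap first- and second-person words when echoing the statement back.
# This table is a small illustrative assumption, not Eliza's real word list.
PRONOUN_SWAPS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my",
}

# (pattern, response template) pairs; the first matching rule wins.
RULES = [
    (re.compile(r"^i believe (.+)$", re.IGNORECASE),
     "Does it please you to believe that {x}?"),
    (re.compile(r"^(.+)$"),
     "Why do you think {x}?"),
]

FALLBACK = "Please go on."


def reflect(text: str) -> str:
    """Lowercase the text and swap first- and second-person words."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(PRONOUN_SWAPS.get(word, word) for word in words)


def respond(utterance: str) -> str:
    """Return a canned transformation of the utterance; no understanding involved."""
    cleaned = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(cleaned)
        if match:
            return template.format(x=reflect(match.group(1)))
    return FALLBACK


print(respond("My boss hates me."))
print(respond("I believe I am a failure."))
```

The program never represents what the statement means. It copies the user's words back with a few substitutions, which is why slightly unusual input quickly exposes it.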
So Google has made a big push on what's known as conversational search, and a big part of making that work is understanding the context of the conversation. So instead of a bunch of isolated queries, you can have a coherent conversation and you can refer back to stuff that was mentioned earlier. So you can ask about Chicago and then you can say, who's the mayor? And Google knows that you're talking about the mayor of Chicago, and then you can say, how old is he? Who is he married to? And Google knows what you're talking about or you can ask about the San Diego Zoo, and you can say, is it open? How far is it? Call them.
Uh, of course, it doesn't always work perfectly. Getting this stuff right is remarkably hard. But more and more it actually works. Uh, actually semantic interpretation is not just for natural language queries. It's also increasingly important for queries that aren't natural language but are more like what you might call Google-ese, whether spoken or typed into a search box. That's because a growing proportion of the query stream is not well served by traditional keyword-based information retrieval. More and more queries are seeking answers that are not found on any web-page. So consider something like, how to bike to my office. There's not a web-page that I- there's not a, like, static ordinary web-page that I can give back, that's gonna give you the answer to that.
Um, and more and more queries are seeking action rather than information. So something like text my wife, I'm on my way. Just imagine if Google responded to queries like these, by giving you 10 blue links to documents containing those terms. I mean, that might work okay for this one, I don't know, I haven't tried it. It might work okay for this one. Uh, it's definitely not gonna give you a satisfactory result for this one, right? That's gonna be a really bad user experience. Satisfying queries like these requires semantic parsing, which means mapping the query into a structured, machine-readable representation of meaning that can be passed to some downstream com- some back-end component to take action on. Uh, and I've shown some examples of what semantic representations might look like here but I'm not gonna dwell on this now. This is a topic that we'll return to in a couple of weeks. Another big application for NLU is sentiment analysis, and growth in this area has been driven by the explosion of user-generated content on the internet. So this basically means looking at product reviews and social media, and trying to understand how people feel about companies and brands, and products, and which specific features of products they like and dislike.
There are zillions of startups that are, that are operating in this space, uh, and they are pursuing a variety of different business models. So, for example, there are so-called market analytics firms that do sentiment analysis on social media and product reviews, and then sell the results to marketers, to help them understand how consumers feel about their products. And apparently if you're an airline, the answer is, they hate you with every fiber of their being. Another intriguing application for sentiment analysis has been in quantitative finance.
Uh, as an example there was a paper that came out a few years ago, that claimed to use analysis of sentiment on Twitter, to predict moves in the Dow Jones Industrial Average up to six days in advance, with 87 percent accuracy. That sounded pretty awesome, pretty exciting. So a couple of guys started a hedge fund to trade this idea. The fund was massively oversubscribed. They quickly raised $100 million to try to exploit this idea. Unfortunately, the methodology was completely bogus, and the hedge fund soon had to shut down. [LAUGHTER]. Uh, but just because this idea didn't work, doesn't mean there's nothing there, and in fact many asset managers are now using natural language understanding techniques in automated trading. Of course, the shops that are using these techniques are, um, extremely cagey about how they're doing what they're doing.
So it's hard to know for sure, how prevalent this is. But if you look closely, you can see lots of signs of automated trading being driven by text analysis. For example, a few years ago a blogger named Dan Mirvish, noticed that every time Anne Hathaway was in the news, there was a jump in the stock price of Berkshire Hathaway, the holding company run by Warren Buffett. Nobody knows for sure, but it seems plausible that some automated trading system was confusing the actress with the holding company.
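Nobody knows what the real trading systems were doing, so the following is purely a guess at the kind of failure involved: a keyword matcher that fires on the surname alone. The keyword tables, the headline, and the function names are all hypothetical.

```python
# Hypothetical keyword tables mapping a ticker to trigger strings.
NAIVE_KEYWORDS = {"BRK.A": ["hathaway"]}
STRICT_KEYWORDS = {"BRK.A": ["berkshire hathaway"]}


def mentioned_tickers(headline, keywords):
    """Return tickers whose trigger strings occur anywhere in the headline."""
    text = headline.lower()
    return [ticker for ticker, triggers in keywords.items()
            if any(trigger in text for trigger in triggers)]


if __name__ == "__main__":
    headline = "Anne Hathaway to host awards show"  # made-up headline
    print(mentioned_tickers(headline, NAIVE_KEYWORDS))   # surname alone fires
    print(mentioned_tickers(headline, STRICT_KEYWORDS))  # full name does not
```

Requiring the full company name avoids this particular confusion, but it is still string matching, not understanding. It would miss a story that only says "Buffett's holding company".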
So some NLU systems demonstrate very limited understanding. Nevertheless, this is a fast-growing area. These days, most trading is automated and most trading strategies rely in part on automated analysis of unstructured feed, unstructured data feeds. So that means, natural language text. That means, things like news stories, and broker reports, and, uh, SEC filings, and transcripts of conference calls, and social media, and so on. It turns out that you can make enormous trading profits if you can discover and act on market-making news, just a little bit faster and more accurately than your rivals. So, um, you know, things like Disney's acquisition of 21st Century Fox or, uh, Trump's summit with Kim falling apart. Essentially, they're using natural language understanding to predict the markets. [NOISE] Actually it's when things go wrong that the use of NLU in automated trading systems becomes most apparent.
And there was a very interesting example in 2008, involving United Airlines' stock. So what happened is Google News crawled the Florida Sun-Sentinel's website, and they found a story about United Airlines filing for bankruptcy. So they began serving it up on Google News. Automated trading systems began to react within seconds, and this triggered an avalanche of stock sales. Within 12 minutes more than a billion dollars in stock market value had evaporated, uh, and that's this huge cliff here in this, uh, price chart. The problem was, the story was six years old. It was from 2002. Um, for some reason, the newspaper had posted this old story in their popular news section, and the only date on the article was September 7th, 2008. So it was the day before this thing happened.
So finger-pointing all around, the newspaper should have dated their article properly. Google should have recognized that the story was a duplicate of a story from six years ago, and above all, automated trading systems shouldn't believe everything they read. There was an even more dramatic example in 2013. Uh, you might remember this one. Somebody hacked the Associated Press Twitter feed, to report explosions at the White House and Obama injured. So instant pandemonium. The Dow immediately plunged more than 140 points and then recovered within just six minutes. [LAUGHTER]. Because and look, look at the size of this spike right here. This is incredible. Because this affected the entire market and not just a single stock, the impact of it was much bigger. The S&P 500 temporarily lost $136 billion in market capitalization. Now you might say, oh, but it came right back up. So no big deal, right? But the people who bought on the way up were not necessarily the same as the people who sold on the way down.
[LAUGHTER]. Chances are, there were some huge winners and some huge losers from this event. And this was again, very likely driven by automated trading systems. Uh, here's a quote from some dude who says, "That just goes to show you how algorithms read headlines and create these automatic orders. You don't even have time to react as a human being." So there are tremendous economic incentives to improve the accuracy and the reliability of these NLU systems, and banks and hedge funds are making big investments in this area. Okay. So that was, kind of, a whirlwind tour of a bunch of real-world commercial applications of NLU, uh, with particular emphasis on limitations of NLU. There's a lot of stuff that's still not really possible or still goes wrong with NLU, and for us that represents a big opportunity to make new discoveries and find better ways of doing things.
[NOISE] Enough about applications. How are we going to approach this topic? Uh, NLU is a big field, it covers many different subtopics. So how can we organize the content of the field? One way to organize the subtopics is into three levels of meaning. So at the bottom level we have words, and the meanings of words, and that's the province of lexical semantics. Then at the next level, we can look at how the meanings of words combine to form the meanings of phrases, and clauses, and sentences.
And that's the province of compositional semantics. And then at the highest level, we can look at the meaning of language in context, and the meaning of dialogues, and discourses, and this brings in topics like reference resolution, and pragmatics, and so on. A different way to carve up the field of NLU is to try to categorize tasks based on the kinds of output representation that we're trying to produce. In most NLU tasks, we take a sample of text as input, and we're trying to map it into some kind of representation of meaning, either as an output, or as an intermediate step toward a practical goal. Those semantic representations can be continuous or they can be discrete, and they can be simple or they can be complex. So for example, in sentiment analysis, we're most commonly trying to generate a scalar value indicating positive or negative sentiment.
It might be just -1 or +1, or it might be a five-star rating system, or something like that. With vector space models of meaning, that we'll start looking at on Wednesday, uh, we represent the meaning of a word or a phrase as a point in some high dimensional space. Or a different perspective is that it's some kind of distribution, for example, a probability distribution over abstract topics. So those are continuous representations of meaning. We can also have discrete representations of meaning. So for example in relation extraction, we're trying to produce, uh, relation instances or database triples, things like Larry Page, Founder, Google, or Google located in Mountain View.
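To make the kinds of output representation mentioned above a bit more concrete, here is a small sketch with a scalar sentiment label, toy word vectors compared by cosine similarity, and relation triples. The two-dimensional vectors and their values are invented for illustration. Real vector space models have far more dimensions and are estimated from data.

```python
import math
from collections import namedtuple

# Scalar representation: a sentiment label such as -1 or +1.
sentiment = -1

# Continuous representation: toy 2-dimensional word vectors (invented values).
VECTORS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "terrible": [0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Discrete representation: relation instances as (subject, relation, object).
Triple = namedtuple("Triple", ["subject", "relation", "object"])
TRIPLES = [
    Triple("Larry Page", "founder", "Google"),
    Triple("Google", "located_in", "Mountain View"),
]

if __name__ == "__main__":
    print(cosine(VECTORS["good"], VECTORS["great"]))
    print(cosine(VECTORS["good"], VECTORS["terrible"]))
    print(TRIPLES[0])
```

With these toy values, "good" comes out much closer to "great" than to "terrible", which is the kind of behavior a vector space model of meaning is meant to capture.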
And in semantic parsing we're trying to produce semantic representations that can have arbitrarily complex logical forms. So here's a logical form that's intended to denote the largest state. Uh, and we'll look at all four kinds of problems in this class. [NOISE] Two themes that will appear again and again in this class. One theme is semantic composition. And the question here is, how are the meanings of bigger pieces, like phrases and sentences, built up from the meanings of smaller pieces like words? Within li- within linguistics, this question has been a focal point of compositional semantics for decades and it will be a big theme when we look at semantic parsing in a few weeks. But in recent years as vector space models of meaning have grown in popularity, there's also been a lot of interesting work on how to combine vector representations of meaning, and we'll look at some of that work. Uh, the other big theme is learning. And here the question is, how can we build models of semantic, uh, for semantic interpretation automatically from data? And there are two reasons for wanting to do this.
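The logical form on the slide is not reproduced in this transcript, so the following is only a guess at the general shape of the idea: a query like "largest state" is mapped to a structured form, and a back-end component executes that form against a database. The tuple notation, the `execute` function, and the rounded area figures are assumptions made for this sketch.

```python
# Toy database: approximate areas in thousands of square miles (rounded,
# for illustration only).
STATE_AREAS = {"Alaska": 665, "Texas": 269, "California": 164}

# Hypothetical logical form for "largest state": the state maximizing area.
LARGEST_STATE = ("argmax", "state", "area")


def execute(logical_form):
    """Evaluate a tiny (operator, entity type, property) form against the toy DB."""
    operator, entity_type, prop = logical_form
    if entity_type != "state" or prop != "area":
        raise ValueError("unsupported logical form: %r" % (logical_form,))
    if operator == "argmax":
        return max(STATE_AREAS, key=STATE_AREAS.get)
    if operator == "argmin":
        return min(STATE_AREAS, key=STATE_AREAS.get)
    raise ValueError("unsupported operator: %r" % (operator,))


if __name__ == "__main__":
    print(execute(LARGEST_STATE))
```

The hard part of semantic parsing, getting from the words to the logical form, is not shown here. The sketch only illustrates why a structured form is useful once you have it.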
One is a matter of principle, and the other of pragmatics. The principled reason is that we want to be empiricists, we want to be data-driven, and we want to build models that can interpret the language that people actually use, not the language that we imagine that they use. The pragmatic reason is that building large complex models by hand is slow, and tedious, and expensive, and it doesn't scale. So we want to automate that work so that we can build these models quickly, and cheaply, and by the way, also build them in 40 different languages.
[NOISE] Now we have some goals for this course that go beyond just covering the material. One is that we want to give you the skills and tools, uh, that you need to be- to quickly apply, uh, that you can quickly apply to NLU problems beyond this class whether in academic research, or in internships, or jobs in industry. I mentioned earlier that there's- there's a voracious demand for smart people with knowledge of NLU and we want to enable you to jump in and be effective, uh, wherever you go. [NOISE] Another goal is to support you in completing a project that is worthy of presentation at a top NLP conference. The biggest deliverable of the class will be an independent research project and a final paper in the style of NLP conference papers.
In previous years, many CS224U papers have actually been turned into conference papers. And whether you are an undergrad, or a master's student, or a PhD student, presenting a paper at a top research conference is one of the best ways to enhance your credentials and get to the next level. So I'm gonna stop here, and I'm going to turn things over to Chris to talk about course logistics.
All right. Great. [NOISE] So I'm just going to take the remaining 20 or so minutes that we've got to talk about some course logistics. Uh, I want to make sure that you leave here with a good sense for how the course will work, what kinds of work you'll be doing, and so forth. Um, and I thought a good way to start with this is to just show you how the regular assignments are going to unfold.
As Bill said, essentially everything about this course is geared toward giving you hands-on experience with the, kinds of, models that we're exploring. What we would like for you to do is take like baseline code that we've implemented, and take it much further. And we think that a key aspect to doing that is that hands-on experience, not just listening to us talk about these models and showing off the best results in the field but rather down nitty-gritty, actually exploring the code, trying different simulations out and seeing what happens.
And so a lot of what we do is oriented toward giving you that kind of hands-on experience. Um, and the one way you can see that is the way we've structured these assignments. So there are four assignments, um, and they unfold over the first half of the course. The first one you have two weeks, because the start of the quarter is always, kind of, chaotic but after that they go on a one-week cadence Monday to Monday. Um, the first one is due on April 15th as I said, and then weekly after that. Um, each one is hands-on experience. What- what you'll do is work inside a Jupyter Notebook, upload it to Canvas, and we'll evaluate it. But the primary thing beyond the evaluation is that we want you to explore different kinds of models and the assignment questions are oriented essentially toward having you set up some baselines. Um, and then a crucial part of the assignment here is that each one culminates in what we call a Bake-off, and this is a term from the field.
Um, the idea is that you enter an original system and it's evaluated on some test data that we're gonna provide during the bake-off, and you enter your score. And we're gonna give a little bit of extra credit to the teams that have the top system. Um, and the way the assignments work is, you set up those baselines then you implement your original system, and you get nine points for doing that, and then you get an additional point if you just enter the bake-off. So you'd- all you have to do is enter your system, and that happens Monday to Wednesday after the assignment is due.
And the idea, you know, best practices in the field is you do all your development until you submit the assignment, and then you don't do anymore tuning or anything like that, you just enter your system into the bake-off. And then winning systems get a little bit of extra credit. Um, what we're trying to do here with these assignments and the bake-offs is kind of give you a sense for how we think projects should unfold because what we're going to push for you in the second half of the course is that your project should kind of implement some baseline systems, and then offer an original system that tests a hypothesis about how you think the world works, a scientific hypothesis.
And we're trying to push you in that direction and get you to build those baselines and then do something original. And the bake-off aspect of this is meant to be just, kind of, fun. And what we'll do is reflect back to you which systems won, which systems did really poorly, and try to offer insights as to why that happened. Um, and that's always been really interesting because this is like crowd-sourcing lots of different ways that you might approach these problems, some good, some bad and I think we can all learn from the results. What we're trying to do as I said, is kind of exemplify best practices for NLU, the code that we offer is meant to do that, the assignments are meant to push you in that direction.
You should let us know if you think we're not living up to this. Um, we would love to have an exchange with you about how to do better when it comes to best practices for solving these problems. So that's kind of the first half of the course, and you'll see that when we look at the web page, you'll see that what we're doing as part of that, is introducing lots of topics and they are the topics that we think of as the kind of essence of NLU, the ones that if you really mastered them then you can do pretty much anything in the field. Uh, and that's just, we have limited time but we're hoping that that's, kind of, a generator for you for lots of different approaches in problems. This is a practical note here. For the assignments, and for the projects, and the bake-offs, you can work in teams. The way you'll do this is, you should form a team on Canvas, and I've given instructions here.
You go to the People tab, and then the Groups tab, and then you have to pick a pre-selected team from the large number of teams that I created for each one of these assignments. That's a little bit tedious but once you've formed your team that means one of you can submit the work, and it will be- you'll all get credit for it. And you should make sure that if you formed a group, then your assignment is associated with your group. I think the interface is pretty clear there. So the only tedious part is actually forming a group because you have to pick from a pre-selected one, that was the best I could figure out. But in terms of actually your work being uploaded to Canvas, it's just that you'll upload a Jupyter Notebook and then the team will take it and evaluate it. Did you have a question? So are there different teams for each different bake-off? There can be.
Yeah, that's the way I set it up. You'll see that they're called like Assign 1 Bake-off 1, Assign 2 Bake-off 2, that means you can switch. There's also one for- one for the final projects. If you want to have the same team through all of them, you'll just have to reconstitute your team for each one. Um, it's not the best system, but I think if we get used to it it'll work okay. Questions about the assignments and the bake-offs before I move on? Yeah. Is grading based off the number of people that are assigned on a team? Not for the assignments and bake-offs.
I'll show you when we look at the website in a little bit. We have slightly different criteria for large teams working on the projects, but it's a very soft touch thing, yeah. But for these, I don't know. We're hoping that you have a fun experience of collaborating and you yourself can try lots more models if you work in a team, and see what works best. Then the second half of the course, you'll see this when we look at the schedule. Like the first half is topics and then the second half is less about topics, and more about us trying to help you successfully execute on a high-quality NLU project. And so what we do is shift toward lectures that are much more about metrics, and methods, and tricks, and tips, and other best practices.
Things that will help push your project ah, in better directions. Um, so it becomes less about introducing new content and more about that kind of methodological stuff. And we're hoping that in parallel with that, of course, you're working on your project as a team. And the way we're going to encourage that to happen is that it kind of- the project unfolds over a series of assignments. So the first one is a lit review. This is just kind of you and your team, if you form one. Getting oriented around what problem you want to solve. Maybe less familiar, what- what we've done is set up an experimental protocol as the second um, ah, assignment. And I'll show you what it- what its contents are when we go to the web page. But the idea here is just that we're trying to push you to be sure that your project is testing some scientific hypotheses.
Then you do a video presentation. Um, we haven't typically done poster sessions because it seems like an awful lot of work to create a poster. When this class was much smaller, back when NLU was much less popular, we would have people give three minute talks and that was always lots of fun because we saw all these wonderful little presentations. The class has gotten too big for that. So as a compromise, what we've done in recent years is have teams do little short videos and then you could put them on YouTube or upload them to Canvas, and it can be public or private, whatever you want to do.
Um, but the point is that gives us a glimpse of what your project is like and gives you a first chance to think about how you want to present it. And then of course the final thing is the final paper. Uh, and if you're interested to get started here, this is a link, we'll post the slides in a bit, a link to some past projects that are really exceptional from a bunch of different years and from recent years, I have some video links available. Um, so you can also see what good video presentations were like. Questions about that? I'm going to circle back to this in a second. But yeah, go ahead. [NOISE] Can we make our final project in open source like public on GitHub? That would be wonderful. Yeah, of course. Yeah, I don't want to make it a formal requirement but one of the best things about NLP in general and this is sort of true of all of AI at this point, is how much the community has formed norms around making sure that you open source your code and data.
It's almost like you don't have results. If you don't have results that other people can pretty easily reproduce and this is wonderful because, when I started out in the field, you would spend months or years trying to build up baselines that other people had established, and now the pace is picking up just because it's so easy to share these results, and that kind of comes back to people's willingness to distribute data, right away, and also open source their code. So not a requirement but it would be great to be getting like GitHub links for all these projects. Other questions? Course logistics. So I'm going to go follow these links in a second here, but so I'm going to go to the website. The teaching team is me and Bill, and we have 10 TAs. I was wondering, this is going to be hard because of the microphones but could the TAs who are here stand up.
I would love it if you introduce yourself and just said kind of what your interests are in NLU, and also if you're an alum of the course, it would be cool to mention that. We're going to start. I am a master's student in the CS department. I am an alum of the course and my interest in NLU is nothing in particular, I don't have a good answer. Not everything. [LAUGHTER] I am also a masters student. I took this last spring. My interests in NLU are generally building systems to understand like language of the internet in English and also other languages, and everything else attached to it. A masters student.
I did the course last year and my interest is in modelling dialogue and conversational systems. I'm also a master's student and my main interest is natural language inference and I did take this course last year. I am a PhD student. My interests are in deep reinforcement learning [inaudible]. Hi, I am a master's student. I am an alum of this course and I am interested in learning about [inaudible] social data sets. [NOISE] [inaudible] I am actually manager at the implication of genomics but I'm interested in [inaudible] , and also [inaudible] Let's see here.
I'm not sure who's missing. We'll try to introduce lots of people next class and after that. I do want to mention that [inaudible] who can't be here in person. He's one of the TAs, he's going to be more or less a virtual TA. So he graduated from Stanford a number of years ago. He did his master's thesis and his honors work in Natural Language Understanding, kind of applied to social problems, and he's returning. This is a program that was created by SCPD. So it should be fun to interact with him because he's been in industry doing NLU type things now, for a number of years, and I think you'll be able to reach him on video.
Uh, and I wanted to say also, thinking about these alums and I remember some of their projects, one of the wonderful things about this course is that we get a really wide range of projects. They kind of go from, yes, like hard-core deep learning things, all the way through to digital humanities and kind of social science problems, and I think that's a wonderful aspect of this course for me because like for me, as a researcher. One of the exciting things I think I can do is identify results that come out of NLP and show people in neighboring fields, especially in my own field of linguistics, that they have real value, and it's just great to see people doing projects that kind of exemplify that in sociology and in history, and in English, and in psychology, and linguistics of course, and so, we're very encouraging about getting really creative, about the kind of problem that you take on for your final project.
The lectures for the course will be streamed and stored on Canvas. You should all have access to Canvas. If you want to reach the staff, this is the address to do that. But you can also just post on Piazza. We- I think the whole teaching team will be quite active on Piazza. So you can use that to get in touch with us and then here's a link, which I'll show you in a second just to the kind of things that make up the core components for the grade. Before I switch out of these slides, special announcement.
We have a special session on Friday. It's going to be run by , that is about- it's kind of a refresher or to help you get up to speed on Python, and on working in Jupyter Notebooks. We're kind of assuming that you're a pretty good Python programmer already. The Jupyter Notebook thing might be more unfamiliar to you but it's going to be a kind of norm in the course, that we do a lot of our coding and exploration, and assignments, and bake-offs and stuff in notebooks. So if that's new to you, I highly encourage you to go to this Friday session and here are the details. We will give you a bunch of other reminders for this. It's great that are doing this. And then for next time, you should get your computing environment set up. I'll show you in a second what I mean by that. But here's a kind of run down. We have a course GitHub repository that has lots of code.
It has the notebooks; it has the assignments. Um, you should get in the habit of kind of syncing with that repository, making sure you have the latest updates and so forth. Um, a lot of the tools are there, and this is a kind of run down. We're, we're gonna encourage you to use Anaconda for your Python environment. Officially, the course is using Python 3.7, I have tested the code all the way down to 3.5, and it actually basically works under Python 2.0.
But please, do not use Python 2.0, um [LAUGHTER] it's now moribund. Um, but if you're stuck on 3.5. for some reason and you don't want to create a virtual environment, you're probably going to be okay. Uh, and then there's a bunch of other stuff that you want to install. A lot of this is taken care of if you're using Anaconda, but if not, you're going to have to kind of do all the requirements. Uh, and then there's a huge data distribution folder. Um, I just updated the link to one where I think the archive is going to work. I was really proud of myself for getting all of the data for the whole quarter together into one archive, and of course, it was like too big to be a ZIP archive. [LAUGHTER] Um, I'll post some corrected links, and with luck we'll get this sorted out.
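If you want a quick sanity check of your interpreter against the versions just mentioned (3.7 official, tested down to 3.5), something like the following would do. The function name `check_python` and the messages are invented for this sketch and are not part of the course repository.

```python
import sys


def check_python(version_info=None):
    """Classify a (major, minor) version against the course's stated versions."""
    major_minor = tuple((version_info or sys.version_info)[:2])
    if major_minor >= (3, 7):
        return "ok"
    if major_minor >= (3, 5):
        return "probably ok"
    return "unsupported"


if __name__ == "__main__":
    # With no argument, this checks the interpreter running the script.
    print(check_python())
```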
Uh, but you only have to get it downloaded once. Uh, and then what you should do for the next time, and kinda over the next couple of weeks, is watch the screencast that we have available for this unit. Uh, the core reading is the one by Turney and Pantel 2010, and then you should just start exploring the notebooks. Uh, and you could do that kind of at your own speed. Uh, I'm gonna be here with you giving lectures about it, and the lectures are also going to be trying to get you to work on your laptop more or less as I talk to explore things and figure out what's going on. Um, let me just show you a bit about the website in the remaining time that we have. Okay. So this is the main page. It has pretty much everything you want. So, there's a link to the Piazza site, uh, which you should join if you're not already enrolled there, and the Canvas site and our GitHub, and there's the, uh, staff address.
And then down the left column here, you have the teaching team, and we've listed all of our office hours. So that's kinda organizational stuff, and then here's where all the action is going to be. So this is the main schedule. Uh, it goes kind of topic by topic. So, as Bill said, we're gonna do vector space models, that is, vector representations of meaning for the first two weeks. It's kind of a slow start as people get oriented, and we've, you know, kind of get people who are newer to the field up to speed. And that covers homework one and bake-off one, and then we do some supervised sentiment analysis using the Stanford Sentiment Treebank.
Then, relation extraction. Then, natural language inference, grounded language understanding and semantic parsing, and I believe that's kinda the point where we shift. The assignments become more project oriented, and the topics are more around, like, metrics and methods, um, contextual word representations as a tool to make your models even better and some other stuff like writing up and presenting your work. Uh, and now I'm not sure of the scheduling yet, but I'm hoping that on May 22 we have a kinda panel discussion with people who have been doing NLU in industry for a while to kind of give you guys a sense for what it's like to take this research and apply it out there in the world. Um, and then more kind of project stuff. Um, the last part of the course, because of NAACL when a lot of people have to be away, is kind of just given over to you for doing project work, and then the final project is due on June 9.
Another thing about the schedule, so there are a lot of links already in place. Those are the notebooks for the assignments, and the kind of core notebooks that will allow you to do the hands-on work that we're envisioning, that will help you do the assignments, and get really accustomed to what these tools are like. So, those are already there, and you can start to work with them, and you might want to skip around especially if you're, um, familiar with NLU problems, like, you took the deep learning course or something like that, you might want to skip around because, for example, you could do a really good job on the bake-off if you can figure out those contextual word representations, that kind of thing. So browse around, see what's there, um, we'll be posting slide shows pretty regularly that kind of try to integrate all this material together.
Along the resources column, we have links to recommended readings, and in some cases those screencasts, which are short YouTube videos that I made that are kind of encapsulations of the really technical parts of the material that we'll be covering, the stuff that's kinda hard, uh, to just get from a lecture that you hear once. Um, I've tried to distill it down to just the essence. We have especially good coverage for these early units, which is good because in this first unit, for vector space models, we're really trying to build the foundation for thinking about how to build up more complex NLU systems. I think that's it for now. Any questions? Oh yeah, I could show you just one more thing. So, there are two other pages. We have a detailed description of how the projects unfold, and other requirements are here for, like, what we expect from a lit review, um, what we expect from that experimental protocol.
Um, some guidance about video presentations and the final paper, and here are those links to, um, past projects that are really exceptional, in case you want some examples of the range of projects people have explored and also just what really good projects look like. Um, and then finally, still in this last minute, um, there's also a kind of policies and requirements page that reviews, like, how much the homeworks are worth, the lit review, the protocol, the presentations and the final project. Uh, and there are some details here about kind of policies and other stuff like that. I guess I'll leave it to you to read that on your own.
It's not so thrilling for me to review it now. [LAUGHTER] Um, before we wrap up, are there any questions or comments, or concerns that I can field? Yes. Is the winner of the bake-off, is it just determined by the multi-criteria of C or other criteria? For each one, we've specified the criteria. There's a specific score typically, and what I've said is that the systems that have the top score will get the extra credit. Um, we're going to have to see what the range of scores is like, but it's very tightly regimented, the way these bake-offs tend to be. Yeah. Will the projects get sponsored with compute time on AWS or Azure, or GitHub? I'm working on it. Yeah, I've been trying to get Azure. Uh, I have to be a little persistent, I guess, but I am hoping that I can make resources available to you guys.