Scott Fahlman, February 23, 2011
A friend, Carl Kurlander, runs a Pittsburgh-oriented blog on the Pittsburgh Post-Gazette web site. He asked if I would write up a few quick thoughts on the recent “Man vs. Computer” match on Jeopardy, so I did that. The article was written for an intelligent but non-techy audience, and is a bit more superficial than the usual fare on Knowledge Nuggets. But I thought I’d put it here anyway, just in case some readers of this blog are interested.
I watched the recent Jeopardy challenge match – IBM’s Watson system vs. Ken Jennings and Brad Rutter, Jeopardy’s all-time human champions – with great interest for several reasons: first, I’m a long-time fan of Jeopardy, though I’ve never been a contestant; second, I’ve been doing research on artificial intelligence, specializing in knowledge representation and common-sense reasoning, for 40 years; third, several of my colleagues at Carnegie Mellon, both faculty and students, worked with the IBM Watson team, contributing both ideas and software. CMU Professor Eric Nyberg was a key member of the IBM team.
Since I was not a part of this project, I do not know all the details of how Watson is organized internally, but I have been able to pick up a fair amount of information from the team’s public statements. So here are a few of my personal thoughts (not reflecting the official views of Carnegie Mellon or anyone else) on Watson’s victory:
1. The first thing to say is that this victory is a very exciting achievement for IBM and for the science of AI. This four-year effort has been a real tour de force, pulling together a lot of existing (but scattered) ideas – plus a few new ideas – into an integrated system that can hold its own against the best humans on earth on a true Grand Challenge problem. Few of us, even in the AI field, would have predicted this level of success in so short a time.
2. There has been a lot of griping on the Internet about the advantage that Watson had in ringing in. I believe that these critics have a good point. I suspect that for something like 70-80% of the questions, both Watson and at least one of the humans thought they had the answer by the time Alex finished reading the question and the buzzers became “live”. Watson has a huge advantage in these situations, and indeed it won most of these toss-ups, especially on the crucial Double Jeopardy round of the first game. If you take away this buzz-in advantage, for example by letting all three contestants answer if they buzz in as soon as the buzzer becomes live (they would have to be in isolation booths), then I think that the game would have been a virtual tie, or at least much closer than it seemed on TV. But despite this opinion, I think that even tying the two best humans on the planet is a tremendous victory for the Watson team.
A bit more detail on this: The way the button works (as I understand it), a production assistant offstage makes a judgment call about when Alex has finished reading the clue. (There’s some slop in that process.) The assistant then pushes a button that arms the buzzers. This also turns on a little light on each podium, and (in this contest) simultaneously sends an electrical “OK to buzz” signal to Watson. If Watson is ready to answer, it immediately triggers a solenoid that pushes its button.
No human is going to beat Watson at that. Humans take about 0.2 to 0.3 seconds to push a button in response to a light. The only way a human is going to beat Watson is if the human listens to Alex’s voice and anticipates when the button is going to go live, rather than waiting until he sees the light. The human contestants seem to have done this a few times during the contest, but it’s a risky strategy: if you jump the gun, you are locked out for something like half a second, so your second try will definitely come in too late. Note that I’m not talking here about relative thinking speed, though that’s an interesting topic in its own right – I’m just talking about button-pushing reflexes.
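For readers who like to see the arithmetic, here is a minimal sketch of the buzz-in race as I have described it. Everything in it is an assumption for illustration: the function name `first_valid_buzz`, the 0.01-second solenoid delay, and the simplification that a locked-out player retries the instant the lockout ends. It is not a description of the actual Jeopardy buzzer hardware.

```python
def first_valid_buzz(press_times, lockout=0.5):
    """Return the name of the player whose first valid buzz comes earliest.

    press_times maps a player name to the time (in seconds) of that player's
    first button press, measured from the moment the buzzers are armed.
    A negative time means the player jumped the gun.
    """
    valid_times = {}
    for name, t in press_times.items():
        if t < 0:
            # Early press: the player is locked out for `lockout` seconds.
            # Simplifying assumption: the retry happens as soon as the
            # lockout ends, and never before the buzzers are armed.
            t = max(0.0, t + lockout)
        valid_times[name] = t
    return min(valid_times, key=valid_times.get)


# Hypothetical numbers: Watson's solenoid fires 0.01 s after the signal.
WATSON_DELAY = 0.01

# A human who waits for the light needs roughly 0.25 s to react.
print(first_valid_buzz({"Watson": WATSON_DELAY, "Human": 0.25}))

# A human who anticipates perfectly presses at the instant of arming.
print(first_valid_buzz({"Watson": WATSON_DELAY, "Human": 0.0}))

# A human who anticipates 0.05 s too early is locked out and loses.
print(first_valid_buzz({"Watson": WATSON_DELAY, "Human": -0.05}))
```

Under these assumed numbers, the human wins only in the middle case, which is the risky anticipation strategy described above.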
3. Whether we call it a win or a tie on the task of Jeopardy, the key thing to remember about Watson’s performance is that it makes no sense to argue about whether Watson is more or less intelligent than a human. The IQ-testing industry notwithstanding, intelligence is not something that we can measure with a single number. Intelligence, whether in a human or a machine, is a bundle of many different capabilities. Each of us (and Watson too) has these capabilities in different amounts. Some of us are good at arithmetic, but even the earliest computers were far better at this limited aspect of intelligence than any human; some humans know a tremendous amount of assorted information, but perhaps lack the ability to apply that knowledge in solving real-world problems; some of us can solve complex problems, but can barely get through a day without someone to remind us of simple things; some have talent in written or spoken communication, in music, in interpersonal relations, and so on – all aspects of intelligence that can be measured separately.
Watson is very, very good at retrieving raw facts. In the contest, Watson was not allowed to access the Internet, but no matter – it was pre-loaded with a large fraction of the reference materials available on the Web, including Wikipedia, dictionaries, tables of states and countries, Olympic winners, Oscar winners, characters in old TV shows, and so on. Not even Ken and Brad have this much factual information in their brains. But Watson is much worse than any normally functioning human at understanding the precise meaning of a complex sentence or reasoning about the consequences of what it knows. Humans gain knowledge (in part) by reading about it, digesting what the text says, and putting the knowledge away in a pre-digested abstract form that can be reasoned about.
Watson doesn’t have much of that pre-digested information. When a question arrives, it does a sort of Google search over its stored documents, looking for chunks of text that share many of the same words (and perhaps synonyms or closely associated words) with the text of the question. Then it does a lot of processing to see which candidate answers score the best, based on many different tests – that’s where the thousands of processors, working in parallel, come in — and finally it picks a winner, along with a level of confidence. It does all this in three seconds or less. This filtering technology has come a very long way in a short time – it is the thing that separates a question-answering system from something like Google that returns pages that might contain the answer you want, but that leaves the filtering to the human. But this part of Watson is still very weak compared to human capabilities. For example, Watson is more likely than a human to make category errors, such as giving you the name of a book when the question is clearly asking for a person. And Watson can’t even begin to approach a human’s ability to handle metaphors, wordplay (though it does better than I would have guessed), or simple problem-solving.
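To make the "search, score, pick a winner" idea concrete, here is a toy sketch. It is emphatically not Watson's actual method: it uses a single crude test (the fraction of question words that appear in a stored passage), whereas the real system combines many different tests. The function names, the stopword list, and the three short passages are all made up for illustration.

```python
import re

# A small, hand-picked list of words to ignore (an assumption of this sketch).
STOPWORDS = {"the", "a", "an", "of", "it", "was", "is", "in", "by", "and", "to"}


def tokenize(text):
    """Lowercase the text and return its words, minus the stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def overlap_score(question, passage):
    """Fraction of the question's distinct words that also occur in the passage."""
    q_words = set(tokenize(question))
    p_words = set(tokenize(passage))
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def answer(question, passages):
    """Score every candidate and return (best_candidate, confidence).

    `passages` maps a candidate answer to one stored passage about it.
    Confidence here is simply the winner's share of the total score.
    """
    scores = {cand: overlap_score(question, text) for cand, text in passages.items()}
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    confidence = scores[best] / total if total else 0.0
    return best, confidence


# Toy "stored documents", invented for this example.
passages = {
    "plutonium": "Plutonium is a radioactive transuranium metal discovered shortly after neptunium",
    "Neptune": "Neptune is a planet discovered by astronomers",
    "uranium": "Uranium is a radioactive metal",
}

question = "It was the radioactive transuranium metal discovered right after neptunium"
best, confidence = answer(question, passages)
print(best, round(confidence, 2))
```

Note what this sketch does not do: it has no idea what the question means. It would happily return a book title for a question asking about a person if the words happened to match, which is exactly the kind of category error mentioned above.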
So Watson has a different mix of strengths and weaknesses than you will find in any human. It would be very good at some tasks, such as answering simple questions about a product’s features online, and very bad at others, such as answering a frustrated customer’s questions about how to assemble an Ikea bookcase.
The interesting thing is that Watson’s great strengths and great weaknesses more or less balance out on the Jeopardy task. Fans of the show know that many questions have a factual part and a bit of clue that requires human cleverness – but the clever bit is only really important if you’re not sure about the answer. For example, here is a question from a previous season: “Logically, it was the radioactive transuranium metal discovered right after neptunium”. Watson might well answer this by digging through its memory and finding the actual dates at which various elements were discovered – that is, it just looks up the answer; a human – even one who likes chemistry – is unlikely to have memorized those dates, but may well get the answer, “plutonium”, by analogy to the discovery of the planets (and by knowing that there is a radioactive element named “plutonium”). So you can win at Jeopardy by having a super-human collection of facts (plus some reasoning), or by having excellent reasoning and a smaller collection of facts. It would be possible to write Jeopardy questions to favor one side or the other, but in the actual contest, the Jeopardy staff thought they were writing questions for a regular show with human contestants.
4. So, is Watson going to conquer the world, or at least take away all our jobs? I think that the world is safe for a while. Watson, after all, doesn’t actually do anything but play one specific game. And while Watson has the ability to access a lot of knowledge and do some reasoning about it, the real-world planning (and coup-plotting) ability of today’s AI systems is a long way from scary-good.
As for jobs, it’s complicated. Watson can answer factual questions very fast, but it would not beat a human player who has access to the Internet and enough time to make a few queries. This would give the human access to a store of factual information comparable to Watson’s. Combined with the human’s much greater powers of reasoning and language understanding, I can state with confidence that the human would usually produce better answers – but it would take longer and probably cost more.
In its present state, Watson makes far too many mistakes – “howlers” that no human would ever make — to entrust it with decision-making powers in law or medicine or any field where it’s important to get things right. But as part of a team, a near-future descendant of Watson might be an extremely valuable partner. As the IBM people have pointed out, Watson could take a list of symptoms and suggest possible diseases and treatments, including some too rare or too new to be known to the average human physician. But Watson (with today’s technology) should not be in charge of the treatment. We want a human to have the ultimate control so that Watson’s occasional blunders don’t kill people. In the near future, I think that Watson and its progeny will be used mostly as intelligence amplifiers, working with humans. Watson’s very impressive English-language capability is the key to making these human-machine partnerships attractive. So some jobs — digging through libraries and heaps of documents — will probably be taken over by the machines, but others — the ones involving judgment and complex reasoning — are safe for some time to come.
5. One last comment: You may have noticed that Watson’s computer-generated voice sounded very human, and not much like the tinny “computer voices” that you hear in old science fiction movies. In fact, there are reports that the IBM people wanted a Watson voice that didn’t sound too human, for fear that it would come across as “creepy”, so they deliberately made it a bit flat. A lot of the basic research in generating realistic-sounding computer voices was done by my colleagues at Carnegie Mellon’s Language Technologies Institute.