## In Defense of Incomplete Inference

**Scott Fahlman, July 17, 2008**

**Categories: KR Issues**

In this article, I will expand a bit on some comments I made in the previous article about the tension between expressiveness, scalability, and the “general theorem proving” approach to inference – an approach that currently dominates the field of knowledge representation (KR). For a practical and useful knowledge base system (KBS), I think we have to find another path, and I will explain why I have come to this conclusion.

It is not my goal here to present a tutorial on first-order logic (FOL), resolution theorem proving, or the decidability and tractability of inference in FOL and in all the various subsets and supersets of FOL that have been proposed over the years. There is not enough space for this in a short article, and I certainly am not the best person to explain these issues. There are many good tutorials on these topics, both online and in the published literature, and most of those tutorials offer pointers to original sources for those who want to dig deeper. One good entry point, which ties these topics explicitly to the concerns of practical knowledge representation, is the book *Knowledge Representation and Reasoning*, by Ron Brachman and Hector Levesque, mentioned in an earlier article in this blog. So in this article I will assume some prior knowledge of FOL and related matters; readers totally lacking such a background will probably find this article to be impenetrable and uninteresting.

**First-Order Logic for KR**

When faced with the problem of representing symbolic knowledge in a computer, most researchers immediately turn to First-Order Logic (FOL). It seems the obvious choice. This approach rests on a solid mathematical foundation, built up by great minds since the time of Aristotle. There is a very solid theory telling us what FOL can represent and what kinds of inference are possible in this logic. John McCarthy, with the publication of his 1959 paper *Programs with Common Sense*, and in much of his subsequent work, gave the just-created field of AI a mighty push in this direction, and away from the rather messy and obscure embedding of knowledge in programs that most of the early pioneers of the AI field had engaged in.^{[1]}

For the inference procedure, it seems natural to choose something like resolution theorem-proving^{[2]}, with its guarantees of sound inference and logical completeness. The basic idea is simple: if you want to determine whether proposition X is true (provable) relative to a given knowledge base (KB), assert not-X and see if the KB is inconsistent given that addition. If so, X must be true. Note, however, that this inserts a potentially very expensive step into the procedure for even simple queries: “determine whether the entire KB is now consistent”. In many cases where X is true, a well-designed inference algorithm will terminate quickly, having found some inconsistency that derives from not-X. However, proving that *no* inconsistency exists, even by the most circuitous path through the assertions and predicates of the KB, can take a long time. In more expressive logical systems, the search may not terminate at all.
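The refutation procedure just described can be sketched in miniature. Here is a minimal propositional version in Python (the clause encoding and function names are my own illustration; a real resolution prover works over first-order clauses with unification, which is considerably more involved):

```python
from itertools import combinations

def resolve(c1, c2):
    """Return all resolvents of two clauses. A clause is a frozenset of
    string literals; negation is marked by a leading '~'."""
    resolvents = []
    for lit in c1:
        neg = lit[1:] if lit.startswith('~') else '~' + lit
        if neg in c2:
            resolvents.append((c1 - {lit}) | (c2 - {neg}))
    return resolvents

def pl_resolution(kb_clauses, query):
    """Resolution refutation, propositional case: assert the negation of
    the query and search for the empty clause (an inconsistency)."""
    neg = query[1:] if query.startswith('~') else '~' + query
    clauses = {frozenset(c) for c in kb_clauses} | {frozenset({neg})}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:          # empty clause: KB plus not-X is inconsistent
                    return True
                new.add(r)
        if new <= clauses:         # fixed point: no contradiction derivable
            return False
        clauses |= new
```

Given the clauses {¬elephant ∨ has-heart, elephant}, querying `has_heart` finds the empty clause quickly; querying something unrelated forces the search to run to a fixed point before answering "no". And note that if the KB already contains both `p` and `~p`, *every* query succeeds.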

There is the added problem that if some inconsistency was *already* present in the KB before not-X was added, the system will agree that any statement X must be true, even if it is something absurd like “one equals two”. So when using resolution, or any other logically complete inference method, knowledge-base hygiene is all-important: inconsistency must be kept out at all costs. This can be a problem when combining large bodies of information from multiple sources with varying degrees of reliability. But if we have a sufficiently fast procedure for testing the KB for global consistency (as we must have when using resolution), the problem of KB hygiene can be solved, though perhaps at great computational cost.

**Speed and Scalability vs. Expressiveness**

Here’s the big problem: it has been proven that logically complete inference in a full first-order logic system is computationally intractable. By *intractable*, we mean that the time required for the worst cases grows exponentially with the size of the KB. Over the years, some very clever inference engines have been developed that speed up typical-case queries by caching partial results or by ruling out parts of the KB that are (provably) not relevant to a given query, but the fundamental problem of intractability remains. The practical consequence is that a system based on full FOL and logically complete inference cannot grow to the very large size that we need for most interesting real-world KB applications.

When faced with this problem, most recent researchers in the KR field have turned to more restricted, less expressive logical systems than full FOL. There are many possible tradeoffs between tractability/scalability and expressiveness, and KR conferences are full of papers analyzing and advocating various points in the space of possible tradeoffs. A rather complicated alphabet soup of abbreviations has grown up in the theoretical KR community for describing the various expressive mechanisms that are included or not included in a given KR system.^{[3]}

The OWL notation, which currently seems to be the most popular KR system in actual use (especially for “semantic web” applications, since it has been blessed by the World Wide Web Consortium), is actually a family of languages. OWL-Lite is a very restricted form of OWL that is tractable and for which reasonably fast inference engines exist. Some popular inference engines are Racer, FaCT, and KAON2.

However, in OWL-Lite it is impossible to express much of the knowledge that we need to express for real applications. OWL-DL, based on Description Logics, is somewhat more expressive: OWL-DL is itself a family with multiple versions, corresponding to the various ways in which description logics can be restricted. Some versions of OWL-DL are theoretically tractable and some are not; of the theoretically tractable versions of OWL, some fit into existing “fast” inference engines, and some do not. OWL-Full is more expressive still, but does not generally work with the inference engines that have been developed for description logics.

This research on the agonizing tradeoffs between scalability and expressiveness is valuable, but for our purposes in the Scone project – and for any other KB system with similar practical goals – these tradeoffs are beside the point. **For human-like reasoning on real-world problems with a large KB, we need both scalability to very large size and expressiveness that is greater – not less – than is provided by first-order logic.**

Regarding scalability, the core inferences that occur frequently in the KBS must be not only tractable, but fast. Even if a logical system can produce results in polynomial time, that is not good enough if the polynomial involves high-degree terms or large coefficients. If the KB is to scale to the size required for human-like common sense – or beyond – we need more-or-less constant-time responses for most kinds of queries, and linear-time response (or near-linear time) for most of the rest.

Of course, there is no magic approach for solving provably intractable inference problems. Some difficult reasoning problems will always require exponential or high-degree polynomial time, no matter what we do. But I would argue that these are not the inference problems that we need to solve frequently in the course of everyday human-like reasoning. So instead of holding the KBS design hostage to the worst-case performance of these most-difficult problems, we could instead optimize the KBS for the easier problems – the ones that we humans routinely solve with no apparent display of mental effort – and banish the more difficult problems to a specialized theorem-proving or puzzle-solving module, where they won’t bog down our reasoning when we want to determine, for example, whether Clyde the elephant has a heart.
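The Clyde-has-a-heart query is the easy kind: it needs only a walk up the is-a hierarchy, with cost proportional to the depth of the hierarchy rather than the size of the KB. A toy sketch (the dict-based KB and names here are illustrative only, not Scone's actual structures):

```python
# Hypothetical toy KB: each node has at most one parent in the is-a
# hierarchy, plus a set of locally attached properties.
PARENT = {'Clyde': 'elephant', 'elephant': 'mammal', 'mammal': 'animal'}
PROPS = {'mammal': {'has_heart', 'has_fur'}, 'animal': {'alive'}}

def has_property(node, prop):
    """Walk up the is-a chain looking for the property. Cost is linear
    in hierarchy depth, independent of total KB size."""
    while node is not None:
        if prop in PROPS.get(node, set()):
            return True
        node = PARENT.get(node)
    return False
```

So asking whether Clyde has a heart is a three-step climb to *mammal*, no matter how many million other assertions the KB contains.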

Regarding expressiveness: as mentioned in the previous article, and as we will explore in more detail in future articles, a real-world human-like KBS needs higher-order constructs (i.e. the ability to refer to assertions as objects in the KB and to reason about the assertions themselves), multiple overlapping contexts (which really are just a convenient re-packaging of the higher-order constructs), and default reasoning with exceptions (non-monotonic constructs). If we add any of these features to our knowledge representation system, it takes us beyond the intractability of FOL and into the space of undecidability – that is, if we try to apply a logically complete inference method like resolution, it may not return an answer at all for some queries, no matter how long we wait.

**Three Major Goals: Pick Any Two**

So that’s the problem. We want three things from a practical KR system: (1) logical completeness and provable consistency; (2) speed and scalability; and (3) enough expressive power to cover all of our needs for representing common-sense knowledge. There is a well-established body of theory that says we can’t have all three properties at once. So researchers must choose which of these goals to give up (or at least to compromise on).

For many researchers, giving up the logical properties of completeness and provable consistency seems to be inconceivable – they don’t even discuss this as a choice – so they end up arguing endlessly about how much expressiveness to trade off for somewhat better speed and scalability. Often, they end up with the worst of both worlds: an awkward, inexpressive system that is still too slow to be used in many applications.

Some of the logicians in the KR field have effectively given up on scalability: they remain rooted in the logical tradition and continue to explore ever more complex and expressive logics, even if the resulting representations are intractable or undecidable.

And a few of us, unwilling to compromise either on expressiveness or scalability, have made the choice to give up on logically complete proof procedures and to reason in a more local and limited way. Our systems are not general theorem-provers, but they seem to support human-like common-sense reasoning quite well. In any case, given our goals, this is the only reasonable choice.^{[4]}

**Incomplete Inference**

So what are the implications of this choice? On the plus side, giving up on logically complete inference is very liberating: the greater expressiveness available in an incomplete-inference system like Scone allows us to easily express and reason about general-knowledge statements and concepts that would be extremely difficult, if not impossible, in a system based on FOL or a tractable subset of FOL. In Scone it is easy to represent and reason about pink elephants and flightless birds, and to represent complex multi-context statements like “Tom believes that Fred knows Mary’s phone number, but Fred doesn’t really know it.” This expressive power makes it much easier to build up large bodies of knowledge in real-world domains, since we are not constantly fighting against the fundamental limitations of our descriptive machinery. Since our KBS is more expressive, we can import knowledge from representations such as OWL into Scone, though we cannot, in general, export Scone knowledge back into less expressive systems – some of it just can’t be represented there. Freed from the expressiveness/tractability dilemma, we can focus on making our system’s limited inference as fast and scalable as possible.

On the other hand, there will be some valid inferences that could be made from the knowledge in our KB that a system like Scone will never make – that’s the definition of incomplete inference. Scone’s reasoning is mostly local – it will not follow chains of inference to arbitrary depth. When faced with some loop (“Who shaves the barber?”), Scone ideally would flag this as a problematic case, but we cannot guarantee that all such cases will be detected. Sometimes Scone will just return one answer or the other. For human-like reasoning, that strikes me as good enough. I do the same thing. We all do, when we’re not in theorem-proving mode.

Scone can still catch most inconsistencies as they are entered into the KB. For example, if you say that Clyde is male and later that he is someone’s mother, implying that Clyde is female, that will be caught. However, more subtle inconsistencies may sneak into the KB unnoticed. If somehow we end up in a situation where Clyde is both male and female, then we will experience some local confusion regarding Clyde. (What restroom should Clyde use? What sort of sex organs does Clyde have?) But our local, limited inference machinery will then not go on to conclude that one equals two. So the lack of logical completeness is not always a bad thing.
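The male/mother clash above is the kind of check that can be done cheaply at assertion time. A minimal sketch of the idea (the tables and function below are my own hypothetical illustration, not Scone's API):

```python
# Hypothetical disjointness check at assertion time.
DISJOINT = [({'male'}, {'female'})]     # pairs of mutually exclusive types
IMPLIES = {'mother': {'female'}}        # asserting 'mother' implies 'female'

def assert_type(kb, individual, new_type):
    """Add a type assertion, rejecting it if it clashes with a locally
    known disjoint type. Subtle, indirect clashes may still slip by."""
    implied = {new_type} | IMPLIES.get(new_type, set())
    current = kb.setdefault(individual, set())
    for a, b in DISJOINT:
        if (current & a and implied & b) or (current & b and implied & a):
            raise ValueError(f"{individual}: {new_type} clashes with {current}")
    current |= implied
```

Saying Clyde is male and then that Clyde is someone's mother trips the check immediately; an inconsistency that requires a long derivation to expose would not.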

The deductions made by the limited inference engine in our KBS are normally sound, as far as they go, but there are situations where an unsound conclusion can be reached. In a non-monotonic but incomplete reasoning system, this is unavoidable. For example, suppose that elephants are, by default, gray, but that Clyde is pink. Since our KBS supports exceptions, it is legal to cancel the inherited “Clyde is gray” property and to add the contradictory assertion “Clyde is pink”. Suppose, however, that the information about Clyde’s non-standard color is not stated directly, but is derived as the result of a long and circuitous chain of reasoning. In this case, when asked the color of Clyde, our KBS might quickly infer that Clyde is gray and, if we stop the reasoning too soon, might never discover any reason to retract this conclusion. The KBS would return gray as the color, and that is incorrect.
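The pink-Clyde scenario can be sketched concretely. In the toy model below (entirely my own construction, not Scone's mechanism), the exception is only reachable through a derivation of some length, modeled here as a depth cost that a bounded-effort query may refuse to pay:

```python
# Hypothetical default-inheritance sketch with a bounded-effort query.
PARENT = {'Clyde': 'elephant'}
DEFAULTS = {'elephant': {'color': 'gray'}}

# Clyde's pink color is only derivable via a chain of rules; we model the
# chain's length as a derivation depth the search must be willing to pay.
DERIVED_EXCEPTIONS = {('Clyde', 'color'): ('pink', 7)}   # (value, depth)

def color_of(individual, effort):
    """Look for an exception first, but give up past the effort bound;
    then fall back to the inherited default."""
    exc = DERIVED_EXCEPTIONS.get((individual, 'color'))
    if exc is not None and exc[1] <= effort:
        return exc[0]
    node = individual
    while node is not None:
        if node in DEFAULTS and 'color' in DEFAULTS[node]:
            return DEFAULTS[node]['color']
        node = PARENT.get(node)
    return None
```

With a small effort bound the query returns "gray" even though "pink" is derivable; with a generous bound it finds the exception. That is exactly the unsound shortcut described above: the answer depends on how hard the system is willing to look.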

It is troubling to have unsound inferences in our system under any circumstances. However, default inference with exceptions is so valuable that, in my opinion, we cannot live without it in real-world domains. If you know a lot about elephants, it is unlikely that any individual elephant will fit the typical-elephant description without a few exceptions. And if you banish from the elephant description any property that is not universally true, you will be left with a very sparse and not very useful description. So I think that this particular kind of unsoundness is the lesser of evils – the only reasonable choice for a system that must solve real problems in messy, non-mathematical domains.

Again, I believe that this choice is quite consistent with human-like common-sense reasoning. In my daily life, I will often draw an invalid conclusion because I haven’t thought deeply enough about the problem – I have just applied the usual default instead of noticing that the case I’m dealing with is exceptional. We humans all do this in our day-to-day reasoning, whenever we’re not in ultra-careful theorem-proving mode.

Perhaps the greatest disadvantage of an incomplete-inference approach is that it is hard for us to make sweeping, closed-form statements describing exactly what inferences these systems can and cannot make. In most of these systems, the depth of reasoning depends in complicated ways on exactly what is in the KB and on parameters that govern how much time and effort should be spent on a query before the system gives up. So the incomplete-inference systems lack the satisfying mathematical elegance of the theorem-proving systems; in exchange, they offer speed, scalability, and expressiveness. I think that each of us must choose which of these properties we value most, depending on our goals.

**In Conclusion…**

None of this is meant to suggest that researchers who cling tightly to the logically complete, theorem-proving approach are wrong to do so. Theorem proving – and, more recently, *automated* theorem proving – are among the crown jewels of human intellectual achievement. Theorem-proving enables us to reason far more deeply than we otherwise could, with a strong guarantee that, if our premises are correct, our conclusions are correct as well. Even for practical inference systems, this approach has some value. There are small but important reasoning problems for which intractability is not a fatal problem. And there are many uses for logic-based representation with limited expressive power. For example, a system like OWL-Lite may be sufficiently expressive for the Semantic Web as it is currently conceived by many: a simple hierarchical scheme to provide somewhat more meaningful labels for URLs and web pages.

So I have no problem with researchers who want to focus on exploring the limits and tradeoffs of logically complete inference systems. It’s important work, and perhaps they deserve the dominant position they currently occupy in the KR universe. But I do wish that they would regard those of us who have chosen to work on incomplete inference systems as equally serious researchers who have chosen a different approach for good reason, and not as ignorant or lazy bumblers who are too slow or too stubborn to perceive the One True Way.

It is rather annoying when the logic people start congratulating themselves for having taken a “principled” approach to representing one kind of knowledge or another, with the sneering implication that other approaches are “unprincipled” – just a collection of *ad hoc* design choices made more or less at random by people who don’t know any better. Principles are important. But if strict adherence to a particular set of principles leads you into a part of the design space that is provably worthless for many important real-world problems, maybe it’s time to look for some new principles.

- At the time, McCarthy’s arguments in favor of a declarative, logic-based approach to KR had only partial success. Most AI researchers accepted, more or less, the “declarative” part, but for a long time the AI field was split between those who favored a logic-based approach and those who just wanted to get the job done. The split remains, though it is now less visible. In recent years, the logicians have gained the ascendancy among theoretical AI researchers and those focused on the “semantic web”. Many people focused on AI applications (i.e. “getting the job done”) still often employ other, mostly older KR approaches that avoid some of the representational limitations discussed in this article. [↩]
- John Alan Robinson, “A Machine-Oriented Logic Based on the Resolution Principle”, Journal of the ACM, 12(1):23–41, 1965. [↩]
- For some pointers into this space, see the Wikipedia Article on Description Logics. For a good example of the complex ways in which the inclusion of expressive features may be traded off against the speed and tractability of inference, see Ian Horrocks, Ulrike Sattler, and Stephan Tobies: “Practical Reasoning for Very Expressive Description Logics” in Logic Journal of the IGPL, 8(3):239-264, 2000. [↩]
- I am certainly not the first person to reach this conclusion. For example, see Jon Doyle and Ramesh S. Patil’s paper “Two Theses of Knowledge Representation: Language Restrictions, Taxonomic Classifications, and the Utility of Representation Services”, Artificial Intelligence, Vol.48, No. 3 (April 1991), pp. 261-297. They present a longer and more detailed argument leading to the same general conclusion: that in a KR system for large real-world problems, we must use some kind of incomplete inference scheme, though their ideas about exactly what sort of incomplete inference to use are a bit different from mine. [↩]