An Article About Articles

Scott Fahlman, April 29, 2008
Categories: Natural Language

In an earlier article, I argued that natural language is really all about meaning. There are many things you can do with natural language that stop short of trying to extract and represent what the words and sentences actually mean. Some of these knowledge-free applications are of great practical value, but they are all dancing around the periphery of real understanding. They ignore the key issue at the center of the enterprise – the very purpose of human language: conveying a chunk of meaning from one mind to another.

In this article, I would like to expand that view just a bit: in addition to what a sentence (or other chunk of language) means, we must pay attention to what it does. That is, we must consider what inferences and procedures are triggered when a listener or reader encounters the utterance – that, too, is an important part of the payload. To illustrate what I mean by this, let’s consider a very simple and familiar example (at least to English speakers): the function of the definite and indefinite articles, “a”, “an”, and “the”.

Before we dive into this, let’s dispose of the “a” vs. “an” distinction. Functionally, these are really the same word. We use one form or the other depending on the initial phoneme (pronunciation, not spelling) of the following word.[1] If a consonant sound comes next, use “a”; if a vowel sound, use “an”. So for the remainder of this article, we’ll just refer to “a” or “an”, and whatever we say of one will be true of the other. These are the so-called indefinite articles, while “the” is the definite article.[2]
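
The point that the choice depends on sound rather than spelling can be made concrete with a small sketch. It assumes a tiny hand-written table of first phonemes (the `FIRST_PHONEME` dictionary and its ARPAbet-style symbols are illustrative stand-ins, not data from any real pronunciation lexicon), and it is only meant to show the shape of the rule:

```python
# ARPAbet-style vowel symbols, with stress digits omitted.
VOWEL_PHONEMES = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}

# Hypothetical, hand-written first phonemes for a few words.
# A real system would look these up in a pronunciation dictionary.
FIRST_PHONEME = {
    "dog": "D",
    "apple": "AE",
    "hour": "AW",        # spelled with "h", but starts with a vowel sound
    "university": "Y",   # spelled with "u", but starts with a consonant sound
}


def indefinite_article(next_word: str) -> str:
    """Choose "a" or "an" from the first sound of the following word."""
    phoneme = FIRST_PHONEME[next_word.lower()]
    return "an" if phoneme in VOWEL_PHONEMES else "a"


for word in ("dog", "apple", "hour", "university"):
    print(indefinite_article(word), word)
```

Note that "hour" and "university" come out as "an hour" and "a university", which a rule based on the first letter would get wrong.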

OK, what do these words mean? Well… nothing, really: they’re what we call “function words”. So let’s look at the function.

Suppose you say to me, “While I was walking home today, the dog attacked me.” The use of the definite article here is a signal that you expect me to know (or be able to figure out) what dog you’re talking about. My mental machinery should go to work and try to figure out which dog this is. If I can’t figure it out, I should ask you for a clarification: “Wait… what dog?”

I might defer this query for a while to see whether you’re about to provide me with enough additional information to solve this problem: the next sentence fragment might be “… you know – that vicious schnauzer that lives next door.” But I don’t want to defer the clarification for too long – you have signaled that I am expected to know what dog you’re talking about, and your subsequent speech acts will assume that we’re both now focused on the same entity. When I finally do figure out what dog you mean, I might have to play catch-up on what you’ve been telling me, and if I wait too long my memory of your words may start to overflow or evaporate. (Then again, I might decide that I just don’t care what dog you’re talking about.)

The indefinite article behaves differently. Suppose you say, “While I was walking home today, a dog attacked me.” That’s a signal that you don’t expect me to know what dog you’re talking about. I’m supposed to create a new entity in my mental knowledge base – an instance of the type “dog” – and you’re probably going to tell me some additional information about this creature. (Otherwise, what you’re saying won’t be very interesting.) And, sure enough, such information arrives by the end of this sentence: this just-introduced dog has attacked you. From that I might deduce that you may have been injured, and that maybe we should both choose a different route in the future.

Once this new dog has been introduced and represented in the KB, you can then refer to this particular dog with the definite article: “The dog just wouldn’t stop.” You are now signaling that I should know what dog you mean, and unless there are other dogs in the conversation there should be no confusion.
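
One way to picture the two behaviors described above is as two different operations on the listener's store of entities. The following is a minimal sketch under that assumption; the names `Discourse`, `Entity`, and `ClarificationNeeded` are invented for illustration and do not correspond to any particular system:

```python
from dataclasses import dataclass, field
from itertools import count


class ClarificationNeeded(Exception):
    """The listener cannot tell which entity is meant ("Wait... what dog?")."""


@dataclass
class Entity:
    id: int
    type_name: str
    facts: list = field(default_factory=list)


class Discourse:
    """The listener's record of entities mentioned so far."""

    def __init__(self):
        self._ids = count(1)
        self.entities = []

    def refer(self, article: str, type_name: str) -> Entity:
        if article in ("a", "an"):
            # Indefinite: create a new, anonymous instance of the type.
            entity = Entity(next(self._ids), type_name)
            self.entities.append(entity)
            return entity
        if article == "the":
            # Definite: the speaker expects us to find exactly one match.
            matches = [e for e in self.entities if e.type_name == type_name]
            if len(matches) == 1:
                return matches[0]
            raise ClarificationNeeded(f"what {type_name}?")
        raise ValueError(f"not an article: {article!r}")


talk = Discourse()
dog = talk.refer("a", "dog")           # "... a dog attacked me."
dog.facts.append("attacked the speaker")
same_dog = talk.refer("the", "dog")    # "The dog just wouldn't stop."
print(same_dog is dog)
```

In this sketch a definite reference with no matching entity, or with several, raises `ClarificationNeeded`. A more realistic listener would first defer the question for a while, or search background knowledge, as described earlier.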

I may later learn enough that I can equate this anonymous new dog with one I already have represented in my KB – a dog that I may already know something about: “I bet this dog is Chuckles, the ferocious miniature schnauzer, scourge of the neighborhood and relentless enemy of all mail-persons.” In Scone, we would use an EQ-link to equate the new dog-node to the existing “Chuckles” node, effectively merging the two stored descriptions. But at the time you say “a dog” you’re signaling that I’m not expected to identify the dog. That will be your assumption as the conversation continues.
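
The effect of equating two nodes can be illustrated with a small sketch. This is not Scone's actual interface or implementation; it is a hypothetical toy knowledge base (the `KnowledgeBase` class and its method names are assumptions) that merges descriptions in the style of a union-find structure:

```python
class KnowledgeBase:
    """Toy store of nodes and facts, with support for equating nodes."""

    def __init__(self):
        self._parent = {}   # node name -> representative it points to
        self._facts = {}    # representative name -> set of facts

    def add_node(self, name, *facts):
        self._parent[name] = name
        self._facts[name] = set(facts)

    def _find(self, name):
        # Follow links until we reach the node that represents the group.
        while self._parent[name] != name:
            name = self._parent[name]
        return name

    def equate(self, a, b):
        """Declare that a and b are the same entity; pool their facts."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b
            self._facts[root_b] |= self._facts.pop(root_a)

    def facts_about(self, name):
        return self._facts[self._find(name)]


kb = KnowledgeBase()
kb.add_node("Chuckles", "is a miniature schnauzer", "lives next door")
kb.add_node("dog-1", "attacked the speaker")   # the anonymous new dog
kb.equate("dog-1", "Chuckles")
print(sorted(kb.facts_about("dog-1")))
```

After the call to `equate`, asking about either name returns the combined description, which is the sense in which the two stored descriptions are merged.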

So these two little function words play a useful role in shaping descriptions and dialogs in English. Whenever we are referring to an anonymous member of some class, the use of a definite or indefinite article is mandatory in correct English. Of course, this being English, there are many odd exceptions to this rule, probably created to make life difficult for foreigners trying to learn the language. For example, in referring to a specific entity using a proper noun, we normally don’t include an article (“I’m going to Cuba.”) unless the name is some sort of collection (“I’m going to the Virgin Islands.”) – and the exceptions often have their own exceptions. The question of when to include or not to include an article seems to be one of the most difficult aspects of English for non-native speakers to master; the choice of which article to use, when you use one, seems not to be such a big problem.

It is interesting that the use of a definite or indefinite article is not mandatory (and may not even be possible) in many other languages. Some languages indicate the same distinction with word order or morphology, but John McWhorter[3] reports that only about 20% of the world’s human languages consistently mark this distinction in an explicit way. In the non-marking languages, the listener may have to do some extra work to figure out whether the dog in question is one he is expected to know about, but usually this can be figured out from other cues. If it cannot, and if the distinction is important, the speaker can use some special marker, often a form of the word “that”, to resolve the ambiguity. Such a marker is not a mandatory part of the grammar.

The point of all this is that these articles, like many other “function words”, have no meaning in the usual sense, but that they provide some sort of guidance for the listener’s mental processing of the utterance. Pronouns and all sorts of other references may trigger similar bursts of processing as we try to make sense of what we are being told or what we are reading.

So, to reiterate: If we want to truly understand an utterance or piece of text, we must pay attention to what each fragment means, and also to what it does.

  1. Some authorities, especially in England, still maintain that we should use “an” with a pronounced but unstressed “h” sound: “This is an historical occasion.” In most dialects of English, the consonant-ness of this “h” is pretty weak, and at some time in the past this “h” sound was not pronounced at all in “accepted” upper-class English speech. In that context, “an historical” was appropriate, and somehow this usage got fossilized. To modern American ears, “an historical” sounds like an hilarious bit of pomposity.
  2. Interestingly, “the” is pronounced in two ways as well, depending on the context: “thə” (with an unstressed schwa sound) or “thee”, as in “the end”. But most English speakers are not conscious of this distinction because both forms of “the” are spelled the same way.
  3. The Power of Babel: A Natural History of Language, John McWhorter, Perennial, 2001, pages 184-185. A fascinating book – highly recommended.
