The question might seem easy.

The Economist 4-9-2014

One answer is that the list of all words in a language can be found in a dictionary. A second, commonsense definition might be that everything that appears between spaces on a written page (such as this one) is a word. A third idea might be that words are the unsplittable building blocks of a language.

It might then seem surprising that for linguists, the academics who ponder for a living what language is, the definition of a word is not at all clear. It might further surprise the average reader that many linguists do not much care for the idea of “words” as such. All three commonsense definitions above are so flawed as to be unusable.

To take the first, not all our words, by a very long shot, are found in “the dictionary”. There is, first, no Dictionary. There are only many dictionaries for the English language, put out by private publishers. Their flawed, human lexicographers ponder daily what to include and what to exclude. (Huge numbers of specialised words don’t make the cut.) This was the subject of a recent Economist explainer article, responding to those nonplussed that Oxford Dictionaries had recently included “side boob” and “neckbeard” in their online dictionaries. The idea that lexicographers can, by the power vested in them by Oxford University Press or Merriam-Webster, allow a string of letters to “become a word” elicits a wry laugh from real lexicographers. They no more determine what words are than chemists get to choose what the elements are. They find and describe words, but do not permit them into being.

The second notion is that words are delineated by spaces on the printed page. But what about “side boob”, above? One word or two? Oxford Dictionaries included it as a new word, not a new phrase. This makes sense because sometimes new “words” are created from old elements. Whether writers write them with a space (“side boob”) or not (“neckbeard”) is immaterial. For “word” status, it matters whether the new creation is truly something new. Remember school chemistry: a mixture is when two substances mingle but retain their chemical properties. A compound is when the properties change as a result of the mixture. By analogy, a “dark room” is a mixture; a “darkroom” is a compound. A “black board” must be black, but a “blackboard” may be green, as Steven Pinker has written.

A neat feature of English helps disentangle mere phrases from new compounds. We know that two words have become a compound, something new, when the stress shifts to the first part of the compound. Consider “blackboard” and “darkroom” above, or the difference between “log in” (There’s a log in the shed) and “login” (Have you forgotten your login again?). The clear first-syllable stress shows that English-speakers have begun considering login a compound. The orthographic change, whether the two parts are still written with a space between them, is what economists might call a trailing indicator. New coinages might first be written as two separate words; as the combination persists, it might be hyphenated. If it survives long enough, it will probably end up being written closed up. Your columnist discovered this as a university student poring over dusty decades’ worth of an old reference publication. It was called the Statistical Year Book, then Year-Book, then Yearbook.

Surely there must be something unsplittable and basic to the language, though, might come the response. Of course a language’s sounds are even more basic than the words. But what about a basic unit of meaning? Is that a word? No, because even many indubitable “words” in the common understanding can be split into meaningful hunks. Unsuitable has three meaningful units, un-, -suit- and -able. Each can be used in other combinations. (Unease, That suits the mood, playable old records.) No one would argue that “unsuitable” is “not a word”. So our “basic unit of meaning” must be smaller in grain than the word. And indeed, linguists call the smallest unit of meaning a “morpheme”. Some morphemes are words (like “suit”). But some are not (like “un-”). Dictionaries might include common compounds like unsuitable, but they will also omit many obvious but unnecessary entries like sharklike and smokable.

All this adds up to a curious fact. While amateur language-lovers are often word-lovers, learning and treasuring oddities like antediluvian or triskaidekaphobia, linguists think both bigger and smaller than “words”. To linguists, fussing over words is like developing an interest in individual chemical elements: “I’m fascinated by boron.” In the material world, what really matters is how elements are built (the stuff of subatomic physics) and how they work together (the stuff of chemistry). So it is with language. Commonsense though it may seem to many, “word” is not an interesting unit if you’re interested in the system rather than its pieces. Morphemes interact to make words (their study is called morphology). Words interact to make phrases and clauses and sentences (the study of which is called syntax). Those in turn transmit meaning (semantics). In language as in so many other things, the whole really is not just greater, but far more fun, than the sum of the parts.