language context, ambiguity and inference
This article on today’s MIT front page sketches an argument that I’ve been thinking about for a while, ever since the IBM Watson Jeopardy contest: natural language processing is hopeless to the extent that decoding depends on additional context (side information) that the machine cannot see.
Many prominent linguists … have argued that language is … poorly designed for communication [and] … point to the existence of ambiguity: In a system optimized for conveying information…, they argue, each word would have just one meaning. … Now, a group of MIT cognitive scientists has turned this idea on its head … [T]hey claim that ambiguity actually makes language more efficient, by allowing for the reuse of short, efficient sounds that listeners can easily disambiguate with the help of context.
Although the article addresses only homophonic ambiguity at the word level, the same argument applies at every level of language, including full messages whose decoding may require deep context.
It isn’t surprising that human speakers use context, whether ad hoc or shared understanding, to save effort. It is annoying to have to spell out everything and delightful when the listener understands anyway. This resembles a communication problem with a power constraint on the encoder and a complexity constraint on the decoder. The listener (decoder) tries to choose the most probable meaning among all the possibilities, and the speaker (encoder) tries to construct, specifically for the decoder’s context, a message whose distribution over meanings is as sharply peaked as possible, with the intended meaning as its mode.
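To make the decoder's side of this concrete, here is a minimal sketch in Python of choosing the most probable meaning given context. It is a toy, not a claim about how listeners actually work: the word "bank", the two contexts, the meaning labels, and every probability in the tables are made-up assumptions for illustration. The decoder simply weights each candidate meaning by how well it fits the word and how likely it is in the current context, then picks the largest.

```python
# Toy decoder: pick the most probable meaning of an ambiguous word given context.
# All names and numbers below are hypothetical, chosen only for illustration.

# P(word | meaning): how likely a speaker is to use this word for each meaning.
LIKELIHOOD = {
    "bank": {"river_edge": 0.6, "financial_institution": 0.7},
}

# P(meaning | context): what the listener expects before hearing the word.
CONTEXT_PRIOR = {
    "fishing": {"river_edge": 0.9, "financial_institution": 0.1},
    "payday": {"river_edge": 0.05, "financial_institution": 0.95},
}


def posterior(word, context):
    """Return P(meaning | word, context), normalized over candidate meanings."""
    scores = {
        meaning: LIKELIHOOD[word][meaning] * CONTEXT_PRIOR[context][meaning]
        for meaning in LIKELIHOOD[word]
    }
    total = sum(scores.values())
    return {meaning: score / total for meaning, score in scores.items()}


def decode(word, context):
    """Return the single most probable meaning (the mode of the posterior)."""
    post = posterior(word, context)
    return max(post, key=post.get)


if __name__ == "__main__":
    for ctx in ("fishing", "payday"):
        print(ctx, "->", decode("bank", ctx))
```

With these numbers the same short word decodes to the river sense in the fishing context and to the financial sense in the payday context. A decoder with no access to the context would have only the likelihood table to go on, which is the machine's predicament described at the top.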
A few thoughts on this. One, successful encoder-decoder pairs synchronize context. Two, viewed in this way, double entendres and certain jokes are not mere oddities of language, but sophisticated communication schemes!
One could go further and argue that there is even a counterpart to steganography, because there is a certain amusement in hiding most of the intended information within the context, while making the direct message without context empty or obfuscating for any other decoder. An encoder that successfully does this knows the decoder well, and a decoder that successfully processes this information possesses either a large amount of shared context with the encoder or a sophisticated inference capability.