Artificial Intelligence - foundations of computational agents -- 12.6 Applications in Natural Language Processing


12.6 Applications in Natural Language Processing

Natural language processing is an interesting and difficult domain in which to develop and evaluate representation and reasoning theories. All of the problems of AI arise in this domain; solving "the natural language problem" is as difficult as solving "the AI problem" because any domain can be expressed in natural language. The field of computational linguistics has a wealth of techniques and knowledge. In this book, we can only give an overview.

There are at least three reasons for studying natural language processing:

You want a computer to communicate with users in their terms; you would rather not force users to learn a new language. This is particularly important for casual users and those users, such as managers and children, who have neither the time nor the inclination to learn new interaction skills.
There is a vast store of information recorded in natural language that could be accessible via computers. Information is constantly generated in the form of books, news, business and government reports, and scientific papers, many of which are available online. A system requiring a great deal of information must be able to process natural language to retrieve much of the information available on computers.
Many of the problems of AI arise in a very clear and explicit form in natural language processing and, thus, it is a good domain in which to experiment with general theories.

The development of natural language processing provides the possibility of natural language interfaces to knowledge bases and natural language translation. We show in the next section how to write a natural language query answering system that is applicable to very narrow domains for which stylized natural language is adequate and in which little, if any, ambiguity exists. At the other extreme are shallow but broad systems, such as the help system presented in Example 6.16 and Example 7.13. Developing useful systems that are both deep and broad is difficult.

There are three major aspects of any natural language understanding theory:

Syntax: The syntax describes the form of the language. It is usually specified by a grammar. Natural language is much more complicated than the formal languages used for logics and computer programs.
Semantics: The semantics provides the meaning of the utterances or sentences of the language. Although general semantic theories exist, when we build a natural language understanding system for a particular application, we try to use the simplest representation we can. For example, in the development that follows, there is a fixed mapping between words and concepts in the knowledge base, which is inappropriate for many domains but simplifies development.
Pragmatics: The pragmatic component explains how the utterances relate to the world. To understand language, an agent should consider more than the sentence; it has to take into account the context of the sentence, the state of the world, the goals of the speaker and the listener, special conventions, and the like.
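
As a rough illustration of the "fixed mapping between words and concepts" mentioned under semantics, the following Python sketch ties each word to exactly one knowledge-base symbol. The lexicon entries and the function name are hypothetical, chosen only for this example; they are not taken from the book's code.

```python
# Hypothetical lexicon: every word is tied to exactly one knowledge-base symbol.
LEXICON = {
    "student": "student",
    "course": "course",
    "enrolled": "enrolled_in",
    "bank": "financial_institution",  # the "river bank" sense cannot be expressed
}

def words_to_concepts(sentence):
    """Map each known word to its single concept; skip words with no entry."""
    words = sentence.lower().replace(".", "").replace("?", "").split()
    return [LEXICON[w] for w in words if w in LEXICON]

print(words_to_concepts("The student enrolled in the course."))
print(words_to_concepts("We sat on the bank of the river."))
```

The second call shows why such a mapping is inappropriate for many domains: "bank" is always read as a financial institution, whatever the context. In a narrow domain where each word has one intended meaning, the same simplicity makes development easier.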

To understand the difference among these aspects, consider the following sentences, which might appear at the start of an AI textbook:

This book is about artificial intelligence.
The green frogs sleep soundly.
Colorless green ideas sleep furiously.
Furiously sleep ideas green colorless.

The first sentence would be quite appropriate at the start of such a book; it is syntactically, semantically, and pragmatically well formed. The second sentence is syntactically and semantically well formed, but it would appear very strange at the start of an AI book; it is thus not pragmatically well formed for that context. The last two sentences are attributed to linguist Noam Chomsky (1957). The third sentence is syntactically well formed, but it is semantically nonsensical. The fourth sentence is syntactically ill formed; it does not make any sense syntactically, semantically, or pragmatically.
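
The separation of syntax from semantics can be made concrete with a toy recognizer. The sketch below assumes a tiny, hypothetical grammar (S -> NP VP, NP -> [Det] Adj* Noun, VP -> Verb [Adv]) and a word list covering only the second, third, and fourth sentences; it checks form alone and knows nothing about meaning or context.

```python
import re

# Hypothetical word categories covering only the example sentences.
CATEGORY = {
    "the": "Det",
    "green": "Adj", "colorless": "Adj",
    "frogs": "Noun", "ideas": "Noun",
    "sleep": "Verb",
    "soundly": "Adv", "furiously": "Adv",
}

# Toy grammar: S -> NP VP ; NP -> [Det] Adj* Noun ; VP -> Verb [Adv]
PATTERN = re.compile(r"(Det )?(Adj )*Noun Verb( Adv)?")

def is_syntactically_well_formed(sentence):
    """Return True if the sequence of word categories matches the toy grammar."""
    words = sentence.lower().rstrip(".").split()
    if any(w not in CATEGORY for w in words):
        return False  # unknown word: this toy grammar cannot judge it
    tags = " ".join(CATEGORY[w] for w in words)
    return PATTERN.fullmatch(tags) is not None

for s in ["The green frogs sleep soundly.",
          "Colorless green ideas sleep furiously.",
          "Furiously sleep ideas green colorless."]:
    print(s, is_syntactically_well_formed(s))
```

Under these assumptions the recognizer accepts both "The green frogs sleep soundly." and "Colorless green ideas sleep furiously." and rejects the fourth sentence. Telling the second sentence from the third would need a semantic component, and judging whether either suits the start of an AI book would need a pragmatic one.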

In this book, we are not attempting to give a comprehensive introduction to computational linguistics. See the references at the end of the chapter for such introductions.