foundations of computational agents
This section shows how to use definite clauses to represent aspects of the syntax and semantics of natural language.
Languages are defined by their legal sentences. Sentences are sequences of symbols. Here we assume that a sentence is represented as a list of atoms, where there is an atom for each word in the language. More sophisticated models often represent words in terms of their parts, for example removing the ending such as “ing” and “er”.
The legal sentences are specified by a grammar.
Our first approximation of natural language is a context-free grammar. A context-free grammar is a set of rewrite rules, with non-terminal symbols transforming into a sequence of terminal and non-terminal symbols. A sentence of the language is a sequence of terminal symbols generated by such rewriting rules. For example, the grammar rule
means that a non-terminal symbol can be a followed by a . The symbol “” means “can be rewritten as.”
For natural languages, the terminal symbols are typically the words of the language. If a sentence of natural language is represented as a list of words, the following definite clause means that a list of words is a sentence if it is a noun phrase followed by a verb phrase:
To say that the word “computer” is a noun, you could write
There is an alternative, simpler representation of context-free grammar rules using definite clauses that does not require an explicit , known as a definite clause grammar (DCG). Each non-terminal symbol becomes a predicate with two arguments, , which is true when list is an ending of list such that all of the words in before form a sequence of words of the category . Lists and together form a difference list of words that make the class given by the non-terminal symbol, because it is the difference of these that forms the syntactic category.
Under this representation, is true if list is an ending of list such that all of the words in before form a noun phrase. is the rest of the sentence. You can think of as representing a position in a list after position . The difference list represents the words between these positions.
The atomic symbol
is true in the intended interpretation because “the student” forms a noun phrase.
The grammar rule
means that there is a sentence between some and if there exists a noun phrase between and and a verb phrase between and :
This grammar rule can be specified as the clause:
In general, the rule
says that is composed of a followed by a , …, followed by a , and is written as the definite clause
using the interpretation
where the are new variables.
To say that non-terminal gets mapped to the terminal symbols, , one would write
using the interpretation
Thus, is true if .
The rule that specifies that the non-terminal can be rewritten to the non-terminal followed by the non-terminal followed by the terminal symbols and , followed by the non-terminal symbol followed by the terminal symbol and the non-terminal symbol , can be written as
and can be represented as
Note that the translations and were done manually.
% A sentence is a noun phrase followed by a verb phrase.
% A noun phrase is a determiner followed by adjectives followed by a noun followed by an optional prepositional phrase.
% Adjectives consist of a (possibly empty) sequence of adjectives.
% An optional prepositional phrase is either nothing or a preposition followed by a noun phrase.
A verb phrase is a verb followed by a noun phrase and an
Consider the sentence “The student passed the course with a computer.” This is represented as a list of atoms, one for each word.
For the query,
the computer first proves , which has a unique answer, as above. It then tries to prove .
This sentence has two different parses, one using the clause instance
and one using the instance
In the first of these, the prepositional phrase modifies the noun phrase (i.e., the course is with a computer); and in the second, the prepositional phrase modifies the verb phrase (i.e., the course was passed with a computer).