# 13.6.1 Using Definite Clauses for Context-Free Grammars

This section shows how to use definite clauses to represent aspects of the syntax and semantics of natural language.

Languages are defined by their legal sentences. Sentences are sequences of symbols. Here we assume that a sentence is represented as a list of atoms, where there is an atom for each word in the language. More sophisticated models often represent words in terms of their parts, for example removing the ending such as “ing” and “er”.

The legal sentences are specified by a grammar.

Our first approximation of natural language is a context-free grammar. A context-free grammar is a set of rewrite rules, with non-terminal symbols transforming into a sequence of terminal and non-terminal symbols. A sentence of the language is a sequence of terminal symbols generated by such rewriting rules. For example, the grammar rule

 $sentence\longmapsto noun\_phrase,verb\_phrase$

means that a non-terminal symbol $sentence$ can be a $noun\_phrase$ followed by a $verb\_phrase$. The symbol “$\longmapsto$” means “can be rewritten as.”

For natural languages, the terminal symbols are typically the words of the language. If a sentence of natural language is represented as a list of words, the following definite clause means that a list of words is a sentence if it is a noun phrase followed by a verb phrase:

 $sentence(S)\leftarrow\mbox{}noun\_phrase(N)\land verb\_phrase(V)\land append(N% ,V,S).$

To say that the word “computer” is a noun, you could write

 $noun([computer]).$

There is an alternative, simpler representation of context-free grammar rules using definite clauses that does not require an explicit $append$, known as a definite clause grammar (DCG). Each non-terminal symbol $s$ becomes a predicate with two arguments, $s(L_{1},L_{2})$, which is true when list $L_{2}$ is an ending of list $L_{1}$ such that all of the words in $L_{1}$ before $L_{2}$ form a sequence of words of the category $s$. Lists $L_{1}$ and $L_{2}$ together form a difference list of words that make the class given by the non-terminal symbol, because it is the difference of these that forms the syntactic category.

###### Example 13.34.

Under this representation, $noun\_phrase(L_{1},L_{2})$ is true if list $L_{2}$ is an ending of list $L_{1}$ such that all of the words in $L_{1}$ before $L_{2}$ form a noun phrase. $L_{2}$ is the rest of the sentence. You can think of $L_{2}$ as representing a position in a list after position $L_{1}$. The difference list represents the words between these positions.

The atomic symbol

 $\displaystyle{noun\_phrase([the,student,passed,the,course,with,a,computer],}$ $\displaystyle\ \ \ \ {[passed,the,course,with,a,computer])}$

is true in the intended interpretation because “the student” forms a noun phrase.

The grammar rule

 $sentence\longmapsto noun\_phrase,verb\_phrase$

means that there is a sentence between some $L_{0}$ and $L_{2}$ if there exists a noun phrase between $L_{0}$ and $L_{1}$ and a verb phrase between $L_{1}$ and $L_{2}$:

 $\overbrace{\underbrace{L_{0}\hskip 85.358268pt}_{noun\_phrase}\underbrace{L_{1% }\hskip 85.358268pt}_{verb\_phrase}}^{sentence}L_{2}$

This grammar rule can be specified as the clause:

 $\displaystyle{sentence(L_{0},L_{2})\leftarrow\mbox{}}$ $\displaystyle\ \ \ \ {noun\_phrase(L_{0},L_{1})\wedge\mbox{}}$ $\displaystyle\ \ \ \ {verb\_phrase(L_{1},L_{2}).}$

In general, the rule

 $\displaystyle{h\longmapsto b_{1},b_{2},\ldots,b_{n}}$

says that $h$ is composed of a $b_{1}$ followed by a $b_{2}$, …, followed by a $b_{n}$, and is written as the definite clause

 $\displaystyle{h(L_{0},L_{n})\leftarrow\mbox{}}$ $\displaystyle\ \ \ \ {b_{1}(L_{0},L_{1})\wedge\mbox{}}$ $\displaystyle\ \ \ \ {b_{2}(L_{1},L_{2})\wedge\mbox{}}$ $\displaystyle\ \ \ \ {\vdots}$ $\displaystyle\ \ \ \ {b_{n}(L_{n-1},L_{n}).}$

using the interpretation

 $\overbrace{\underbrace{L_{0}\hskip 28.452756pt}_{b_{1}}\underbrace{L_{1}\hskip 2% 8.452756pt}_{b_{2}}L_{2}\cdots\underbrace{L_{n-1}\hskip 28.452756pt}_{b_{n}}}^% {h}L_{n}$

where the $L_{i}$ are new variables.

To say that non-terminal $h$ gets mapped to the terminal symbols, $t_{1},...,t_{n}$, one would write

 $h([t_{1},\cdots,t_{n}\mid T],T)$

using the interpretation

 $\overbrace{t_{1},\cdots,t_{n}}^{h}T$

Thus, $h(L_{1},L_{2})$ is true if $L_{1}=[t_{1},...,t_{n}\mid L_{2}]$.

###### Example 13.35.

The rule that specifies that the non-terminal $h$ can be rewritten to the non-terminal $a$ followed by the non-terminal $b$ followed by the terminal symbols $c$ and $d$, followed by the non-terminal symbol $e$ followed by the terminal symbol $f$ and the non-terminal symbol $g$, can be written as

 $h\longmapsto a,b,[c,d],e,[f],g$

and can be represented as

 $\displaystyle{h(L_{0},L_{6})\leftarrow\mbox{}}$ $\displaystyle\ \ \ \ {a(L_{0},L_{1})\wedge\mbox{}}$ $\displaystyle\ \ \ \ {b(L_{1},[c,d\mid L_{3}])\wedge\mbox{}}$ $\displaystyle\ \ \ \ {e(L_{3},[f\mid L_{5}])\wedge\mbox{}}$ $\displaystyle\ \ \ \ {g(L_{5},L_{6}).}$

Note that the translations $L_{2}=[c,d\mid L_{3}]$ and $L_{4}=[f\mid L_{5}]$ were done manually.

Figure 13.7 axiomatizes a simple grammar of English. Figure 13.8 gives a simple dictionary of words and their parts of speech, which can be used with this grammar.

###### Example 13.36.

Consider the sentence “The student passed the course with a computer.” This is represented as a list of atoms, one for each word.

For the grammar of Figure 13.7 and the dictionary of Figure 13.8, the query

 $\mbox{{ask}~{}}~{}noun\_phrase([the,student,passed,the,course,with,a,computer]% ,R).$

will return

 $R=[passed,the,course,with,a,computer].$

For the query,

 $\mbox{{ask}~{}}~{}sentence([the,student,passed,the,course,with,a,computer],[\,% ]).$

the computer first proves $noun\_phrase$, which has a unique answer, as above. It then tries to prove $verb\_phrase$.

This sentence has two different parses, one using the clause instance

 $\displaystyle{verb\_phrase([passed,the,course,with,a,computer],[\,])\leftarrow% \mbox{}}$ $\displaystyle\ \ \ \ {verb([passed,the,course,with,a,computer],}$ $\displaystyle\ \ \ \ {\hskip 60.0pt[the,course,with,a,computer])\wedge\mbox{}}$ $\displaystyle\ \ \ \ {noun\_phrase([the,course,with,a,computer],[\,])\wedge% \mbox{}}$ $\displaystyle\ \ \ \ {pp([\,],[\,])}$

and one using the instance

 $\displaystyle{verb\_phrase([passed,the,course,with,a,computer],[\,])\leftarrow% \mbox{}}$ $\displaystyle\ \ \ \ {verb([passed,the,course,with,a,computer],}$ $\displaystyle\ \ \ \ {\hskip 60.0pt[the,course,with,a,computer])\wedge\mbox{}}$ $\displaystyle\ \ \ \ {noun\_phrase([the,course,with,a,computer],[with,a,% computer])\wedge\mbox{}}$ $\displaystyle\ \ \ \ {pp([with,a,computer],[\,]).}$

In the first of these, the prepositional phrase modifies the noun phrase (i.e., the course is with a computer); and in the second, the prepositional phrase modifies the verb phrase (i.e., the course was passed with a computer).