# 13.6.6 Building a Natural Language Interface to a Database

You can augment the preceding grammar to implement a simple natural language interface to a database. The idea is that, instead of transforming sub-phrases into parse trees, you transform them directly into queries on a knowledge base. To do this, make the following simplifying assumptions, which are not always true, but form a useful first approximation:

• nouns and adjectives correspond to properties

• verbs and prepositions correspond to a binary relation between two individuals, the subject and the object.

In this case, a noun phrase becomes an individual with a set of properties defining it. To answer a question, the system can find an individual that has these properties. A noun phrase followed by a verb phrase describes two individuals constrained by the verb.

###### Example 13.39.

In the sentence “a tall student passed a math course”, the phrase “a tall student” is the subject of the verb “passed” and the phrase “a math course” is the object of the verb. For the individual $S$ that is the subject, $tall(S)$ and $student(S)$ are true. For the individual $O$ that is the object, $course(O)$ and $dept(O,math)$ are true. The verb specifies that $passed(S,O)$ is true. Thus the question “Who is a tall student that passed a math course?” can be converted into the query:

 $\textbf{ask}~tall(S) \wedge student(S) \wedge passed(S,O) \wedge course(O) \wedge dept(O,math).$

The phrase “a tall student enrolled in cs312 that passed a math course” could be translated into

 $\textbf{ask}~tall(X) \wedge student(X) \wedge enrolled\_in(X,cs312) \wedge passed(X,O) \wedge course(O) \wedge dept(O,math).$
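To make the translation concrete, here is a minimal Python sketch that answers both queries of Example 13.39 by depth-first backtracking over a small set of ground facts, trying the atoms left to right as Prolog does. The facts (the individuals sam, lee, kim, chris and the course math101) and the helper names `ask`, `match`, and `is_var` are hypothetical; following the Prolog convention, a string that starts with an uppercase letter is treated as a variable.

```python
# Hypothetical toy knowledge base: each fact is a ground atom (predicate, arg1, ...).
KB = {
    ("tall", "sam"), ("student", "sam"),
    ("tall", "lee"), ("student", "lee"),
    ("tall", "kim"), ("student", "kim"),
    ("student", "chris"),
    ("course", "math101"), ("dept", "math101", "math"),
    ("course", "cs312"), ("dept", "cs312", "comp_science"),
    ("passed", "sam", "math101"), ("passed", "lee", "math101"),
    ("passed", "chris", "math101"), ("passed", "kim", "cs312"),
    ("enrolled_in", "sam", "cs312"), ("enrolled_in", "kim", "cs312"),
}

def is_var(term):
    """Variables are strings that start with an uppercase letter, as in Prolog."""
    return isinstance(term, str) and term[:1].isupper()

def match(atom, fact, subst):
    """Extend subst so that atom equals fact, or return None if that is impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    subst = dict(subst)                    # copy, so alternatives do not interfere
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            term = subst.get(term, term)   # use the binding if the variable has one
        if is_var(term):
            subst[term] = value            # bind a still-unbound variable
        elif term != value:
            return None                    # constant clash: this fact does not match
    return subst

def ask(query, kb, subst=None):
    """Yield every substitution that makes all atoms of the conjunction true in kb."""
    subst = {} if subst is None else subst
    if not query:
        yield subst
        return
    first, rest = query[0], query[1:]
    for fact in kb:
        extended = match(first, fact, subst)
        if extended is not None:
            yield from ask(rest, kb, extended)   # backtrack over the remaining atoms

q1 = [("tall", "S"), ("student", "S"), ("passed", "S", "O"),
      ("course", "O"), ("dept", "O", "math")]
q2 = [("tall", "X"), ("student", "X"), ("enrolled_in", "X", "cs312"),
      ("passed", "X", "O"), ("course", "O"), ("dept", "O", "math")]
print(sorted({ans["S"] for ans in ask(q1, KB)}))   # ['lee', 'sam']
print(sorted({ans["X"] for ans in ask(q2, KB)}))   # ['sam']
```

Each answer is a substitution for the query's variables; collecting the bindings of $S$ (or $X$) gives the individuals the question is asking about.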

Figure 13.11 shows a simple grammar that parses an English question and answers it at the same time. It ignores most of the grammar of English, such as the differences between prepositions and verbs or between determiners and adjectives, and guesses at the meaning even if the question is not grammatical. Adjectives, nouns, and noun phrases refer to an individual; the extra argument to these predicates is an individual that satisfies the adjectives and nouns. Here an $mp$ is a modifying phrase, which could be a prepositional phrase or a relative clause. A $reln$, either a verb or a preposition, is a relation between two individuals, the subject and the object, so both are extra arguments to the $reln$ predicate.

###### Example 13.40.

Suppose $question(Q,A)$ means $A$ is an answer to question $Q$, where a question is a list of words. The following clauses show some ways questions can be asked and answered using the clauses of Figure 13.11, even with the very limited vocabulary used there.

The following clause allows it to answer questions, such as “Is a tall student enrolled in a computer science course?” and returns the student:

 $question([is \mid L_0], Ind) \leftarrow noun\_phrase(L_0, L_1, Ind) \wedge mp(L_1, [\,], Ind).$

The following rule is used to answer questions, such as “Who is enrolled in a computer science course?” or “Who is enrolled in cs312?” (assuming that $course(cs312)$ is true):

 $question([who, is \mid L_0], Ind) \leftarrow mp(L_0, [\,], Ind).$

The following rule is used to answer questions, such as “Who is a tall student?”:

 $question([who, is \mid L], Ind) \leftarrow noun\_phrase(L, [\,], Ind).$

The following rule allows it to answer questions, such as “Who is tall?”:

 $question([who, is \mid L], Ind) \leftarrow adjectives(L, [\,], Ind).$

The following rule can be used to answer questions, such as “Which tall student passed a computer science course?” or even “Which tall student enrolled in a math course passed a computer science course?”:

 $question([which \mid L_0], Ind) \leftarrow noun\_phrase(L_0, L_1, Ind) \wedge mp(L_1, [\,], Ind).$

The following rule allows it to answer questions that have “is” between the noun phrase and the modifying phrase, such as “Which tall student is enrolled in a computer science course?” or “Which student enrolled in a math course is enrolled in a computer science course?”:

 $question([which \mid L_0], Ind) \leftarrow noun\_phrase(L_0, [is \mid L_1], Ind) \wedge mp(L_1, [\,], Ind).$
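Figure 13.11 is not reproduced here, so the following Python sketch uses a hypothetical grammar and dictionary in its spirit to show how parsing and answering can happen at once. Each nonterminal is a generator that takes the remaining words and an individual (`None` when not yet known) and yields every (remaining words, individual) pair it can reach by consulting the knowledge base; generators stand in for Prolog's backtracking. The `question` function follows the six clauses of Example 13.40. The facts, the vocabulary, and all function names are assumptions made for illustration.

```python
# Hypothetical toy knowledge base of ground facts: (predicate, arg1, ...).
KB = {
    ("student", "sam"), ("tall", "sam"),
    ("student", "kim"), ("tall", "kim"),
    ("student", "chris"),
    ("course", "cs312"), ("dept", "cs312", "comp_science"),
    ("course", "math101"), ("dept", "math101", "math"),
    ("enrolled", "sam", "cs312"), ("enrolled", "chris", "cs312"),
    ("enrolled", "kim", "math101"), ("enrolled", "kim", "cs312"),
    ("passed", "kim", "cs312"), ("passed", "sam", "math101"),
}

# Hypothetical dictionary: word sequences mapped to the predicates they denote.
ADJ = {("tall",): ("tall", ()), ("math",): ("dept", ("math",)),
       ("computer", "science"): ("dept", ("comp_science",))}
NOUN = {("student",): "student", ("course",): "course"}
RELN = {("enrolled", "in"): "enrolled", ("passed",): "passed"}

def prop(pred, ind, *extra):
    """Yield each individual x with fact (pred, x, *extra); ind=None means unbound."""
    for fact in KB:
        if fact[0] == pred and fact[2:] == extra and ind in (None, fact[1]):
            yield fact[1]

def rel(pred, subj, obj):
    """Yield (subj, obj) pairs of binary facts for pred; None arguments are unbound."""
    for fact in KB:
        if fact[0] == pred and len(fact) == 3 and subj in (None, fact[1]) and obj in (None, fact[2]):
            yield fact[1], fact[2]

def prefixes(table, words):
    """Yield (entry, remaining words) for each dictionary phrase that starts words."""
    for phrase, entry in table.items():
        if words[:len(phrase)] == phrase:
            yield entry, words[len(phrase):]

def det(words):
    yield words                                  # no determiner
    if words[:1] in (("a",), ("the",)):
        yield words[1:]

def adjectives(words, ind):
    yield words, ind                             # no (more) adjectives
    for (pred, extra), rest in prefixes(ADJ, words):
        for x in prop(pred, ind, *extra):
            yield from adjectives(rest, x)

def noun(words, ind):
    for pred, rest in prefixes(NOUN, words):
        for x in prop(pred, ind):
            yield rest, x
    if words and ind in (None, words[0]):        # a course name such as cs312 denotes itself
        for x in prop("course", words[0]):
            yield words[1:], x

def mp(words, subj):
    yield words, subj                            # empty modifying phrase
    for pred, rest in prefixes(RELN, words):
        for rest2, obj in noun_phrase(rest, None):
            for s, _ in rel(pred, subj, obj):
                yield rest2, s

def noun_phrase(words, ind):
    for w1 in det(words):
        for w2, i2 in adjectives(w1, ind):
            for w3, i3 in noun(w2, i2):
                yield from mp(w3, i3)

def finished(pairs):
    """Keep individuals whose parse used up every word (the [] in the clauses)."""
    return (ind for rest, ind in pairs if rest == () and ind is not None)

def question(words):
    """Yield answers (possibly repeated) to a question given as a list of words."""
    w = tuple(words)
    if w[:1] == ("is",):
        for rest, ind in noun_phrase(w[1:], None):
            yield from finished(mp(rest, ind))
    if w[:2] == ("who", "is"):
        yield from finished(mp(w[2:], None))
        yield from finished(noun_phrase(w[2:], None))
        yield from finished(adjectives(w[2:], None))
    if w[:1] == ("which",):
        for rest, ind in noun_phrase(w[1:], None):
            yield from finished(mp(rest, ind))
            if rest[:1] == ("is",):
                yield from finished(mp(rest[1:], ind))

print(sorted(set(question("who is enrolled in cs312".split()))))   # ['chris', 'kim', 'sam']
```

As with the Prolog clauses, one answer can be found through several parses (for example, the modifying phrase can be attached inside or outside the noun phrase), so the usage line collects answers into a set. Note also that an unknown word and a question with no answers both produce nothing, which is the debugging difficulty discussed next.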

The preceding grammar finds an answer to the natural language question directly. One problem with this way of answering questions is that it is difficult to distinguish the case where the program could not parse the question from the case where there were no answers; in both cases the answer is “no”. This makes such a program difficult to debug. An alternative is, instead of querying the knowledge base directly while parsing, to build a logical form of the natural language, a logical proposition that conveys the meaning of the utterance, and only then ask it of the knowledge base. The logical form can also be used for other tasks, such as telling the system knowledge, paraphrasing natural language, or even translating it into a different language.

You can construct a query by allowing noun phrases to return an individual and a list of constraints imposed by the noun phrase on the individual. Appropriate grammar rules are specified in Figure 13.12, and they are used with the dictionary of Figure 13.13.

In this grammar,

 $noun\_phrase(L_{0},L_{1},O,C_{0},C_{1})$

means that list $L_{1}$ is an ending of list $L_{0}$, and the words in $L_{0}$ before $L_{1}$ form a noun phrase. This noun phrase refers to the individual $O$. $C_{0}$ is an ending of $C_{1}$, and the formulas in $C_{1}$, but not in $C_{0}$, are the constraints on the individual $O$ imposed by the noun phrase.

Procedurally, $L_{0}$ is the list of words to be parsed, and $L_{1}$ is the list of words remaining after the noun phrase. $C_{0}$ is the list of conditions coming into the noun phrase, and $C_{1}$ is $C_{0}$ with the extra conditions imposed by the noun phrase added.
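The following Python sketch mirrors this five-argument interface under stated assumptions: it is not Figures 13.12 and 13.13 themselves, its small dictionary covers only the words of the examples that follow, and constraint lists are Python tuples. New constraints are prepended, so $C_{0}$ is always an ending of $C_{1}$, and generated names X1, X2, … stand in for Prolog's fresh logic variables. All names (`noun_phrase`, `fresh_var`, `lookup`, and so on) are hypothetical.

```python
import itertools

# Hypothetical dictionary: each entry builds a constraint atom for an individual term.
ADJ = {
    ("tall",): lambda ind: ("tall", ind),
    ("computer", "science"): lambda ind: ("dept", ind, "comp_science"),
    ("math",): lambda ind: ("dept", ind, "math"),
}
NOUN = {
    ("student",): lambda ind: ("student", ind),
    ("course",): lambda ind: ("course", ind),
}
RELN = {
    ("enrolled", "in"): lambda subj, obj: ("enrolled", subj, obj),
    ("passed",): lambda subj, obj: ("passed", subj, obj),
}

_fresh_ids = itertools.count(1)

def fresh_var():
    """Return a new variable name, standing in for a fresh logic variable."""
    return f"X{next(_fresh_ids)}"

def lookup(table, words):
    """Yield (entry, remaining words) for every dictionary phrase that starts words."""
    for phrase, entry in table.items():
        if words[:len(phrase)] == phrase:
            yield entry, words[len(phrase):]

def det(words):
    yield words                                   # no determiner
    if words[:1] in (("a",), ("the",)):
        yield words[1:]

def adjectives(words, ind, c0):
    """Yield (rest, c1) where c1 is c0 with the adjectives' constraints prepended."""
    yield words, c0
    for make, rest in lookup(ADJ, words):
        yield from adjectives(rest, ind, (make(ind),) + c0)

def noun(words, ind, c0):
    for make, rest in lookup(NOUN, words):
        yield rest, (make(ind),) + c0

def mp(words, subj, c0):
    yield words, c0                               # empty modifying phrase
    for make, rest in lookup(RELN, words):
        obj = fresh_var()                         # the object is a new, unconstrained individual
        yield from noun_phrase(rest, obj, (make(subj, obj),) + c0)

def noun_phrase(words, ind, c0):
    """Yield (rest, c1): the words after a noun phrase about ind, and the extended constraints."""
    for w1 in det(words):
        for w2, c2 in adjectives(w1, ind, c0):
            for w3, c3 in noun(w2, ind, c2):
                yield from mp(w3, ind, c3)

words = tuple("a computer science course".split())
for rest, c in noun_phrase(words, "Ind", ()):
    if rest == ():                                # keep only parses that use every word
        print(list(c))   # [('course', 'Ind'), ('dept', 'Ind', 'comp_science')]
```

Because no knowledge base is consulted while the constraints are built, a question with no complete parse yields no constraint list at all, which is distinguishable from a constraint list that parses correctly but has no answers in the database.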

###### Example 13.41.

The query

 $\textbf{ask}~noun\_phrase([a, computer, science, course], [\,], Ind, [\,], C).$

will return

 $C=[course(Ind),dept(Ind,comp\_science)].$

The query

 $\textbf{ask}~noun\_phrase([a, tall, student, enrolled, in, a, computer, science, course], [\,], P, [\,], C).$

returns

 $C = [course(X), dept(X, comp\_science), enrolled(P, X), student(P), tall(P)].$

If the elements of list $C$ are queried against a database that uses these relations and constants, the answers for $P$ are precisely the tall students enrolled in a computer science course.