15 Individuals and Relations 15.5 Proofs and Substitutions 15.7 Applications in Natural Language

15.6 Function Symbols and Data Structures

Datalog requires a name, using a constant, for every individual about which the system reasons. Often it is simpler to identify an individual in terms of its components, rather than requiring a separate constant for each individual.

Example 15.23.

In many domains, you want to be able to refer to a time as an individual. You may want to say that some course is held at 11:30 a.m. You do not want a separate constant for each possible time, although this is possible. It is better to define times in terms of, say, the number of hours past midnight and the number of minutes past the hour. Similarly, you may want to reason with facts about particular dates. You cannot give a constant for each date arbitrarily far in the future, as there are infinitely many possible dates. It is easier to define a date in terms of the year, the month, and the day.

Using a constant to name each individual means that the knowledge base can only represent a finite number of individuals, and the number of individuals is fixed when the knowledge base is built. However, you may want to reason about a potentially infinite set of individuals.

Example 15.24.

Suppose you want to build a system that takes questions in English and answers them by consulting an online database. In this case, each sentence is an individual. You do not want to have to give each sentence its own name, because there are too many English sentences to name them all. It may be better to name the words and then to specify a sentence in terms of the sequence of words in the sentence. This approach may be more practical because there are far fewer words to name than sentences, and each word has its own natural name. You may also want to specify the words in terms of the letters in the word or in terms of their constituent parts.

Function symbols allow you to describe individuals indirectly. Rather than using a constant to describe an individual, an individual is described in terms of other individuals. Function symbols enable the language to represent rich data structures; a function symbol gives a structured collection of other entities.

Syntactically a function symbol is a word starting with a lower-case letter. The definition of a term is extended so that a term is either a variable, a constant, or of the form $f(t_{1},\ldots,t_{n})$ , where $f$ is a function symbol and each $t_{i}$ is a term. Apart from extending the definition of terms, the language stays the same.

Terms only appear within predicate symbols. You do not write clauses that imply terms. You may, however, write clauses that include atoms that use function symbols.

The semantics of Datalog must be expanded to reflect the new syntax. The definition of $\phi$ is extended so that $\phi$ also assigns to each n-ary function symbol a function from $D^{n}$ into $D$ . A constant can be seen as a 0-ary function symbol (i.e., one with no arguments). Thus, $\phi$ specifies which individual is denoted by each ground term.

Example 15.25.

Suppose you want to define dates, such as 20 July 1969, which is the date the first time a human was on the moon. You can use the function symbol $c e$ (common era), so that $ce(Y,M,D)$ denotes a date with year $Y$ , month $M$ , and day $D$ . For example, $ce(1969,jul,20)$ may denote 20 July 1969. Similarly, you can define the symbol $b c e$ to denote a date before the common era.

The only way to use the function symbol is to write clauses that define relations using the function symbol. There is no notion of defining the $c e$ function; dates are not in a computer any more than people are.

To use function symbols, you can write clauses that are quantified over the arguments of the function symbol. For example, Figure 15.7 defines the $before(D_{1},D_{2})$ relation that is true if date $D_{1}$ is before date $D_{2}$ in a day.

% $before(D_{1},D_{2})$ is true if date $D_{1}$ is before date $D_{2}$

before(ce(Y1,M1,D1),ce(Y2,M2,D2))\leftarrow\mbox{}

Y1<Y2.

before(ce(Y,M1,D1),ce(Y,M2,D2))\leftarrow\mbox{}

month(M1,N1)\wedge\mbox{}

month(M2,N2)\wedge\mbox{}

N1<N2.

before(ce(Y,M,D1),ce(Y,M,D2))\leftarrow\mbox{}

D1<D2.

% $month(M,N)$ is true if month $M$ is the $N$ th month of the year

\begin{array}[]{lll}{month(jan,1).}&{month(feb,2).}&{month(mar,3).}\\ {month(apr,4).}&{month(may,5).}&{month(jun,6).}\\ {month(jul,7).}&{month(aug,8).}&{month(sep,9).}\\ {month(oct,10).}&{month(nov,11).}&{month(dec,12).}\end{array}

Figure 15.7: Axiomatizing a “before” relation for dates in the common era

This assumes the predicate “ $<$ ” represents the relation “less than” between integers. This could be represented in terms of clauses, but is often predefined, as it is in Prolog. The months are represented by constants that consist of the first three letters of the month.

A knowledge base consisting of clauses with function symbols can compute any computable function. Thus, a knowledge base can be interpreted as a program, called a logic program. Logic programs are Turing complete; they can compute any function computable on a digital computer.

This expansion of the language has a major impact. With just one function symbol and one constant, the language contains infinitely many ground terms and infinitely many ground atoms. The infinite number of terms can be used to describe an infinite number of individuals.

Function symbols are used to build data structures, as in the following example.

Example 15.26.

A tree is a useful data structure. You could use a tree to build a syntactic representation of a sentence for a natural language processing system. You could decide that a labeled tree is either of the form $node(N,LT,RT)$ or of the form $leaf(L)$ . Thus, $n o d e$ is a function from a name, a left tree, and a right tree into a tree. The function symbol $l e a f$ denotes a function from the label of a leaf node into a tree.

The relation $at\_leaf(L,T)$ is true if label $L$ is the label of a leaf in tree $T$ . It can be defined by

at\_leaf(L,leaf(L)).

at\_leaf(L,node(N,LT,RT))\leftarrow\mbox{}

at\_leaf(L,LT).

at\_leaf(L,node(N,LT,RT))\leftarrow\mbox{}

at\_leaf(L,RT).

This is an example of a structural recursive program. The rules cover all of the cases for each of the structures representing trees.

The relation $in\_tree(L,T)$ , which is true if label $L$ is the label of an interior node of tree $T$ , can be defined by

in\_tree(L,node(L,LT,RT)).

in\_tree(L,node(N,LT,RT))\leftarrow\mbox{}

in\_tree(L,LT).

in\_tree(L,node(N,LT,RT))\leftarrow\mbox{}

in\_tree(L,RT).

Example 15.27.

A list is an ordered sequence of elements. You can reason about lists using just function symbols and constants, without the notion of a list being predefined in the language. A list is either the empty list or an element followed by a list. You can invent a constant to denote the empty list. Suppose you use the constant $n i l$ to denote the empty list. You can choose a function symbol, say $cons(Hd,Tl)$ , with the intended interpretation that it denotes a list with first element $H d$ and rest of the list $T l$ . The list containing the elements $a$ , $b$ , $c$ would then be represented as

cons(a,cons(b,cons(c,nil))).

To use lists, one must write predicates that do something with them. For example, the relation $append(X,Y,Z)$ that is true when $X$ , $Y$ , and $Z$ are lists, such that $Z$ contains the elements of $X$ followed by the elements of $Y$ , can be defined recursively by

append(nil,L,L).

append(cons(Hd,X),Y,cons(Hd,Z))\leftarrow\mbox{}

append(X,Y,Z).

There is nothing special about $c o n s$ or $n i l$ ; you could have just as well used $f o o$ and $b a r$ .

First-Order and Second-Order Logic

First-order predicate calculus is a logic that extends propositional calculus to include atoms with function symbols and logical variables. All logical variables must have explicit quantification in terms of “for all” ( $\forall$ ) and “there exists” ( $\exists$ ). The semantics of first-order predicate calculus is like the semantics of logic programs presented in this chapter, but with a richer set of operators.

The language of logic programs forms a pragmatic subset of first-order predicate calculus, which has been developed because it is useful for many tasks. First-order predicate calculus can be seen as a language that adds disjunction and explicit quantification to logic programs.

First-order logic is first order because it allows quantification over individuals in the domain. First-order logic allows neither predicates as variables nor quantification over predicates.

Second-order logic allows for quantification over first-order relations, and allows for predicates whose arguments are first-order relations. These are second-order relations. For example, the second-order logic formula

\forall R\ symmetric(R)\leftrightarrow(\forall X\forall Y\ R(X,Y)\rightarrow R% (Y,X))

defines the second-order relation $s y m m e t r i c$ , which is true if its argument is a symmetric relation.

Second-order logic seems necessary for many applications because transitive closure is not first-order definable. For example, suppose you want $b e f o r e$ to be the transitive closure of $n e x t$ , where $next(X,s(X))$ is true. Think of $n e x t$ meaning the “next millisecond” and $b e f o r e$ denoting “before.” The natural first-order definition would be the definition

\forall X\forall Y\ before(X,Y)\leftrightarrow\left(Y=s(X)\vee before(s(X),Y)% \right).

(15.1)

This expression does not accurately capture the definition, because, for example:

\forall X\forall Y\ before(X,Y)\rightarrow\exists W\ Y=s(W)

does not logically follow from Formula 15.1, because there are nonstandard models of Formula 15.1 with $Y$ denoting infinity. To capture the transitive closure, you require a formula stating that $b e f o r e$ is the minimal predicate that satisfies the definition. This can be stated using second-order logic.

First-order logic is semi-decidable, which means that a sound and complete proof procedure exists in which every true statement can be proved, but it may not halt. Second-order logic is undecidable; no sound and complete proof procedure can be implemented on a Turing machine.

15.6.1 Proof Procedures with Function Symbols

The proof procedures with variables carry over for the case with function symbols. The main difference is that the class of terms is expanded to include function symbols.

The use of function symbols involves infinitely many terms. To be complete, forward chaining on the clauses has to ensure that the selection criterion for selecting clauses is fair.

Example 15.28.

To see why fairness is important, consider the following clauses:

num(0).

num(s(N))\leftarrow\mbox{}num(N).

a\leftarrow\mbox{}b.

b .

An unfair strategy could initially select the first clause to forward chain on and, for every subsequent selection, select the second clause. The second clause can always be used to derive a new consequence. This strategy never selects either of the last two clauses and thus never derives $a$ or $b$ .

This problem of ignoring some clauses forever is known as starvation. A fair selection criterion is one such that any clause available to be selected will eventually be selected. The bottom-up proof procedure can generate an infinite sequence of consequences and if the selection is fair, each consequence will eventually be generated and so the proof procedure is complete.

The top-down proof procedure is the same as for Datalog (see Figure 15.5). Unification becomes more complicated, because it must recursively descend into the structure of terms. There is one change to the unification algorithm: a variable $X$ does not unify with a term $t$ in which $X$ occurs and is not $X$ itself. The algorithm should return $\bot$ if $X/t$ is to be added to $S$ where variable $X$ occurs in term $t$ . Checking for this condition is known as the occurs check. The occurs check is sometimes omitted (e.g., in Prolog), because removing it makes the proof procedure more efficient, even though removing it makes the proof procedure unsound, as shown in the following example.

Example 15.29.

Consider the knowledge base with only one clause:

lt(X,s(X)).

Suppose the intended interpretation is the domain of integers in which $l t$ means “less than” and $s(X)$ denotes the integer after $X$ . The query $\mbox{{ask}~}~lt(Y,Y)$ should fail because it is false in the intended interpretation; there is no number less than itself. However, if $X$ and $s(X)$ could unify, this query would succeed. In this case, the proof procedure would be unsound because something could be derived that is false in a model of the axioms.

The following example shows the details of SLD resolution with function symbols.

Example 15.30.

Consider the clauses

append(c(A,X),Y,c(A,Z))\leftarrow\mbox{}

append(X,Y,Z).

append(nil,Z,Z).

For now, ignore what this may mean. Like the computer, treat this as a problem of symbol manipulation. Consider the following query:

\mbox{{ask}~}~append(F,c(L,nil),c(l,c(i,c(s,c(t,nil))))).

The following is a derivation:

yes(F,L)\leftarrow\mbox{}append(F,c(L,nil),c(l,c(i,c(s,c(t,nil)))))

\hbox{resolve with }append(c(A_{1},X_{1}),Y_{1},c(A_{1},Z_{1}))\leftarrow\mbox% {}append(X_{1},Y_{1},Z_{1})

\hbox{substitution: }{\{F/c(l,X_{1}),Y_{1}/c(L,nil),A_{1}/l,Z_{1}/c(i,c(s,c(t,% nil)))\}}

yes(c(l,X_{1}),L)\leftarrow\mbox{}append(X_{1},c(L,nil),c(i,c(s,c(t,nil))))

\hbox{resolve with }append(c(A_{2},X_{2}),Y_{2},c(A_{2},Z_{2}))\leftarrow\mbox% {}append(X_{2},Y_{2},Z_{2})

\hbox{substitution: }{\{X_{1}/c(i,X_{2}),Y_{2}/c(L,nil),A_{2}/i,Z_{2}/c(s,c(t,% nil))\}}

yes(c(l,c(i,X_{2})),L)\leftarrow\mbox{}append(X_{2},c(L,nil),c(s,c(t,nil)))

\hbox{resolve with }append(c(A_{3},X_{3}),Y_{3},c(A_{3},Z_{3}))\leftarrow\mbox% {}append(X_{3},Y_{3},Z_{3})

\hbox{substitution: }{\{X_{2}/c(s,X_{3}),Y_{3}/c(L,nil),A_{3}/s,Z_{3}/c(t,nil)\}}

yes(c(l,c(i,c(s,X_{3}))),L)\leftarrow\mbox{}append(X_{3},c(L,nil),c(t,nil))

At this stage both clauses are applicable. Choosing the first clause gives

\hbox{resolve with }append(c(A_{4},X_{4}),Y_{4},c(A_{4},Z_{4}))\leftarrow\mbox% {}append(X_{4},Y_{4},Z_{4})

\hbox{substitution: }{\{X_{3}/c(t,X_{4}),Y_{4}/c(L,nil),A_{4}/t,Z_{4}/nil\}}

yes(c(l,c(i,c(s,X_{3}))),L)\leftarrow\mbox{}append(X_{4},c(L,nil),nil).

At this point, there are no clauses whose head unifies with the atom in the generalized answer clause’s body. The proof fails.

Choosing the second clause instead of the first gives

\hbox{resolve with }append(nil,Z_{5},Z_{5})

\hbox{substitution: }{\{Z_{5}/c(t,nil),X_{3}/nil,L/t\}}

yes(c(l,c(i,c(s,nil))),t)\leftarrow\mbox{}.

At this point, the proof succeeds, with answer $F=c(l,c(i,c(s,nil)))$ , $L=t$ .

For the rest of this chapter, the “syntactic sugar” notation of Prolog for representing lists is used. The empty list, $n i l$ , is written as $[\,]$ . The list with first element $E$ and the rest of the list $R$ , which is $cons(E,R)$ in Example 15.27, is written as $[E\mid R]$ . There is one other notational simplification: $[X\mid[Y]]$ is written as $[X,Y]$ , where $Y$ can be a sequence of values. For example, $[a\mid[\,]]$ is written as $[a]$ , and $[b\mid[a\mid[\,]]]$ is written as $[b,a]$ . The term $[a\mid[b\mid C]]$ is written as $[a,b\mid C]$ .

Example 15.31.

Using the list notation, $a p p e n d$ from the previous example can be written as

append([A\mid X],Y,[A\mid Z])\leftarrow\mbox{}

append(X,Y,Z).

append([\,],Z,Z).

The query

\mbox{{ask}~}~append(F,[L],[l,i,s,t])

has an answer $F=[l,i,s]$ , $L=t$ . The proof is exactly as in the previous example. As far as the proof procedure is concerned, nothing has changed; there is just a renamed function symbol and constant.

Artificial Intelligence 3E