Linguistics 482 - Computational Linguistics Prolog Notes A. C. Brett acbrett@uvic.ca
Department of Linguistics
University of Victoria
Clearihue C139
Last updated: 25 November 2001

Predicates and Programs

A Prolog program, also known as a database or module, consists of a collection of predicates that define an information domain which enables the Listener to perform a particular task, or collection of related tasks, when appropriate queries are presented.

In these notes, we describe Prolog predicates, including recursive predicates. We also discuss Prolog programs, and their processing by the Listener, by considering the example of a Prolog program that implements of a Declarative Clause Grammar. In this context, the differences between a parser and a recogniser are noted, and top-down versus bottom-up, and depth-first versus breadth-first recognition strategies are also discussed.

Predicates

All rules with head structures that have the same name and the same number of arguments, and all facts consisting of structures with the same name and number of arguments, comprise one predicate. A predicate is identified by the shared name and the number of arguments, also known as the arity, of the head and fact structures.

For example, consider the following lines:

   % Clauses of a lex/3 predicate:
   %
     lex(cat,  n, sg).    lex(cats, n, pl).
     lex(dog,  n, sg).    lex(dogs, n, pl).
     lex(luck, n, sg).

     lex(good,   a,  _).  lex(some,    a,  _).

     lex(this, det, sg).  lex(these, det, pl).
     lex(the,  det,  _).


   % Clauses of a lex/5 predicate:
   %
     lex(she, per, 3, sg, nom).
     lex(her, per, 3, sg, acc).     lex(her, per, 3, sg, gen).

     lex(runs,   v, in, 3, sg).     lex(run,   v, in, 3,  pl).
     lex(brings, v, tr, 3, sg).     lex(bring, v, tr, 3,  pl).
Although all of the foregoing facts consist of structures with the same name, the first nine have three arguments, while the last seven have five arguments. Thus, they represent instances of two different predicates, a lex/3 predicate and a lex/5 predicate. Note also that the classification of these facts into two predicates is independent of the terms in their arguments. Of course, since their arguments differ, they constitute different structures, and hence, they are different facts.

Now consider the following lines:

   % Verb phrase rules consisting of the clauses of vp/2,
   % vp/3, and vp/4 predicates:
   %
     vp(X, AgrS) :- v(X, AgrS).
     vp(X, Y, AgrS) :- v(X, AgrS, AgrO), np(Y, AgrO).
     vp(X, Y, Z, AgrS) :- v(X, AgrS, AgrO), np(Y, Z, AgrO).
which show rules all the head structures of which have the same name, but have different numbers of arguments. Hence, these rules represent examples of different predicates. Note that a rule such as that with the head structure vp(X, AgrS) might constitute the only instance of a rule in a program that has a head structure with the name vp and with two arguments. Hence, it is possible to have a predicate consisting of just one clause.

The following lines illustrate a predicate that consists of two clauses, one of which is a fact, with the other being a rule:

   % The show_list/1 predicate, consisting of the following
   % clauses, displays the elements of a list by invoking the
   % write/1 system predicate.  In order that each element
   % appear on a separate line, the nl/0 system predicate (the
   % "new line" predicate) is invoked.

     show_list([]).

     show_list([X | T]) :- write(X), nl, show_list(T).
The clauses in the foregoing lines constitute a recursive predicate. The second clause is a recursive rule. A predicate is said to be recursive if it includes at least one recursive rule. A rule is said to be recursive if its body contains a goal with the same name and number of arguments as its head structure. Thus, in proving this goal, the Listener will re-use, or iterate, the rule.

The argument of the head of the recursive rule in the foregoing example is a list, [X | T], represented in open or indefinite format. During each iteration in the course of a proof that uses this rule, the first element of a list, called the head of the list, will be instantiated on the variable X, and the remaining elements, called the tail of the list, will be instantiated on the variable T.

In the body of the rule, the write/1 system predicate is invoked to display the term that has been instantiated on X. The nl/0 predicate is invoked so that the next term will be displayed on a new line. The goal show_list(T) is then proved. During the first iteration of the recursive rule, T is instantiated with the original list with its first element removed.

For example, if the goal show_list([the, cat, runs]) is typed, X will be instantiated with the, and T will be instantiated with [cat, runs]. The atom the will be displayed, and the goal show_list([cat, runs]) will be proved.

During the proof of the goal show_list([cat, runs]), the recursive rule is used again, and unification of the goal with the head of the rule causes cat to be instantiated on X, and [run] to be instantiated on T. Consequently, cat is displayed, and the goal show_list([runs]) is proved.

In proving the goal show_list([runs]), runs is instantiated on X and [] is instantiated on T, where [] represents the empty or null list. Hence, runs is displayed and the goal show_list([]) is proved.

The goal show_list([]), however, unifies with the structure constituting the fact that is the first clause of the show_list/1 predicate. Since this clause yields a proof of the goal, the second clause, which is the recursive rule, is not tested, and the recursive iteration over the elements of the original list comes to an end.

The clause that ends a recursive iteration is called the ground case clause. Since the ground case fact precedes the recursive rule, it is actually tested first, before the recursive rule, during each iteration. Until the list is empty, however, the test fails, and the recursive rule is used.

Any recursive predicate must include a ground case that precedes, and hence, is tested before the recursive goal is proved; otherwise, the predicate can iterate indefinitely, or at least until time and space limits are exceeded. In this particular example, if the ground case fact show_list([]) is left out, the recursive predicate will simply fail when the term instantiated on the argument of the head of the recursive rule is no longer a list with at least one element.

Context-free grammars for natural languages can include recursive rewriting rules, such as VP --> V VP and NP --> NP PP, the right-hand sides of which include the left-hand side categories. The following Prolog clauses include a recursive rule that is a translation of the recursive rewriting rule NP --> A NP. This rule can license noun phrases that include sequences of indefinitely many adjectives. The ground case Prolog rule is a translation of the rewriting rule NP --> A N. This rule licenses a noun phrase that consists of an adjective followed by a noun.

     % np/3 : NP --> A N  (Ground case rule)
     %
          np([A, N | T], T, AGR) :- a(A, AGR), n(N, AGR).

     % np/3 : NP --> A NP  (Recursive rule)
     %
          np([A | NP], T, AGR) :- a(A, AGR), np(NP, T, AGR).

     % a/2 : A --> word
     %
          a(Word, AGR) :- lex(Word, a, AGR).

     % n/2 : N --> word
     %
          n(Word, AGR) :- lex(Word, n, AGR).

     % lex/3 : words
     %
          lex(dog,  n, sg).    lex(dogs, n, pl).

          lex(tiny,  a,  _).   lex(little, a,  _).
          lex(happy, a,  _).   lex(brown,  a,  _).
          lex(quick, a,  _).   lex(young,  a,  _).

If a file containing the foregoing predicates is consulted, the Listener will respond "yes" to a query such as

     np([tiny, quick, happy, little, young, brown, dog], [], Agr).

Programs

A Prolog program is a collection of predicates that define an information domain. This domain is specified by the facts in the program, together with the rules that establish relationships among these facts. The information domain thereby specified enables the Listener to perform a particular task, or a collection of related tasks, when an appropriate claim or query is presented.

One of these tasks might consist of the determination that a given sequence of words from a particular language is, or is not, a sentence in the language, when a claim is presented to the Listener that the word sequence is a sentence. A Prolog program that causes the Listener to perform a task such as this is called a recogniser. If the program also enables the Listener to produce a phrase marker or structure for a word sequence recognised as a sentence, then the program is called a parser. Note, however, that it is the Listener that performs the actual operations of a recogniser or parser for the language, based upon information about the language specified in a Prolog program.

The information provided by the Prolog program that enables the Listener to function as a recogniser or parser normally includes a collection of facts that identify the lexicon of the language, and perhaps specify values of the features that characterise the words. The information domain will also include a syntax for the language that might be defined in terms of rules in the Prolog program. These rules specify the relationships among the words that determine how they might be combined to form sentences.

Since it is possible to transcribe or translate the rewrite rules of a context-free phrase-structure grammar into Prolog rules, a Prolog program can be identified as a grammar, or at least as implementing a grammar. Such a grammar is called a variety of Declarative Clause Grammar (DCG).

A Prolog program that comprises or implements a DCG for a language causes the Listener to function as a top-down recogniser or parser. A top-down recogniser begins the recognition of a putative sentence by expanding the root or axiom symbol of the grammar with a rule such as S --> NP VP. It then expands the right-hand side symbols with rules such as NP --> DET N and VP --> V NP. The recogniser proceeds in this fashion, changing the rules it expands as necessary, until the right-hand sides of the rules include lexical category symbols that match the category labels of the lexical items comprising the putative sentence.

The fact that a Prolog program which implements a DCG does cause the Listener to perform a top-down recognition can be seen by examining the rules in the following lines:

   % S --> NP VP
     s(L) :- np(L, T, Agr), vp(T, [], Agr).

   % NP --> PRO
     np([X | T], T, Agr) :- pro(X, Agr).

   % NP --> DET N
     np([X, Y | T], T, Agr) :- det(X, Agr), n(Y, Agr).

   % NP --> A N
     np([X, Y | T], T, Agr) :- a(X, Agr), n(Y, Agr).

   % VP --> V NP
     vp([X | T], U, AgrS) :- v(X, AgrS, AgrO), np(T, U, AgrO).

   % VP --> V
     vp([X | T], U, AgrS) :- v(X, AgrS).

   % PRO --> word
     pro(X, agr(PER, NUM, CAS)) :- lex(X, per, PER, NUM, CAS).

   % N --> word
     n(X, agr(3, NUM, _)) :- lex(X, n, NUM).

   % DET --> word
     det(X, agr(3, NUM, _)) :- lex(X, det, NUM).

   % A --> word
     a(X, agr(3, NUM, _)) :- lex(X, a, NUM).

   % V --> word
     v(X, agr(PER, NUM, nom)) :- lex(X, v, in, PER, NUM).

   % V --> word
     v(X, agr(PER, NUM, nom), agr(_, _, acc)) :- lex(X, v, tr, PER, NUM).
wherein the lex/3 and lex/5 goals might be proved by the corresponding facts described at the beginning of this note. Now consider the procedure followed by the Listener as it undertakes to prove a goal such as s([these, cats, bring, good, luck]).

This goal can be construed as a claim that the list of words in the argument of the s/1 predicate comprises a sentence. The Listener starts the process of proving the goal by unifying it with the head of the s/1 predicate rule, which corresponds to starting the recognition with the root symbol of the grammar. The Listener then attempts to prove the goals in the body of the rule, which corresponds to the process of expanding the right-hand side symbols of a rewrite rule by applying other rewrite rules. The procedure ends when goals can be proved by unifying them with facts that specify words which match the words in the putative sentence. In this illustration, provided that the appropriate facts are included in the program, the Listener will confirm that the sequence of words constitutes a sentence.

Prolog programs can be written that cause the Listener to perform a bottom-up recognition or parse. A bottom-up recognition starts with lexical rewrite rules to obtain the categories of the words of a putative sentence. It then applies the right-hand sides of other rewrite rules to contract or collapse sequences of category labels to the one label on the left-hand side of the rules. The recognition proceeds in this manner until the sequences of labels can be reduced to the one label that is the root symbol of the grammar.

There are a number of different strategies that can be implemented as Prolog programs which will cause the Listener to work as a bottom-up recogniser or parser. Some of these strategies entail a transformation of the original grammar, while others incorporate general mechanisms for applying rewrite rules. In the latter strategies, the rewrite rules of a grammar are not represented as Prolog rules. Rather, the rewrite rules are often represented in facts, and it is the mechanism for applying them that is represented in terms of Prolog rules. Thus, according to either strategy, the principal characteristic of a DCG is lost; that is, there is no longer a direct correspondence between Prolog rules and the rewrite rules of the grammar.

A Prolog program that implements a DCG causes the Listener to perform a depth-first recognition or parse. A depth-first recognition can be conducted in either a top-down or a bottom-up direction. A top-down recogniser that works depth-first always expands the left-most right-hand side symbol of each rewrite rule as soon as the the rule is applied. It continues this process until can match words in a putative sentence. Only then will it expands the next symbol on the right-hand side of a rewrite rule.

                         S
                        / \
                       /   \
                     NP     VP
                    /  \
                   /    \
                 DET     N
                  |
                these
For example, the recogniser will expand the symbol NP on the right-hand side of the rewrite rule S --> NP VP by applying a rule such as NP --> DET N. It will then match DET with the category label of the first word in the putative sentence. If it obtains a match, the recogniser will attempt to match the category N with the lexical category of the next word. If the recogniser succeeds in this attempt, it will go back to expand the VP symbol.

In a breadth-first recognition or parse, the processor, after after expanding the symbol NP on the right-hand side of the rewrite rule S --> NP VP, will expand the VP symbol by applying a rewrite rule such as VP --> V NP, before attempting to match the symbols DET and N on the right-hand side of the NP --> DET N rewrite rule.

                          S
                        _/ \_
                       /     \
                     NP       VP
                     /\       /\
                    /  \     /  \
                  DET   N   V   NP

In these illustrations of both the depth- and breadth-first processes, the recogniser or parser performs a left-to-right recognition of a putative sentence. The right-hand sides of each rewrite rule are expanded beginning with the left-most symbol, followed by the other right-hand side symbol, or symbols, in the left to right direction. Consequently, regardless whether the recogniser works according to a depth- or breadth-first strategy, the words of a putative sentence are recognised beginning with the first, or left-most word, with subsequent words being recognised in the order of their appearance in the putative sentence from left to right.

That a Prolog program which implements a DCG causes the Listener to perform a left-to-right recognition or parse becomes evident when one recalls that the goals in the body of a rule are proved in the order of their appearance from left to right. Also, this process is depth-first, rather than breadth-first, because the Listener must obtain a proof for a goal before it can go on to prove the next goal in the body of a rule. In order to prove a goal, the Listener must unify it with a fact, or with the head of another rule. If the goal unifies with the head of a rule, the Listener must then prove the goals in its body, and so on. Thus, the process is depth-first, in addition to being left-to-right, and top-down.

Linguistics 482 Prolog Introductory Notes Top of Page