% Linguistics 484 -- Example: TD_BFP.TXT % % Production Systems: A Top-Down, Breadth-First Parser. % % This example illustrates a Production System that performs a top-down, % breadth-first parse of a sentence. The purpose of this example, aside % from showing a top-down parser that functions breadth-first, rather % than depth-first, is to demonstrate the parsing of a structurally % ambiguous sentence. A sentence is said to be structurally ambiguous % if it has more than one structure or analysis; however, because the % Interpreter tests rules and proves goals in the order of their % appearance in the program, and stops when it succeeds in identifying % a string as a sentence, the Interpreter will normally produce only % one structure for a structurally ambiguous sentence. Alternative % structures or analyses can be obtained by changing the order of the % facts or rules comprising the Prolog program. % %s/1 % % s(Data) : The list of tokens in Data constitutes a sentence if all % the tokens in Data are consumed, and a phrase marker is % returned at Parse. % s( Data ) :- test_rule( s, Data, [], Parse ), nl, write( Parse ), nl, nl. %test_rule/4 % % test_rule( LHS, Data, Data_i, Tree ) : A rule with left-hand side LHS % (and with right-hand side RHS) is % selected, if it can be applied, and as a % consequence, the initial data tokens in % Data are consumed, leaving the remaining % tokens in Data_i, and a (partial) phrase % marker at Tree. % % Although the recognition proceeds top-down, the phrase marker is % constructed bottom-up. Daughter subtrees are returned from the % apply/4 predicate at Tree_i. These partial phrase markers are stored % as Prolog lists. These lists are themselves elements of a list. In % its fourth argument, test_rule/4 returns (to the assertion which tests % it) this list of daughter subtrees with the label of their mother % category, LHS, as its head. % test_rule( LHS, Data, Data_i, [LHS | Tree_i] ) :- rule( LHS, RHS ), apply( RHS, Data, Data_i, Tree_i ) . %apply/4 % % apply( RHS, Data, Data_i, Tree ) : A rule with the right-hand side % symbols RHS is (or, has been) applied, % with the initial data tokens in Data being % consumed, leaving the remaining tokens in % Data_i, and with a (partial) phrase marker in % Tree, if -- % % - all the right-hand side symbols are matched; or, % % - the right-hand side symbol matches the initial data % token; or, % % - a right-hand side symbol matches the left-hand side of a % production which can be applied, and the other right-hand % symbols result in other productions being selected. % % The apply/4 predicate iterates over the right-hand symbols of a % production. This iteration is effected by the third assertion, which % is a recursive rule. The first goal in the body of the third % assertion is a test of the test_rule/4 predicate whereby, in effect, % another subtree is started with one of the right-hand side symbols (of % the rule being applied) as its root. That subtree is returned at % Tree_i in the fourth argument of test_rule/4. The partial subtrees % from the other right-hand side symbols are returned from apply/4 at % Tree_j in its fourth argument. A list comprised of these two lists of % subtrees is then returned by apply/4 to the assertion which tests it. % % The first and second assertions of apply/4 both comprise ground cases % of the recursion. In the first assertion, completion of the iteration % over the right-hand side symbols of a production is tested. It % succeeds if the list of symbols has been exhausted, as indicated by % the null list in its first argument. No data tokens are consumed at % this point; hence, the second and third arguments are equal, with the % list of data tokens passed to apply/4 in its second argument being % returned unchanged in its third argument. Since there are no further % branches to the local tree, the null list is returned in the fourth % argument. % % The second assertion is true if the right-hand side of the production % being applied is a terminal symbol which matches the current head of % the list of data tokens. This token is, in effect, removed, and the % remaining tokens in the list are returned in the third argument. (Note % that this is the only place at which the "consumption" of data tokens % occurs.) The matched token is returned in the fourth argument as a % leaf of the tree. % apply( [], Data, Data, [] ) . apply( [Word], [Word | Data_i], Data_i, [[Word]] ) . apply( [Symbol | Symbols], Data, Data_j, [Tree_i | Tree_j] ) :- apply( Symbols, Data_i, Data_j, Tree_j ), test_rule( Symbol, Data, Data_i, Tree_i ) . %rule/2 % % rule( LHS RHS ) : LHS is the left-hand side symbol of a production % with right-hand side symbol(s) in the list RHS. rule( s, [np, vp] ) . rule( s, [np, s1] ) . rule( s, [s1, np] ) . rule( s1, [vp] ) . rule( s, [vp] ) . % To change the "reading" of the sentence, on might change the % order of the following two rules so that rule( np, [n] ) precedes % rule( np, [a, n] ) , rather than the other way around, so that we have rule( np, [n] ) . rule( np, [a, n] ) . rule( vp, [v] ) . rule( vp, [v, np] ) . rule( vp, [v, s1] ) . rule( a, [buffalo] ) . rule( n, [buffalo] ) . rule( v, [buffalo] ) .