% Linguistics 484 -- Example: FSA Prolog % % A Finite-State Automaton (FSA) % % % This example consists of predicates that cause the Prolog interpreter to % emulate the working of a FINITE-STATE AUTOMATON. % % A Finite-State Automaton (FSA) is equivalent to a Type 3 grammar, that is % given a Type 3 (regular, linear, finite) grammar G, an FSA can be devised % that recognises, or accepts, just the sentences generated by the grammar G. % % An FSA is an abstract machine, usually conceived of as idealised, but % much simplified computer, that consists of the following three components: % % a CONTROL UNIT, % a READ UNIT, and % a INPUT TAPE. % % The input tape contains the information comprising the problem upon which the % automaton is to work. The read unit reads the data tape. The control unit % directs the functioning of the automaton. % % The control unit has a finite number of internal states -- and hence, the % description of this abstract machine as a "finite-state" automaton. Each of % these states is identified by a symbol from a vocabulary of the STATE LABELS % for the automaton. At the start of the procedure performed by the automaton, % the machine is in a special state called its STARTING STATE, that is labelled % with a particular symbol from the state label vocabulary. When the problem % is solved (if it can be solved), the last symbol from the data tape has been % read, and the automaton is in one of a collection of special FINAL STATES. % % The processing undertaken by a particular finite-state automaton in its % solving of a problem is determined by its TRANSITION FUNCTION. This function % specifies the state to which the automaton will move, given the state it is % in when a particular data symbol is read. Thus, the transition function % consists of a collection of ordered triples, % % % % wherein si is the label of the current state of the machine, % d is a symbol read from the machine's input tape, and % sj is the label of the state to which the machine will move. % % Each of these triples may also be written in the form % % si d --> sj % % to represent a transition to the state labelled sj when the machine % has read the data symbol d while in the state labelled si. % % When the automaton is functioning as a recogniser or acceptor, the data % symbols comprise the words of a putative sentence. Thus, the vocabulary of % data symbols corresponds to the terminal vocabulary of the language in which % the string of words on the data tape might be a sentence. The nonterminal % vocabulary of a grammar for this language, on the other hand, correspond to % the labels of the states of the automaton. The transition function of the % automaton then represents a grammar for the language, with the triples % comprising the transition function corresponding to the rewriting rules. % Thus, the transition si d --> sj identified above corresponds to % the rewriting rule % % si --> d sj % % % The input tape consists of a sequences of CELLS, each of which contains one % datum of the information describing or comprising a particular problem. % Each of these data is an element from a collection of INPUT SYMBOLS % comprising a vocabulary of such symbols known to the automaton. At the % start of the process of solving a problem, the tape is positioned at the % first (left-most) cell. As the read unit reads the data tape, it begins at % the first cell and scans the tape from left to right, one cell at a time, % in a sequence of discrete steps. % % In this emulation of a FSA, a list data structure is used to function as % the data tape. Each element of the list corresponds to one the cells % of the tape. The reading of the tape by the automaton's read unit % is then emulated by the procedures normally employed to examine the % elements of a list. In Prolog, the process of examining the elements of a % list is usually undertaken using a recursive predicate. Thus, the step by % step process performed by the read unit as it scans each cell of the tape % in turn corresponds to the re-entry into a recursive Prolog rule in the % course of which the (then current) head of a list is extracted. % % The functioning of the control unit of the FSA is emulated by other % predicates. The transition function emulated by a predicate consisting % of three-argument facts. The transitions undergone by the control unit then % can be emulated by the unification procedure performed by the Prolog % interpreter. In this case, the procedure can be described as follows: % % With the control unit in a particular state, and with a given word (data % symbol) having been extracted from the head of the list (read from the input % tape), a match of these two symbols is sought in one of the assertions of % transition function predicate. If the symbols match the first two arguments % of one of these assertions, then the third argument of the assertion will % yield the label of the state to which the automaton will move. This state % label, together with the next word (input symbol) from the list (input tape), % then can be used in another test of the transition function predicate to % obtain the label of the next state to which the automaton will move. % % The testing of the transition matrix predicate can be incorporated into the % recursive predicate employed to extract tokens (input symbols) from the list % (input tape). The process can be stopped in a ground case assertion the % arguments of which include the null list (corresponding to the condition % that the last data symbol has been read), and the label of a final state. % If this assertion succeeds, then the list of tokens (the symbols on the input % tape) comprises a sentence in the language described by the grammar % represented in the transition matrix of the automaton. % % %s/1 s(L) : L is a list of token accepted by the finite-state % automaton. % % The s/1 predicate provides the "input data" to the control unit % of the finite state automaton. It consists of one assertion, % that being a rule, the body of which consists of the goal % control/2. The control/2 predicate emulates the functioning of % the control unit of the automaton. It succeeds, and hence, s/1 % will succeed, if all the symbols (tokens) in the list L are % consumed (leaving the null list), and the automaton is in a % final state. Note that the second argument of control/2 is the % string used to label the starting state of the automaton. % s(Data_list) :- control(s, Data_list). % %control/2 control(S, L) : L is a list (of data symbols) and S is an atom % labelling the current state of the finite state % automaton. % % This predicate emulates the functioning of the control unit of a % finite state automaton. It effects the transitions from state % to state during a recognition (acceptance) process. Recursion is % is employed to effect this iterative process. Thus, control/2 % consists of two assertions -- a recursive rule and a ground case % fact. % % The ground case assertion succeeds, and stops the recursion, if % the (current) list of input symbols is null (empty), and the % control unit of the finite-state automaton is in the final state, % that labelled with the atom "h". The list of data symbols % will normally be empty only if all the symbols have been removed % (consumed) in effecting transitions among the states of the % automaton, beginning in the starting state "s". % % The recursive rule effects the transitions among the states of % the automaton. In the head of this rule, the variable Symbol is % instantiated with the head of the (current) list of input symbols. % The variable Rest is instantiated with the tail of this list. % The variable From is instantiated with the label of the current % state of the automaton. % % The first goal in the body of the recursive rule is the goal % move/3. The assertions comprising this predicate constitute the % transition function for the finite-state automaton. % Because its first argument contains the variable From, % the value passed to move/3, when it is tested, will be the value % with which From has been instantiated in the head of the rule % (the second argument of control/2). The second argument of % move/3 contains the variable Symbol, a value for which has been % instantiated in the head of the rule (the first argument of % control/3). Thus, move/3 is tested with the label of the current % state, and with the current data symbol. If these yield a match % with an assertion of move/3, its third argument, To, will be % instantiated with the label of the state to which the automaton % will move. Of course, if there is no match, move/3 will fail. % In this case, the list of input symbols will not be accepted by % the automaton. % % If move/3 succeeds, control/2 will be tested with the tail of % the current list (the current head symbol having been consumed) % and with the label of the state to which the automaton will move % (as a consequence of the move/3 transition). The actual % transition may be considered to occur when control/3 is % re-entered with this new state label in its second argument, and % of course, the remaining input symbols in the list in its first % argument. % control(h, []). control(From, [Symbol | Rest]) :- move(From, Symbol, To), control(To, Rest). % %move/3 move(X, T, Y) : There exits a transition from state X to state Y, % given the symbol T. % % This predicate comprises the transition function of a finite state % automaton; that is, each assertion corresponds to one of the % transitions (moves) of the automaton. % % In each assertion of move/3, the first and third arguments % contain atoms which label states in the control unit of the % automaton. The second argument contains an atom from the data % vocabulary of the automaton. If the automaton is in the state % labelled by the first argument, and the symbol in the second % argument is read, then the automaton will move (undergo a % transition ) to the state labelled by the third argument. Thus, % the assertion "move(si, d, sj)" corresponds to the transition % " si d --> sj ". % % The states of a finite-state automaton correspond to the % nonterminal vocabulary of a Chomsky Type 3 (finite, linear, % regular) grammar. The data vocabulary of a finite state % automaton corresponds to the terminal vocabulary of a Type 3 % grammar. The transitions (moves) of a finite state automaton % correspond to the rewrite rules of a Type 3 grammar. Thus, the % transition " si d --> sj ", for example, corresponds to the % rewrite rule " si --> d sj ". Consequently, the assertions % of move/3 correspond to the rewrite rules of a Type 3 grammar, % with the first argument of each assertion corresponding to the % left-hand side symbol, the second argument to the terminal symbol % on the right-hand side, and the third argument to the nonterminal % right-hand side symbol of a rewrite rule. % % In this automaton, the string "s" labels the initial state, and % the string "h" labels the final state. The state "s" % corresponds to the initial (root, axiom) symbol of the % corresponding Type 3 grammar. The state "h" represents a another % special symbol in the corresponding Type 3 grammar: it may be % viewed as standing for a null symbol in the nonterminal % vocabulary, as if the null string (a string with no characters) % were a member of the nonterminal vocabulary of the Type 3 grammar. % move(s, she, vps). move(s, he, vps). move(s, it, vps). move(s, they, vpp). move(s, dog, vps). move(s, dogs, vpp). move(s, cat, vps). move(s, cats, vpp). move(vps, likes, np ). move(vpp, like, np ). move(vps, plays, np ). move(vpp, play, np ). move(vps, plays, h ). move(vpp, play, h ). move(vps, runs, h ). move(vpp, run, h ). move(np, her, h ). move(np, him, h ). move(np, it, h ). move(np, them, h ). move(np, dog, h ). move(np, dogs, h ). move(np, cat, h ). move(np, cats, h ).