Ling484 Notes |
A. C. Brett acbrett@uvic.ca
Department of Linguistics University of Victoria Clearihue C139 |

Finite-State Automata are important because of their *equivalence*
to the *Type 3* grammars of the
Chomsky Hierarchy, a fact
established by
Chomsky and Miller (1958).
The equivalence of a Finite-State Automaton to a Type 3 grammar
means that, given a Type 3 grammar *G,* a Finite-State Automaton
*A* can be devised that recognises just the sentences generated
by *G.*
In fact, the Type 3 grammars, which are also known as the regular or
right-linear grammars, are often identified as the "finite
grammars" because of their equivalence to finite automata.

Another variety of automaton, namely the
Push-Down Automaton,
is the subject of another note.
The Push-Down Automaton differs from the Finite-State machine only
in that it is equipped with a memory
device that functions as a *push-down store,* which is
often called a *stack.*
The Finite-State Automaton itself does not include a memory device.
Push-Down Automata are equivalent to *context-free* grammars,
also known as the Chomsky Type 2 grammars, which means that,
given a context-free grammar *G,* a Push-Down Automaton *A*
can be devised that recognises just the sentences
generated by *G.*

The Finite-State Automata and their equivalence to finite grammars are the subject of this note. The following topics are discussed:

- Input Tape
- Read Unit
- Control Unit
- Ordered Quintuple Specification
- Transition Function
- Language of the Automaton
- Reductions
- Equivalence to Finite Grammars
- Example
- Transition Network
- Accepting Paths
- Nondeterministic Automata

- a
*Control Unit,* - a
*Read Unit,*and - an
*Input Tape,*

**Input Tape.**The*input tape,*sometimes called the*data tape*of the automaton, consists of a sequence of*cells.*Each of the cells beginning from the left-hand end of the tape, which is considered as the start of the tape, contains one of the words of the string to be processed by the machine. The words themselves are usually identified as comprising*input symbols*or*data symbols,*with individual words treated as if they constituted single symbols. The identification of the medium containing the input string as a tape is an historical artefact which now serves only to convey the notion that the words of the input string are made available to the automaton one at a time in a sequence, from left to right, beginning with the first, or leftmost word.**Read Unit.**The*read unit*of the automaton reads words from the cells of the input tape, beginning with the first word in the leftmost cell of the input tape, and provides the individual words one at a time to the control unit.**Control Unit.**The*control unit*governs the operations of the automaton by performing a sequence of*transitions*between the*internal states*available to it. Beginning with what is called its*initial state,*the control unit executes a transition as each word is provided to it by the read unit, with these transitions being determined by the*transition function*of the automaton. If, immediately after the last word of the input string has been read, the control unit moves into an*accepting state,*then the automaton is said to*accept*or*recognise*the string of words. Otherwise, if the control unit is not in an accepting state after the last word is read, the automaton rejects the string.

**Ordered Quintuple Specification.**
A Finite-State Automaton *A* can be specified by the entries
in the following ordered quintuple:

whereinA= áQ, T, M, qñ_{0}, H

*Q*is a finite nonempty set of labels identifying the*states*of the machine;*T*is a finite vocabulary of*input*or*data symbols;**M*is the*transition function*of the machine which maps from*Q×T*into*Q;**q*Î_{0}*Q*identifies the*starting state*of the machine; and*H*Í*Q*is a set of*accepting states.*

**Transition Function.**
The *transition function* of the automaton, which is also called
its *transition matrix* or *transition table,*
specifies the state into which the machine
will move on the basis of its current state and the word
that is read.
Thus, the transition function *M* can be represented as a set
of ordered triples with the form

áwhereqñ_{i}, w, q_{j}

*q*Î_{i}*Q*is a label identifying the current state of the automaton;*w*Î*T*is the word that is read with the automaton in the*q*state; and_{i}*q*Î_{j}*Q*is the label of the new state into which the automaton shifts on reading the word*w*.

q®_{i}wq_{j}

**Language of the Automaton.**
A string consisting of the *n* input symbols
*w _{1}w_{2} ... w_{n}*
is accepted by the automaton

áwithqñ Î_{i-1}, w_{i}, q_{i}M

**Reductions.**
The processing of a string of symbols by a finite-state automaton
can be treated as something like the derivation of a sentence
by a phrase-structure grammar.
Rather than producing a sentence, however, the derivation process
performed by a finite-state automaton actually consumes or
*reduces* an input string.
This process, called a *reduction,* is begun by
concatenating an input string
*w _{1}w_{2} ... w_{n}*
Î

A reduction of this string is then performed by applying transitions
as if they were rewriting rules.
Thus, the first transition, that shifts the automaton out of its
starting state as it reads the first input symbol, has the form
*q _{0}w_{1}* ®

where Þ can be read as "reduces in one transition to." The reduction corresponding to the reading of theqÞ_{0}w_{1}w_{2}... w_{n}q_{1}w_{2}... w_{n}

and the reduction during the transitionqÞ_{i-1}w_{i}w_{i+1}... w_{n}q_{i}w_{i+1}... w_{n}

whereqÞ_{n-1}w_{n}q_{n}

where Þqs Þ_{0}^{*}q_{n}

L(A)= {s | s ÎT,^{*}qs Þ_{0}^{*}q,_{n}qÎ_{n}H}

**Equivalence to Finite Grammars.**
Finite-state automata are said to be *equivalent*
to finite (Chomsky Type 3, regular) grammars, which means that,
given an arbitrary finite grammar *G',* a finite-state automaton
*A* can be divised that accepts just the sentences generated
by *G'.*
In other words,

whereL(A) = L(G')

The construction of an equivalent finite-state automaton proceeds as follows. In the first place, recall that the rewriting rules of a regular or finite grammar have the general form

A ® aBwhere A and B denote nonterminals, while a and b represent strings of terminals symbols. Recall further that, given an arbitrary finite grammar

B ® b

A ® aBthat is, the right-hands sides of the rules of

B ® l

then a finite-state automatonG= áVS,_{N}, V_{T},Pñ

that is equivalent toA= áQ, T, M, qñ_{0}, H

*Q*=*V*,_{N}*T*=*V*,_{T}*q*= S,_{0}*H*= {BÎ*V*| B ® l Î_{N}*P*},*M*= {Aa ® B | A ® aB Î*P*}.

Consequently, if s is a sentence
generated by *G,* that is s
Î *L(G),* then for every rule
in *P* applied to generate s,
there is a transition in *M* that can be applied to
accept s.
Hence, s Î
*L(A)* so that *L(G)* Í
*L(A).*
Conversely, if s Î
*L(A),* a sequence of rules can be found in *P*
that matches the sequence of transitions in *M* that are
executed to accept s.
Hence, s Î
*L(G)* so that *L(A)* Í
*L(G).*
Consequently, we have that

L(A) = L(G)

**Example.**
Consider the following normal form finite grammar *G* with
nonterminal and terminal vocabularies

and with a production set consisting of the following rules:V= {S, VPs, VPp, NP, H}_{N}

V= {she, he, it, they, them, dogs, run, like, play, plays, likes, runs}_{T}

The state label set of the equivalent automaton is then just the nonterminal vocabulary

P= {S ® she VPs, S ® he VPs, S ® it VPs, S ® dogs VPp, S ® they VPp, VPs ® likes NP, VPs ® plays NP, VPs ® plays H, VPs ® runs H, VPp ® like NP, VPp ® play NP, VPp ® play H, VPp ® run H, NP ® it H, NP ® them H, H ® l }

The set of accepting states contains the single label H from the nonterminal vocabulary of the grammar, and the starting state of the equivalent automaton is labelled with the starting symbol of the grammar so thatQ= {S, VPs, VPp, NP, H}

T= {she, he, it, they, them, dogs, run, like, play, plays, likes, runs}

The transition function of the automaton then consists of the the following elements:H= {H}

q= S_{0}

M= {S she ® VPs, S he ® VPs, S it ® VPs, S dogs ® VPp, S they ® VPp, VPs likes ® NP, VPs plays ® NP, VPs plays ® H, VPs runs ® H, VPp like ® NP, VPp play ® NP, VPp play ® H, VPp run ® H, NP it ® H, NP them ® H }

**Transition Network.**
The transition funtion of a finite-state automaton specifies a
*transition network.*
Such networks can be represented graphically as is illustrated
by the diagram included here which corresponds to the
transition function shown in the foregoing example.

In transition network diagrams, the states of the machine are
represented with circles that are labelled with symbols from the set
*Q* of states.
The accepting states are usually distinguished from the
others by being depicted with double concentric circles.
The state circles in the diagram are called the *nodes*
of the network.
In the case of an automaton that is equivalent to a finite
grammar, as is the case for the transition network illustrated
here, the nodes are labelled with the nonterminal symbols
of the grammar.

Nodes are joined by arrows called *arcs* that correspond
to the transitions between the states represented by the nodes.
The arrowhead on the arc identifies the direction of the
transition to which the arc corresponds.
Arcs are labelled by the input symbols that are read during
the corresponding transition.
In a transition network diagram for an automaton that is
equivalent to a finite grammar, the arcs are labelled with
the words from the terminal vocabulary of the grammar.
As the accompanying diagram shows, arcs may have multiple
labels when two or more different words can be read during
a transition between the same pair of states.
For example, the arc between the starting state S and the
VPs node is labelled with the three words 'she,' 'he,' and
'it.'
Also, the same word can label more than one arc, as is
illustrated by the fact the both the transition between the
S and VPs nodes and between the NP and H nodes are labelled
with the word 'it.'

**Accepting Paths.**
As an automaton processes a string, it can be described as
traversing a *path* in its transition network.
Such paths can consist of a single transition between a
pair of states along an arc labelled by a word that is read.
Paths can also consist of two or more transitions.
A path that begins in the starting state and ends in an
accepting state, with all of the words of the input string
having been read, is called an *accepting path.*
Note, however, that the machine is actually not moving
along the arcs of the transition network diagram; it is
simply shifting between its internal states that are
represented by the nodes in the diagram.

**Nondeterministic Automata.**
A finite-state automaton is said to be *deterministic* if its
transition function is *single-valued,* that is, for each
current state and input symbol pair, there is a unique new state.
An automaton is said to be *nondeterministic* if its
transition function is *multi-valued,* that is, there are at
least two transitions with the same current state and input symbol
pair, but with different new states.
The finite-state machine for which the transition network is
illustrated here is a nondeterministic automaton because, with the
machine in the VPs state and with the input word 'plays,' the
following two transitions to the different states NP and H are
available:

VPs plays ® NPFor any nondeterministic automaton

VPs plays ® H

In practice, where preservation of the corresondence between
*V _{N}* and

Linguistics 484 | Notes Index | Top of Page |