Ling484 Notes |
A. C. Brett acbrett@uvic.ca
Department of Linguistics University of Victoria Clearihue C139 |

- Phrase-Structure Grammars
- Productions
- Sentences
- Languages
- Equivalent Grammars
- Chomsky Hierarchy
- Hierarchy of Languages
- l-Productions
- l-Free Grammars
- Normal Form Grammars
- Chomsky Normal Form
- Self-Embedding Grammars
- Type 3 Grammars
- Type 3 Normal Forms
- Type 3 Grammar Parse Tree

**Phrase-Structure Grammars.**
A *phrase-structure grammar G* is
defined as an ordered quadruple of the form

where the entries are identified as follows:G= áV_{N},V_{T}, S,Pñ

*V*_{N}is a*nonterminal vocabulary*consisting of the lexical and syntactic category labels;*V*_{T}denotes a set of words, called the*terminal vocabulary*of*G;*- S is a special member of
*V*_{N}that, in addition to being the label of the sentence category, identifies the*starting symbol*of*G;*and *P*identifies the collection of rewriting rules, described as the*production set*of the grammar.

**Productions.**
Members of the production set *P* of *G* have the
general form

a ® bwhere a Î (

- At least one member of
*P*has the form S ® b with b Î (*V*È_{T}*V*)_{N}^{*}; - At least one of the symbols in the left-hand side string
a of each of the members of
*P*must be an element of*V*; and_{N} - There must be at least one member of
*P*the right-hand side of which consists only of terminal symbols, that is, b Î*V*.^{*}_{T}

**Sentences.**
Derivations are undertaken by beginning with the starting symbol S
of a grammar *G* and applying productions from *P.*
A derivation terminates when a sentential form is obtained that
consists solely of words from the vocabulary *V _{T}*
of terminal symbols.
A string s Î

S Þ^{*}s

**Languages.**
The *language generated by G,* denoted *L(G),* where
*L(G)* Í *V ^{*}_{T},*
is the set of all strings of terminal symbols derivable from S.
Thus,

L(G)= {s | s ÎV, S Þ^{*}_{T}^{*}s}

**Equivalent Grammars.**
Two phrase-structure grammars *G _{a}* and

L(G=_{a})L(G_{b})

**Chomsky Hierarchy.**
Phrase-structure grammars may be classified according to the
form of their productions.
The conventional classification scheme is known as the
*Chomsky Hierarchy.*
According to this scheme, there four types of phrase-structure
grammar.
The first of these, identified as the *Type 0* grammars,
consists of those grammars which have no restrictions on
their productions other than the conditions noted above.
The other three classes of grammar, identified as the
*Type 1* through *Type 3* grammars, are defined
by increasingly restrictive conditions on their productions,
with the *Type 3* grammars being those that are subject
to the most stringent conditions.
The grammars in the Chomsky hierarchy are also identified by
name, and these names together with the forms of their
productions are shown in the following table:

TypeNameProductions0 Unrestricted a ® b with a Î ( VÈ_{T}V)_{N}^{+}and b Î (VÈ_{T}V)_{N}^{*}1 Context-Sensitive a _{1}Aa_{2}® a_{1}b a_{2}with A ÎVand a_{N}_{1}, a_{2}Î (VÈ_{T}V)_{N}^{*}and b Î (VÈ_{T}V)_{N}^{+}2 Context-Free A ® b with A Î Vand b Î (_{N}VÈ_{T}V)_{N}^{*}3 Regular, Finite A ® bB or A ® b with A, B Î Vand b Î_{N}V^{*}_{T}

**Hierarchy of Languages.**
A language over a vocabulary *V _{T}* is said to be
of

Thus, for example, the context-free languages are included in the family of context-sensitive languages, and the context-free languages include the regular or finite languages. The inclusion is proper so that, for example, there are context-free languages that are not regular or finite.LÌ_{3}LÌ_{2}LÌ_{1}L_{0}

**l-Productions.**
A *l-Production* is a rewriting
rule of the form

A ® lwith its right-hand side being the null string l. Note that the condition on the right-hand sides of context-free rules that they be strings in (

S' ® lThe new symbol S' is added to the nonterminal vocabulary of the grammar and the starting symbol of the resulting grammar is replaced by S'. The production S' ® l is then applied only to derive the sentence consisting of the null string.

S' ® S

**l-Free Grammars.**
A language that does not include the null string as a sentence is
called *l-free* and a grammar that
generates a l-free language is said
to be l-free itself.
Note that the form of the context-sensitive productions presented
above means that they generate only l-free
languages.
A context-sensitive language can include the null string as a
sentence.
In such cases, the rule
S ® l is added
to the production set of the generating grammar; but, S cannot
appear in the right-hand side of any of the other productions.

**Normal Form Grammars.**
A *Normal Form* for a given type or class of grammar is a grammar
with specified additional conditions imposed upon its productions,
but which is equivalent to the grammars in the given type.
For example, the productions of a Type 2, or context-free grammar
have the general form

A ® b with A ÎThe right-hand sides of such rules may consist of strings that include a mixture of terminal and non-terminal symbols. Thus, the ruleVand b Î (_{N}VÈ_{T}V)_{N}^{*}

NP ® the Nwith NP, N Î

DET ® thewith the symbol DET being added to the nonterminal vocabulary

NP ® DET N

A ®with A, B Îa

B ® b

**Chomsky Normal Form.**
A l-free context-free grammar is said
to be in *Chomsky Normal Form* if each of its productions
has one of the following forms:

A ®with A, B, C, D Îa

B ® CD

If the productions of the original grammar are already in the
normal form described above, namely
A ® *a*
B ® b
with A, B Î *V _{N},*

NP ® DET A Nis eliminated and two rules such as the following are included in the production set of the normal form grammar:

NP ® A N

NP ® DET NP

NP ® Nwherein the right-hand side consists of a single nonterminal symbol must also be eliminated by, in these cases, treating collective nouns as comprising NPs and intransitive verbs as VPs.

VP ® V

In the foregoinging examples, the nonterminal symbols in the normal form grammar are already in the nonterminal vocabulary of the original grammar. In general, however, the nonterminal vocabulary of the Chomsky Normal Form grammar will include new symbols, not included in the nonterminal vocabulary of the original grammar.

**Self-Embedding Grammars.**
A grammar is said to be *self-embedding* if there is a
symbol A Î *V _{N}*
and a derivation such that

A ÞFor example, a context-free grammar with the rules^{+}bAg where b,g Î (VÈ_{T}V)_{N}^{+}

S ® NP VPis self-embedding because of the derivation

NP ® NP S

S Þ NP VP Þ NP S VPwhich might be included in the generation of a sentence such as

The clause'a cat the dog saw ran.'

**Type 3 Grammars.**
A context-free grammar that is not self-embedding generates a
Type 3 language.
Thus, it is the self-embedding property of some context-free
grammars which enables them to generate languages that are
not Type 3.
The rules of a Type 3 grammar, that is, a finite or regular
grammar, all have one or the other of the following two forms

A ® bB, orwith A, B Î

A ® b

A ® bAand could yield

A Type 3 grammar is a Type 2 grammar because the left-hand side of each Type 3 production consists of a single nonterminal symbol. Furthermore, for any non-self-embedding Type 2 grammar there is an equivalent Type 3 grammar; that is, provided it is not self-embedding, a context-free grammar can be converted to a Type 3 grammar.

**Type 3 Normal Forms.**
Every Type 3 language, that is, every finite or regular language,
can be generated by a grammar all the rules of which have one or
the other of the following forms:

A ® aB with A,B ÎIn a variant of this normal form Type 3 grammar, rules of the formVand a Î_{N}V_{T}

A ® l with A ÎV_{N}

A ® a where a Îcan be substituted in place of those with the form A ® l.V_{T}

A rule such as

S ® dogs like NPwith S,NP Î

S ® dogs VPpwith S,NP,VPp Î

VPp ® like NP

denotes the original Type 3 grammar, then the normal form Type 3 grammarG= áV_{N},V_{T}, S,Pñ

whereG'= áV'_{N},V_{T}, S,P'ñ

V'_{N}=V_{N}È {VPp}

P'=P- {S ® dogs like NP} È {S ® dogs VPp, VPp ® like NP}

**Type 3 Grammar Parse Tree.**
The accompanying figure shows the structure generated by a
Type 3 normal form grammar for the sentence 'dogs like them.'
Observe that, because all the rules of the grammar are either of
the form A ® l,
or of the form A ® aB, such as in this
example S ® dogs VPp and
VPp ® like NP, every node of the tree,
except for the lowest one labelled H here, has two branches.
The left-hand branch of the two connects the node to a word,
while the right-hand branch connects it to another node.
Thus, the parser tree is a strictly *right-branching*
structure, with the last branch ending in a node connected
to the null string.

Note that, according to the alternative Type 3 normal form, all the rules have one or the other of the two forms A ® a or A ® aB. Thus, application of the alternative normal form to this example, would see the rule H ® l would be eliminated, and the rule NP ® them H replaced by NP ® them. The upper part of the tree generated by this grammar will then be the same as that shown in the figure; but the lowest node of the new tree will be labelled NP, and its single branch will connect to the word 'them.'

Linguistics 484 | Notes Index | Top of Page |