Chapter 2: Language and Grammar – Introduction to Automata Theory, Formal Languages and Computation

2

Language and Grammar

Introduction

To express ourselves to someone, i.e., to communicate with someone, we need some medium. That medium is language. Hindi, English, Bengali, etc., are all used by people to communicate among themselves. So, these are all languages.

For constructing a language, there are some rules. Without the rules, a language cannot exist. These rules are called the grammar for that language. Without grammar, a language cannot exist.

Formal language is an abstraction of the general characteristics of a programming language.

In computer science, to communicate with the computer hardware, the user needs some languages for programming purposes, such as C, C++, and Java. These are called programming languages. These languages are used for communicate between the computer and the user. So, they can be called as languages. For constructing the languages, there are some rules to be followed. The rules are called the grammar for that programming language.

2.1  Grammar

Grammar is a set of rules to define a valid sentence in any language. A language is not complete without grammar.

Grammar consists of four touples:

 

G = {VN, Σ, P, S},

 

where

(Non-terminal symbols are those symbols which can be replaced multiple times. Terminal symbols are those symbols which cannot be replaced further.)

Let us take a sentence in English:

 

Bikash goes to college.

 

This sentence can be represented in grammatical form like this:

Here,

A language is generated from the rules of a grammar. A language contains only terminal symbols ε ∑. Let a language L be generated from a grammar G. The language is written as L(G) and read as the language generated by the grammar G. The grammar is called G(L) and read as the grammar for the language L.

The production rules of a grammar consist of two parts: left hand side (LHS) and right hand side (RHS). The LHS of a production rule mainly contains the non-terminal symbols to be replaced. It may contain terminal and non-terminal both for some cases but at least one non-terminal. The RHS of a production rule may contain terminal, non-terminal, or any combination of terminal and non-terminal even null. To be a valid grammar, at least one production must contain the start symbol S at its LHS.

In most of the cases, a grammar is represented by only the production rules because the production rules contain a set of non-terminals, a set of terminals, and obviously the production rules and the start symbol. Sometimes in production rules, two or more rules are grouped into one. For those cases, the LHS of the production rules must be same. The RHS rules are grouped separated by ‘/’ under the same LHS. As an example,

are grouped as A → aA/bCa/ab.

Example 2.1

Construct the language generated from the given grammar:

 

S → aS/ε

Solution: The grammar consists of one non-terminal symbol S and one terminal symbol a. (ε is a null alphabet and not treated as terminal.) The start symbol S generates two strings ε and aS. Between these, ε belongs to the language set generated by the grammar as it does not contain any non-terminal symbol.

S in aS is a non-terminal, and so it can be replaced by any of ε or aS as there is a production rule S aS/ε. This replacing generates two strings, namely aaS and a (a ε means a). ‘a’ is only a terminal symbol, and so it also belongs to the language set generated by the grammar. By this process, aa, aaa, aaaa…. also belong to the language set. Let this iteration continue for n steps. Up to (n – 1) step, the string was a(n–1)S. In the nth step, S is replaced by ε producing an.

In the general form, the language generated by the grammar is an, where n ≥ 0. (If n = 0, the null string is produced.)

 

L(G) = an, n ≥ 0

 

The process is described in Fig. 2.1.

Fig. 2.1

Note: an means n number of ‘a’. (ab)n means n number of ‘ab’. anbn means n number of ‘a’ followed by n number of ‘b’. (ab)n and anbn are not the same because we are dealing here with strings.

Example 2.2

Construct the language generated from the given grammar:

 

S → aSb/ε

Solution: The grammar consists of two terminal symbols a and b. Here S → aSb and S → ε are combined into a single production. S is the start symbol. S is also a non-terminal symbol. So, S can be replaced. S can be replaced by two types of strings, aSb and ε. In aSb, there is a non-terminal symbol S. So, aSb is not in the language set produced from the grammar. ε will be in the language set.

In aSb, there is a non-terminal S. So, S can be replaced by either ε or aSb. If S is replaced by ε, it will be ab, which consists of only terminals. So, ab belongs to the language set.

By this process, the strings aabb, aaabbb will be in the language set. After the nth iteration, the language will be anbn, where the value of n will be 0,1,2,….n.

[n = 0 means a0b0 means there is no a and no b in the string, which means a null string (Λ).]

So, the language generated from the grammar is

 

L(G) = anbn n ≥ 0.

 

This is described in Fig. 2.2.

Fig. 2.2

(If the grammar was S → aSb/ab, the language will be L(G) = anbn, n > 0.)

Example 2.3

Construct the language generated by the grammar:

Solution: In the grammar, there are two terminal symbols ‘a’ and ‘b’. From the start symbol S, the control reaches to C and, then at the last step, C is replaced by b to form the language. S is replaced only one time and C can be replaced (n – 1) times producing anC an. At the last step, C is replaced by ‘b’. The language generation process is shown in Fig. 2.3.

Fig. 2.3

In the language generated by the grammar, there will be at least two ‘a’ (the two a’s are coming from the first replacement of S). Null does not belong to the language set.

The language generated by the grammar L(G) = anban, n > 0.

Example 2.4

Construct the language generated by the grammar:

Solution: The start symbol S generates another two non-terminal symbols, A and B, separated by a terminal symbol b. A can be replaced n times whereas B can be replaced m times as they are independent on each other. The language generation process is described in Fig. 2.4.

Fig. 2.4

In the language set generated by the grammar, there will be at least one ‘a’ in both the sides of ‘b’. But it may or may not be possible that in both the sides of ‘b’ A and B are replaced in same number because they are different non-terminals.

So, the language generated by the grammar becomes

 

L(G) = anbam, m, n > 0

Example 2.5

Construct the language generated from the given grammar:

 

S → aS/bS/ε

Solution: In this production rule, there are three productions, S → aS, S → bS, and S → ε. In ε, there is no non-terminal, and so in the language set there is a null string (Λ). S can be replaced by aS, bS, or ε. If S is replaced by aS in aS, it will be aaS. If S is replaced by bS, it will be abS. If S is replaced by aS in bS, it will be baS. If S is replaced by bS in bS, it will be bbS. If S is replaced by ε in aS and bS it will be a and b, respectively, which will belong to the language set. By this process, we will get aa, bb, ab, ba, …, aba, bab....abaaba, …. etc.

In a single statement, we can represent the language set as ‘any combination of a, b including null’.

In mathematics, it is represented as L(G) = {a, b}*

(Any combination of a, b excluding null is represented by {a, b}+)

Example 2.6

Construct the language generated from the given grammar:

 

S → aSa/bSb/C (C is also a terminal symbol)

Solution: From the start symbol, C is generated which is a terminal symbol. So, C belongs to the language set. From S, two another production rules are generated, namely aSa and bSb. In aSa, there is a non-terminal S. So, S can be replaced by aSa or bSb or C. If S is replaced by these production rules, the strings will be aaSaa, abSba, and aCa, respectively. Similarly, for bSb, the strings will be baSab, bbSbb, and bCb. Among these, aCa and bCb will be in the language set generated from the grammar.

By this process, we will get aaCaa, abCba, baCab, bbCbb…abaCaba,…., ababbCbbaba and so on.

If we look into the string, we will see that before C there is any combination of ‘a’ and ‘b’ including ε (For the string C, there is no a and no b), and after C the string is the reverse of the previous string. If we take the string before C as W, the string after C will be WR.

The language generated from the grammar will be

 

L(G) = WCWR, where W ε (a, b)*

 

The process is described in Fig. 2.5.

Fig. 2.5

Example 2.7

Construct the language generated by the given grammar:

Solution: The start symbol S generates two strings; among them, abc is the string of terminals. So, abc is the smallest string that belongs to the language set generated by the grammar. Another production is SaSAc. The production’s RHS contains two non-terminal symbols S and A. S can be replaced (n – 1) times producing a(n–1)S (Ac)(n–1). At the last step, if S is replaced by abc, it will produce anbc(Ac)(n–1).

Here, we can easily apply cA → Ac (n–1) times. This generates anbA(n+1)cn.

In this phase, the production bAbb can easily be applied (n – 1) times producing anbncn. The minimum length string is abc. So, the language generated by the grammar is

 

L(G) = anbncn, n ≥ 1

 

This is described in Fig. 2.6.

Fig. 2.6

Example 2.8

Find the language generated by the given grammar:

Solution: The grammar consists of three terminal symbols, namely ‘a’, ‘b’, and ‘c’. The start symbol S is replaced by aS or A. If it is replaced by aS, then S can again be replaced by aS or A. If S is once replaced by A, then the control goes to the production of A. So, S → A is the window to switch control from S to A. After the nth iteration of S → aS, the string becomes anS.

In this stage, S is replaced by A with the two productions A → bB/b generating anbB or anb. The last one consists of only the terminal symbol, and so it belongs to the language set generated by the grammar. Here, it can be written as L(G) = anb, n ≥ 0. (If S is replaced by A in the first step and A is replaced by ‘b’, the least length string becomes b.)

There is one branch in the left which consists of a non-terminal B. B can be replaced by either cC or ε, which generates anbcC or anb. The last one is already taken. ‘C’ in the first one that can be replaced by ε producing anbc. It is a string which consists of only terminal symbols. So, it belongs to the language set generated by the grammar. It can be written as L(G) = anb, n ≥ 0. As G generates two strings, the language is written as

 

L(G) = {anb, n ≥ 0} ∪ {anbc, n ≥ 0}

Example 2.9

Construct the grammar for the language an, n > 0.

Solution: The language consists of any number of ‘a’. If n is replaced by 1, 2, 3, and so on, the strings are a, aa, aaa………a(n–1), an.

From the previous list, it is clear that in the language set there is at least one ‘a’.

At the time of constructing the rules, it must be kept in mind that the rules must generate all the strings generated by the language set. In this case, it is [a, aa, aaa………a(n–1), an]. The least length string is a. So, there must be a production S → a. Now, there is a chance to generate other strings such as aa, aaa, etc. If it is thought that a non-terminal is attached with S, which is replaced again and again and at last replaced by ‘a’ to generate the string, then the construction process will be easier. Take the non-terminal as S, and so S can be replaced multiple times. Hence, the grammar for the language is

 

S → aS/a

 

In the form of {VN, ∑, P, S}, the grammar is

 

VN : {S}, ∑ : {a}, P : {S → aS/a}, S : {S}

 

But, writing the production rules are sufficient in describing a grammar.

(Note: If the language is an, n ≥ 0, then the production rules are S → aS/ε.)

Example 2.10

Construct the grammar for the language (ab)n, n > 0.

Solution: The language consists of any number of ab (here we can think of ab as a single character).

The grammar for the language is

 

S → abS/ab

 

Example 2.11

Construct the grammar for the language anbn, n > 0.

Solution: From the string, we can say that

  1. The language consists of a and b.
  2. In the language, a will appear before b.
  3. The number of ‘a’ and of ‘b’ are the same in any string that belongs to the language set.
  4. In the language, there is no null string.

The value of n is greater than 0. The lowest value of n is 1 here. If n = 1, the lowest length string that is produced from the language is ab, which is produced from the start symbol S. So, in the grammar, there will be a production S → ab.

In the language set, there are strings such as aabb, aaabbb….., anbn.

Every time, one ‘a’ and one ‘b’ is added in the string for n = 1, 2, 3….. It can be thought like this—in the middle there is a non-terminal, replacing which it is adding one a at the left and one b at the right. And, at last replacing that non-terminal by ab, it is producing aabb, aaabbb …….. .

So, the another production is

 

S → aSb

 

So, the grammar becomes

 

S → aSb/ab,

 

where

 

VN : {S}, Σ : {a, b}, P : S → aSb/ab, S : {S}

Example 2.12

Construct a grammar for the language anbn+1, n > 0.

Solution: The number of ‘b’ is one more than the number of ‘a’. As n > 0, in the language set, there is at least one ‘a’ and at least two ‘b’(as the number of ‘b’ is one more than the number of ‘a’). The least string is abb. For anbn, the production rule was S → aSb/ab. The modified production S → aSb/abb generates the language.

The grammar is SaSb/abb.

This can be constructed by importing one extra non-terminal as described in the following.

For anbn, the production rule was S → aSb/ab. If one extra ‘b’ is added in the first step, then it will be prepared for generating anbn+1. But, here, the problem is in ‘S’ at the RHS. Each replacement of S at the RHS adds one extra ‘b’ to the language set; but the extra ‘b’ must be limited to the first replacement only. If, in the first production’s RHS, one extra non-terminal, say ‘A’, is fetched and the production for A is made A → aAb/ε, then there is no problem.

The grammar for the language is

The grammar can be written in another way

Example 2.13

Construct the grammar for the language anbn+1, n ≥ 0.

Solution: The number of ‘b’ is one more than the number of ‘a’. As n ≥ 0, the number of ‘a’ may be zero. If the number of ‘a’ is zero, then the number of ‘b’ is one. That means, in the language set, there exists at least one ‘b’.

The grammar for the language is

The grammar can be written in another way as

 

S → aSb/b

Example 2.14

Construct the grammar for the language an ci bn n, i ≥ 0.

Solution: From the string, we can say that

  1. The language consists of a, b, and c.
  2. In the language set, a comes before c, and c comes before b.
  3. The number of a and the number of b are the same; the number of c may be different.
  4. A null string may exist in the language set.

In the language set, the number of a and the number of b are same. That means, each time a and b come simultaneously. For anbn, we have seen that there is a production rule S → aSb. In the string, c appears in between a and b by using the production rule S→Sc/ε. If c directly comes from S, then c may come after b, if someone does like this

 

S→ Sc → aSbc → abc

 

But abc does not belong to the language set.

From S, c will not come directly. We have to introduce another non-terminal A which generates ci between a and b. If n = 0, the language set consists of only c. The production rule S → A opens the provision of generating c only.

The production rules are

Example 2.15

Construct the grammar for the language an ci bn n > 0, i ≥ 0.

Solution: In the language set, c may be null. But there will be at least one a and one b. The production S → aAb opens the possibility of replacing A by a string of c and fulfills the condition of a and b becoming not null.

The grammar is

Example 2.16

Construct the grammar for the language an ci bn n ≥ 0, i > 0.

Solution: In the language set, a and b may be null but there will be at least one c. The production A → Ac/c generates strings of c fulfilling the condition that the number of c is not null.

The grammar is

Example 2.17

Construct the grammar for the language an ci bn n > 0, i > 0

Solution: A string that belongs to the language generated by the grammar contains at least one a, b, and c.

The grammar for the language is

Example 2.18

Construct a grammar generating the language an bn ci, where n ≥ 1, i ≥ 0.

Solution: The language consists of two parts: anbn, where n ≥ 1, and ci, where i ≥ 0. From the start symbol S, we need to take two non-terminals A and B, where A generates anbn with n ≥ 1 and B generates ci with i ≥ 0.

The grammar for the language is

Example 2.19

Construct a grammar generating the language ai bn cn, where n ≥ 1, i ≥ 0.

Solution: The language consists of two parts: bncn, where n ≥ 1, and ai, where i ≥ 0. So, from the start symbol S, we need to take two non-terminals A and B, where A generates ai with i ≥ 0 and B generates bncn with n ≥ 1.

The grammar for the language is

Example 2.20

Construct the grammar for the language al bm cn, where one of l, m, or n = 1 and the remaining two are equal.

Solution: In the language set, at least one of ‘a’, ‘b’, or ‘c’ will exist in single. ‘a’ and ‘c’ are terminal elements. If in the language set there is single ‘a’, the production rule is S→ aS1 and S1 → bS1c/ε. If in the language set there is single ‘c’, the production rule is S→ S2c and S2 → aS2b/ε. ‘b’ is the middle element; for a single ‘b’ in the middle, the production rule is S → aS3c and S3 → aS3c/b.

Combining all these production rules, the grammar for the language is

Example 2.21

Construct the grammar for the language al bm cn, where l + m = n.

Solution: The total number of ‘a’ and ‘b’ are equal to the total number of ‘c’. If the number of ‘a’ or the number of ‘b’ is zero and the non-zero element is 1, then the number of ‘c’ is 1. If the number of ‘a’ and the number of ‘b’ are zero, then the number of ‘c’ is also zero.

So, there are productions S → ac/bc/ε.

If, in the language, there is no ‘a’, then the number of ‘b’ is equal to the number of ‘c’. The same thing is applicable for a language without ‘b’. The production rules are S → aSc/bSc. But in this case, there is a chance for ‘a’ to occur after ‘b’. The modified production is S → aSc/bS1c.

The total number of ‘a’ and ‘b’ are equal to the total number of ‘c’. This can be solved, if in each occurrence of ‘a’ or ‘b’, one ‘c’ is added to the language. The production rules for this case are S1 → bS1c/bc/ε.

Combining all the production rules, the grammar for the language is

Example 2.22

Construct the grammar for palindrome of binary numbers.

Solution: A string which when read from left to right and right to left gives the same result is called palindrome.

A palindrome can be of two types: odd palindrome and even palindrome.

Odd palindromes are those types of palindromes where the number of characters is odd.

Even palindromes are those types of palindrome where the number of characters is even.

A null string is also a palindrome. The palindrome will consist of ‘0’ and ‘1’.

A string will be a palindrome if its first and last characters are same.

The character in the ‘n’th position from the beginning will be the same as the character from the ‘n’th position from the last.

The palindrome can start with 0 or can start with 1. 0 can come after 0, or 0 can come after 1.

1 can come after 1, or 1 can come after 0.

From the previous discussion, the grammar for an even palindrome is

 

S → 0S0/1S1/ε (null string is also a palindrome).

 

The grammar for an odd palindrome is

 

S → 0S0/1S1/0/1

 

By combining the two grammars for even and odd palindromes, the final grammar will be

 

S → 0S0/1S1/0/1/ε

Example 2.23

Construct the grammar for the language L = {Set of all string over 0,1 containing twice as many ‘0’ than ‘1’}.

Solution: The language consists of ‘0’ and ‘1’. In the language set, the number of ‘0’ is twice that of the number of ‘1’. ‘0’ and ‘1’ can occur in any pattern it must be confirmed that number of ‘0’ = 2 × number of 1.

The least strings generated by the grammar are 001, 100, and 010. (considering null string is not in the language set).

The grammar for the language is

Example 2.24

Construct a grammar consisting of equal number of ‘0’ and ‘1’.

Solution: |0| = |1| in any string generated by the grammar. If the string is null, |0| = |1| = 0, which also fulfills the condition.

The grammar for the language is

 

S → S0S1S/S1S0S/ε

Example 2.25

Find the grammar for the language anbm where n ≥ 2, m ≥ 3.

Solution: Each string in the language set generated by the grammar has at least two a’s and three b’s. It is not always true that the number of ‘b’ is one more than the number of ‘a’, as m and n are different numbers. In the language set, all the ‘a’ will come before ‘b’, and so we need to introduce two different nonterminals A and B to generate a string of ‘a’ and a string of ‘b’, respectively, with the conditions fulfilled.

As n ≥ 2 and m ≥ 3, two ‘a’ and three ‘b’ are added in the production rule with the start symbol.

The grammar for the language is

Example 2.26

Find the grammar for the language 0m1n where m ≠ n, m, n ≥ 1.

Solution: In the language set, there is at least one ‘0’ and one ‘1’. The number of ‘0’ is not equal to the number of ‘1’, i.e., the number of ‘0’ may be more than the number of ‘1’ or vice versa.

By introducing a production S → 0S1, the condition m, n ≥ 1 is satisfied. For introducing more ‘0’ than ‘1’, a new non-terminal A is introduced. B is introduced for the reverse case.

The grammar for the language is

Alternatively, the grammar can be in the following form:

Example 2.27

Construct the grammar for the language 0m1n, where m ≥ 1 and n ≥ 1 and m < n.

Solution: In any string that belongs to the language set, the number of ‘0’ and the number of ‘1’ are greater than or equal to ‘1’, and the number of ‘0’ is always less than the number of ‘1’. The least value of m is 1 here. So, the number of 1 must be at least 2.

By introducing a production S → 0S1, the condition m, n ≥ 1 is satisfied. Introducing more ‘1’, S is replaced by another non-terminal A.

The grammar for the language is

Example 2.28

Construct the grammar for generating all positive integer numbers.

Solution: Positive integer numbers are from 0 to infinity and of any length. 0 is an integer but the starting character ‘0’ of an integer has no value. Single-digit integers are generated from the production S→ 0/1/2/3/4/5/6/7/8/9.

If the production is made as S → 1S/2S/3S/4S/5S/6S/7S/8S/9S, then it can generate any integer number.

Combining these, the production becomes 

Example 2.29

Construct a grammar which generates all even integers up to 998.

Solution: In even numbers, the right most number is one of ‘0’, ‘2’, ‘4’, ‘6’, ‘8’. The middle and left-most number is one of ‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’. We have to generate even numbers up to 998, i.e., 3 digits. So, we have to consider all even numbers of one and two digits also.

The production S → 0/2/4/6/8 generates one-digit even numbers.

For generating two-digit numbers, another non-terminal A is introduced making the productions S → AS and A → 0/1/2/3/4/5/6/7/8/9. But here the problem is ‘S’ in RHS, which can be replaced by AS again making the string of an infinite length. To check this, the production is modified to S → AS1, and S1 → 0/2/4/6/8.

The production rules are

Example 2.30

Construct the grammar for generating real numbers greater than 0.

Solution: Real numbers greater than 0 consist of two parts: (i) integer part (ii) real part. Let us take two non-terminals A and B to generate a real number where A generates the integer part and B generates the real part. The construction of the integer number is already described.

After the decimal point, 0 has value if there is number after 0.

The production B → 0B/1B/2B/3B/4B/5B/6B/7B/8B/9B generates a real part of any length. At last, step B is replaced by any of 1, 2, 3, 4, 5, 6, 7, 8, or 9.

From the previous discussion, the grammar is

Example 2.31

Construct the grammar for WWR, where W ∈ (a, b)*.

Solution: The string W consists of a and b. W can be a null string also. WR is the reverse string of W. The number of characters in the string WWR will always be even, except the null string.

So, the string is nothing but an even palindrome of a and b including the null string.

The grammar is

 

S → aSa/bSb/ε

 

Example 2.32

Construct the grammar for the language L = anbncmdm m, n > 0.

Solution: The total string can be divided into two parts: (i) string of a and b, where the number of a and the number of b are the same and (ii) string of c and d, where the number of c and d is the same.

From the start symbol, there will be two non-terminals A and B, where A will generate anbn and B will generate cmdm. In the language set, there will be no null string.

So the grammar is

Example 2.33

Construct the grammar for the language L = ancmdmbn m, n > 0.

Solution: In between a and b, there is an equal number of c and d. The number of a and the number of b are the same.

So, we have to construct anbn with a non-terminal in between an and bn. From that non-terminal, cmdm will be produced.

The grammar constructed according to the previous discussion is

2.2  The Chomsky Hierarchy

The Chomsky hierarchy is an important contribution in the field of formal language and automata theory. In the mid-1950, Noam Chomsky, at the Harvard University, started working on formal languages and grammars. He structured a hierarchy of formal languages based on the properties of the grammars required to generate the languages. The Chomsky hierarchy drives automata theory one step forward.

Chomsky classified the grammar into four types depending on the production rules.

These are as follows:

Type 0: Type 0 grammar is a phase structure grammar without any restriction. All grammars are type 0 grammar.

For type 0 grammar, the production rules are in the format of

Type 1: Type 1 grammar is called context-sensitive grammar.

For type 1 grammar, all production rules are context-sensitive if all rules in P are of the form

 

αAβ → αγβ,

 

where A ε NT (i.e., A is a single non-terminal), α, β ε (NT U Σ)* (i.e., α and β are strings of non-terminals and terminals), and γ ε (NT U Σ)+ (i.e., γ is a non-empty string of non-terminals and terminals).

Type 2: Type 2 grammar is called context-free grammar. In the LHS of the production, there will no left or right context.

For type 2 grammar, all the production rules are in the format of

 

(NT) → α, where |NT| = 1 and α ε (NT U T)*
NT : Non-terminal T : Terminal

Type 3: Type 3 grammar is called regular grammar. Here, all the productions will be in the following forms:

 

A → α or A → αB, where A, B ε NT and α ε T

 

The Chomsky classification is called the Chomsky hierarchy. This is represented diagrammatically in Fig. 2.7

Fig. 2.7 The Chomsky Hierarchy

From this diagrammatical representation, we can say that all regular grammar is context-free grammar, all context-free grammar is context-sensitive grammar, and all context-sensitive grammar is unrestricted grammar.

The following table shows the different machine formats for different languages.

Grammar
Language
Machine Format
Type 0

Unrestricted language

Turing machine

Type 1

Context-sensitive language

Linear bounded automata

Type 2

Context-free language

Push down automata

Type 3

Regular expression

Finite automata

(Null alphabet is represented by ε. Null string is represented by Λ. Null set is represented by Ф.)

The following are some examples from context-sensitive grammar.

Example 2.34

Construct a grammar for the language anbncn, where n ≥ 1.

Solution: We construct it by the following process:

  1. First, we construct anαn.
  2. Then, from αn we construct bncn.

From α, generating bncn is difficult, From αn, (bc)n can be constructed. But (bc)n is not equal to bncn. Introduce a new non-terminal B.

Example 2.35

Construct a grammar for the language xx, where x ∈ (a, b)*

Solution:

Example 2.36

Construct a grammar for the language {anbncndn, n ≥ 1}.

Solution:

In the Chomsky classification, the different types of grammar and their production rules are described. From the production rules, it is possible to find the highest type of the grammar. The following are some examples of finding the highest type of grammar from the production rules.

Example 2.37

Find the highest type of the following grammar.

Solution: The grammar is context-free as the LHS of each production contains only one non-terminal.

The RHS of each production contains a single terminal or a single terminal followed by non-terminal. This matches with regular grammar. So, the grammar is type 3 grammar.

Example 2.38

Find the highest type of the following grammar.

Solution: The grammar is context-free as the LHS of each production contains only one non-terminal.

Now, it is needed to check whether it is regular or not. The RHS of all productions does not contain a single terminal or a single terminal followed by non-terminal. So, the grammar is not regular. Hence, it is type 2 grammar.

Example 2.39

Find the highest type of the following grammar.

Solution: The LHS of the second production rule contains a terminal and a non-terminal (left context). So, the grammar is context-sensitive, and hence it is type 1.

What We Have Learned So Far

  1. The set of rules for constructing a language is called the grammar for that language.
  2. For every programming language such as C, C++, or Java, there is a grammar.
  3. Grammar consists of four touples. G = {VN, Σ, P, S}, where VN : set of non-terminals, Σ : set of terminals, P : set of production rules, S : start symbol.
  4. Chomsky, a linguist, classified the grammar into four types depending on the productions. The types are type 0, i.e., unrestricted grammar, type 1, i.e., context-sensitive grammar, type 2, i.e., context-free grammar, type 3, i.e., regular grammar.
  5. For each of the grammar, there is a specific language namely unrestricted language, context-sensitive language, context-free language, and regular expression.
  6. Unrestricted language is accepted by the Turing machine, context-sensitive language is accepted by linear bounded automata, context-free language is accepted by push down automata, and regular language is accepted by finite automata.
  7. For type 0 grammar, the production rules are in the format of {(Lc)(NT)(Rc)} → {String of terminals or non-terminals or both}, where Lc : Left context, Rc : Right context, and NT: Non-terminal
  8. For type 1 grammar, all production rules are context-sensitive if all rules in P are of the form αAβ → αγβ, where A ∈ NT (i.e., A is a single non-terminal), α, β ∈ (NT U Σ)+ (i.e., α and β are strings of non-terminals and terminals), and γ ∈ (NT U Σ)+ (i.e., γ is a non-empty string of nonterminals and terminals).
  9. Type 2 grammar is called context-free grammar. In the LHS of the production, there will be no left or right context. For type 2 grammar, all the production rules are in the format of (NT) → α, where |NT| = 1 and α ∈ (NT U T)*, NT : Non terminal, and T: Terminal.
  10. Type 3 grammar is called regular grammar. Here, all the productions will be in the following forms: A → α or A → αB, where A, B ∈ NT and α ∈ T.

Solved Problems

  1. Find the languages generated by the following grammars.
    1. S → aSa/aba    
    2. S → aSb/aAb, A→ bAa/ba
    3. S → 0S1/0A1, A → 1A/1    
    4. S → 0A/0/1B, A → 1A/1, B → 0B/1S

    Solution: 

    1. S is replaced by aSa or aba. aba is a string of terminals. S in aSa can again be replaced by aSa or aba. By this process, the language is 
    2. S can be replaced by aSa or aAb. S→ aAb shifts the control from S to A. Replacing S (n–1) times produces an–1Sbb–1. Replacing S by aAb produces anAbn.

      A is replaced by bAa or ba. As S and A are separate non-terminals, there is no guarantee of replacing S and A at the same time. Let A be replaced (m–1) times, and then finally replaced by ba. This generates the string anbmambn

       The language generated by the grammar is L(G) = anbmambn, where m, n > 0.

    3. S → 0S1 → 00S11 …→ 0m–1A1m–1 → 0mA1m → 0m1A1m…→ 0m1n–1A1m → 0m1n1m → 0m1m+n

      The language generated by the grammar is L(G) = 0m1m+n where m, n > 0.

    4. First, let us try to find the expressions generated by B → 0B/1S and A → 1A/1. Then, replace the expressions generated by A and B into S → 0A/1B.
  2. Generate a CFG for the language L = Alternating sequence of ‘a’ and ‘b’.

    Solution: The language may start with ‘a’ or start with ‘b’. If the string starts with ‘a’, then the language is (ab)n. If the string starts with ‘b’, then the language is (ba)n. If the string starts with ‘a’, then the production rule is S → aA. If the string starts with ‘b’, then the production rule is S → bB. For alternating sequences of ‘a’ and ‘b’, the production rules are A → bB, B → aA, A →b, B →a.

    The final grammar is

  3. Find the grammar for the language L = {anbm, where n + m is even }

    [UPTU 2003]

    Solution: This may happen in three cases if

    1. both ‘a’ and ‘b’ are odd
    2. both ‘a’ and ‘b’ are even
    3. any one of ‘a’ and ‘b’ are even and the other is zero.

    If the first two are constructed, then the third will be fulfilled. (0 can be considered as even.)

    The grammar is 

     It can be constructed another way by taking two productions from S; among them, one generates both ‘a’ and ‘b’ odd and another generates both ‘a’ and ‘b’ even.

  4. Find the grammar for the following language

    [UPTU 2004]

    L = anbnck, where k ≥ 3

    Solution: In the language, there is at least three ‘c’. The number of ‘a’ and number of ‘b’ are the same and may be null.

    The grammar is

  5. Construct a grammar which generates all odd integers up to 999.

    Solution: The odd numbers from 0 to 9 are 1, 3, 5, 7, 9. The grammar for generating 1, 3, 5, 7, 9 are S → 1/3/5/7/9.

    For an odd number of length 2, the grammar is S → AS1, where A → 0/1……..7/8/9, S1→ 1/3/5/7/9.

    For an odd number of length 3, the grammar is S → BAS1, where A → 0/1……..7/8/9, B → 0/1……..7/8/9 S1→ 1/3/5/7/9.

    Combining all these rules, the grammar becomes

  6. Construct a grammar which generates {(01)n ∪ (10)n}, where n > 0

    Solution: It consists of two languages (01)n and (10)n. Both are connected by union. Take A and B as two non-terminals, which generate (01)n and (10)n, respectively. From the start symbol S it goes to A and B. The grammar is

  7. Find the grammar for the language L = {anbm, where n ≠ m}.

    Solution: Here, two cases are possible (i) The number of ‘a’ is more than the number of ‘b’ (ii) The number of ‘b’ is more than the number of ‘a’.

    For case (i) the grammar is

    For case (ii) the grammar is

    The complete grammar is

  8. Find a grammar generating L = {anbncf | n ≥ 1, f ≥ 0 }.

    [WBUT 2010(IT)]

    Solution: The language consists of two parts: anbn, where n ≥ 1, and cf, where f ≥ 0. So, from the start symbol S, we need to take two non-terminals A and B, where A generates anbn with n ≥ 1 and B generates cf with f ≥ 0.

    The grammar for the language is

  9. Construct a grammar for the language

     

    L = (0 + 1)*111(0 + 1)*

     

    Solution: It can be thought of as a language set having any combination of 0 or 1 in both the sides of 111. We know that the grammar for (0 + 1)* is A → 0A/1A/0/1.

    From the previous discussion, the grammar is

  10. Find the grammar for L = {anbmcm | n ≥ 0, m ≥ 1 }.

    [WBUT 2003]

    Solution: There are two parts of the language, an and bmcm. Take two non-terminals A and B to generate these two parts.

  11. Find the grammar for the language L = {anbm | n + m is even}.

    [UPTU 2004]

    Solution: There are two conditions for n + m to become even.

    1. both n and m are even
    2. both n and n are odd

    For case (a), the language is (aa)*(bb)*. For case (b), the language is a(aa)*(bb)*b.

    For case (a), the grammar is

     

    S1 → aaS1bb/ε

     

    For case (b), the grammar is

    Thus, the final grammar is

  12. Construct a grammar for the language (i) ai b2 i for i > 0 and (ii) anban for n > 0. [Anna University 06]

    Solution:

    1. For a single ‘a’, two ‘b’s are added. The grammar is
    2. In the language, there is only one ‘b’. The grammar is

Multiple Choice Questions

  1. Which is correct
    1. a+ = a*. a*    
    2. a* = a+. a+
    3. a+ = a*. a    
    4. a* = a+. a*
  2. (a, b)* means
    1. Any combination of a, b including null
    2. Any combination of a, b excluding null
    3. Any combination of a, b, but ‘a’ will come first
    4. None of these
  3. (a, b)+ means
    1. Any combination of a, b including null
    2. Any combination of a, b excluding null
    3. Any combination of a, b, but ‘a’ will come first
    4. None of these
  4. What is the language generated by the grammar S → aSb, S → A, A → aA
    1. ambm
    2. anbm
    3. ambm
  5. Which type of grammar is the following in particular S → aSb S →ab
    1. Unrestricted
    2. Context-sensitive grammar
    3. Context-free grammar
    4. Regular grammar
  6. Which type of grammar is the following in particular S → aS/bA A → aA/a
    1. Unrestricted
    2. Context-sensitive grammar
    3. Context-free grammar
    4. Regular grammar
  7. The language genarated by the grammar S → 0S0, S → 1S1, S → 0, S → 1 is
    1. Even palindrome of 0 and 1
    2. Odd palindrome of 0 and 1
    3. Any combination of 0 and 1
    4. None of these
  8. What is the highest type number to the grammar given by the following production rules S → Aa, A → c|Ba, B → abc.  

    [WBUT 2008]

    1. Zero    
    2. One    
    3. Two    
    4. Three
  9. The language {ambncm + n | m, n ≥1} is
    1. Regular
    2. Context-free but not regular
    3. Context-sensitive but not context-free
    4. Type 0 but not context-sensitive
  10. Which type of grammar is the following in particular A → aB, B → ac, C → bC/aD, D → bA/b
    1. Context-sensitive grammar
    2. Context-free grammar
    3. Context-free but regular in particular
    4. Not type0 but context-free
  11. The machine format of context-sensitive grammar is   

    [WBUT 2009]

    1. Finite automata
    2. Push down automata
    3. Linear bounded automata
    4. All of the above
  12. Which of the following grammar generates strings with any number of 1’s?

    [WBUT 2010]

    1. S→ 1A, A → ε
    2. S → 1S, S→ ε
    3. S → S1, S→ ε
    4. (b) & (c)
  13. Which of the following is true?
    1. (ab)2 = a2b2    
    2. (ab)2 = abab
    3. (ab)2 = aabb    
    4. None of these
  14. If ∑ = {1}, then ∑* – ∑+ is
    1. 1+    
    2. {1}    
    3. { ^}    
    4. {^, 1, 11….. }
  15. A language set contains
    1. Alphabet    
    2. String
    3. Set    
    4. None of these
  16. In automata, why it is called formal language?
    1. Some alphabet forms a string
    2. Some strings form a language
    3. Well-defined use of symbols is significant
    4. Only the form of the string generated by alphabets is significant.
  17. The language constructed from the grammar S → aSbb/aAbb, A →a is
    1. an+1 b2n    
    2. anb2n
    3. anb2n–1    
    4. an+1 b2n+1
  18. The language generated by the grammar S →aSa/aBa, A→ Ba/b is
    1. anban    
    2. anban+1    
    3. a2n    
    4. Ф

Answers:

  1. c
  2. a
  3. b
  4. b
  5. c
  6. d
  7. b
  8. c
  9. b
  10. c
  11. c
  12. d
  13. b
  14. c
  15. b
  16. d
  17. a
  18. d

GATE Questions

  1. In the given context-free grammar, S is the start symbol, a and b are terminals, and ε denotes the empty string.

     

    S → aSa/bSb/a/b/ε

     

    Which of the following strings is NOT generated by the grammar?

    1. aaaa
    2. baba
    3. abba
    4. babaaabab
  2. In the given context-free grammar, S is the start symbol, a and b are terminals, and ε denotes the empty string.

    The grammar generates the language

    1. ((a + b)*b)*    
    2. {ambn |m ≤ n }    
    3. {ambn | m = n }    
    4. a*b*
  3. The two grammars given generate a language over the alphabet (x, y, z)

    Which of the following choices describe the properties satisfied by the strings in these languages?

    1. G1: No y appears before any x

      G2: Every x is followed by at least one y

    2. G1: No y appears before any x

      G2: No x appears before any y

    3. G1: No y appears after any x

      G2: Every x is followed by at least one y.

    4. G1: No y appears after any x

      G2: Every y is followed by at least one x.

  4. Consider the given grammar

    Consider the following strings. Which are accepted by the grammar?

    i) xxyyx    ii) xxyyxy    iii) xyxy    iv) yxxy    v) yxx    vi) xyx

    1. (i), (ii) and (iii)    
    2. (ii), (v) and (vi)
    3. (ii), (iii) and (iv)    
    4. (i), (iii) and (iv)
  5. Consider the following grammars. Names representing terminals have been specified in capital letters.

    Which of the following statements is true?

    1. G1 is context-free but not regular and G2 is regular
    2. G2 is context-free but not regular and G1 is regular
    3. Both G1 and G2 are regular
    4. Both G1 and G2 are context-free but neither of them is regular.
  6. Consider a grammar with the following productions
    1. Context-free    
    2. Regular    
    3. Context-sensitive    
    4. LR(k)
  7. Which of the following definitions generate the same language as L, where L = {xnyn such that n ≥ 1}?
    1. E → xEy/xy
    2. xy/(x+ xyy+)
    3. x+ y+
    1. I only    
    2. I and II    
    3. II and III    
    4. II only
  8. Consider the grammar

    The string generated by the grammar is bicjdkem for some i, j, k, m ≥ 0. What is the relation among i, j, k, and m?

    1. i = j = k = m    
    2. i + k = j + m    
    3. i = j but k≠m    
    4. i + j + k = m
  9. Consider the grammar 

     The grammar generates the strings of the form aibj for some i, j ≥ 0. What are the conditions of the values of i and j?

    1. i = j    
    2. j ≤ 2i    
    3. j ≥ 2i    
    4. i ≤ j
  10. The language {ambncm+n | m, n ≥ 1} is
    1. regular    
    2. context-free but not regular
    3. context-sensitive but not context-free    
    4. type-0 but not context-sensitive
  11. Consider the following grammar G:

    Let Na(w) and Nb(w) denote the number of a’s and b’s in a string w, respectively. The language L(G) {a, b}+ generated by G is

    1. {w | Na(w) > 3Nb(w)}    
    2. {w | Nb(w) > 3Na(w)}
    3. {w | Na(w) = 3k, k ε {0, 1, 2,….}}    
    4. {w | Nb(w) = 3k, k ε {0, 1, 2,….}}
  12. Let L1 = {0n+m1n0m | n, m ≥ 0 }, L2 = {0n+m1n+m0m | n, m ≥ 0}, and L3 = {0n+m1n+m0n+m | n, m ≥ 0}. Which of these languages are NOT context-free?
    1. L1 only    
    2. L3 only    
    3. L1 and L2    
    4. L2 and L3
  13. Which one of the following grammars generate the language L = {aibj | i ≠ j}?
    1. S → aS | Sb | a | b
  14. In the correct grammar above, what is the length of the derivation (number of steps starting from S) to generate the string albm with l ≠ m?
    1. max(l, m) + 2    
    2. l + m + 2    
    3. l + m + 3    
    4. max(l, m) + 3
  15. S → aSa | bSb | a | b

    The language generated by the above grammar over the alphabet {a, b) is the set of

    1. all palindromes.    
    2. all odd length palindromes.
    3. strings that begin and end with the same symbol.    
    4. all even length palindromes.
  16. Consider the languages L1 = {0i1j | i ≠ j}, L2 = {0i1j | i = j}, L3 = {0i1j | i = 2j + 1}, and L4 = {0i1j | i ≠ 2j}. Which one of the following statements is true?
    1. Only L2 is context-free    
    2. Only L2 and L3 are context-free
    3. Only L1 and L2 are context-free    
    4. All are context-free

Answers:

  1. b
  2. b
  3. a
  4. c
  5. b
  6. c
  7. a
  8. b
  9. d
  10. b
  11. c
  12. d
  13. d
  14. a
  15. b
  16. d

Hints:

 

2.  The grammar is derived as S → aSAb → aaSAbAb - - - → amS(Ab)m. If S and A are replaced by ε, them m = n. If A is replaced by bA, then m ≤ n.

3.  S → xB → xy/xyS

4.  S → xB → xxBB → xxySB → xxyyAB → xxyyxy

S → xB → xyS → xyxB → xyxy

S → yA → yxS → yxxB → yxxy

5.  G2 is not regular for the productions

expr → expr + expr

expr → expr * expr

8.

9.  Same as 2

11.  It accepts b (k = 0) aaa, b+ ab+ ab+ a (k = 1)

14.  S → AC → AaCb → AaaCbb → Aaabb → aaabb a = 3 b = 2 max(3, 2) + 2 = 5 = Number of derivations

16.  L1 is CFG according to 14.

The grammar of L3 is

S → 00S1/0

Fill in the Blanks

  1. Grammar consists of four touples—Set of non-terminals, _________________, set of production rule, and _____________.
  2. According to the Chomsky hierarchy, there are ___________ types of grammars.
  3. Type 1 grammar is called ________________.
  4. Type 2 grammar is called ________________.
  5. According to the Chomsky hierarchy, regular grammar is type __________ grammar.
  6. All languages are accepted by _____________.
  7. The machine format of context-free language is ____________.
  8. Linear bounded automata is the machine format of _____________.
  9. The machine format of type 3 language is _____________.
  10. Grammar where production rules are in the format αAβ → αγβ is __________ grammar
  11. In a context-free grammar at the left hand side, there is ____________ non-terminal.
  12. Type 3 language is called _____________.
  13. anbncn is an example of _____________ language in particular.
  14. The grammar S → aSb/A, A →Ac/c is an example of __________ grammar in particular.
  15. The grammar S → Abc/ABSc, BA → AB, Bb → bb, A → a is an example of _______ grammar in particular.
  16. The grammar A → aA/bB/a/b, B → bB/b is an example of __________ grammar in particular.
  17. The language a*(a + b)b* is an example of ____________ language in particular.
  18. L = Alternating sequence of ‘a’ and ‘b’ is an example of ____________ language in particular.
  19. Regular expression is accepted by type __________ grammar.

Answers:

  1. set of terminals, start symbol
  2. Four
  3. Context-sensitive grammar
  4. Context-free grammar
  5. Three
  6. Turing machine
  7. Push down automata
  8. Context-sensitive grammar
  9. Finite automata
  10. Context-sensitive
  11. Single
  12. Regular expression
  13. Context-sensitive
  14. Context-free
  15. Context-sensitive
  16. Regular grammar
  17. Type 4
  18. Context-free
  19. Three

Exercise

  1. Define grammar. Is any grammar possible without a start symbol?
  2. Is it possible to guess the type of language from a given grammar? Is the reverse true? Give reasons in support of your answer.
  3. What is the Chomsky hierarchy? What is the usefulness of it?
  4. Find the languages generated by the following grammars.
    1. S → aSb/A, A →Ac/c
    2. S → aSb/aAb, A → Ac/ε
    3. S → aSb/aAb, A → bA/b
    4. S → S1/S2, S1 → 0S11/0A, A → 0A/, S2 → 0S21/B1, B → B1/ε
    5. S → AB/CD, A → aA/a, B → bB/bC, C→ cD/d, D → aD/AD

      Justify your answer for this.

    6. S → AA, A → BS, A → b, B → SA, B→ a
    7. E → E + E| E–E| E * E| E/E|id
  5. Construct a grammar for the following languages.
    1. L = {Ф}
    2. L = (a, b)*, where all ‘a’ appears before ‘b’
    3. L = (a, b)*, where all ‘b’ appears before ‘a’
    4. L = (a, b)*, where there are equal number of ‘a’ and ‘b’
    5. L = (a, b)*, where ab and ba appear in an alternating sequence.
    6. L = (a, b)*, where the number of ‘b’ is one more than the number of ‘a’
    7. L = (a, b)*aa(a, b)*
  6. Construct a grammar for the following languages.
    1. L = ambn, where m ≠ n.
    2. L = axbycz, where y = x + z
    3. L = axbycz, where z = x + y
    4. L = axbycz, where x = y + z
    5. L = Set of all string over a, b containing aa or bb as substring
    6. L = Set of all string over a, b containing at least two ‘a’
    7. L = Set of all string over 0, 1 containing 011 as substring
  7. Construct a grammar for the following languages.
    1. { anbn| n > 0} ∪ {cmdm | m ≥ 0 }
    2. { anbn| n > 0} ∪ {ambm | m ≥ 0 }
    3. { axbycz, where x = y + z } ∪ { L = axbycz, where z = x + y}
  8. Construct a grammar for the following languages and find the type of the grammar in particular.
    1. L = (0 + 1)* 11 (11)*
    2. L = (Set of all string of ‘a’, ‘b’ beginning and ending with ‘a’)
    3. L = a2n + 1, where n > 0