# Chapter 5: Regular Expression – Introduction to Automata Theory, Formal Languages and Computation

## Regular Expression

#### Introduction

Regular expression, in short RE, is the language part of the type 3 grammar, which is called regular grammar. RE is accepted by the machine called finite automata (FA). RE was first proposed by an American mathematician Stephen Kleene as a notion of describing the algebra of a regular set. Later on, Ken Thompson used the RE in early computer text editor ‘QED’ and Unix editor ‘ed’. QED uses RE for searching text. RE became useful in the Unix text processing program ‘grep’ and modern programming languages such as ‘PERL’. A Canadian computer scientist Henry Spencer wrote ‘regex’, which is widely used as a software library for RE.

They are called ‘regular’ because they describe a language with very ‘regular’ properties. In the previous chapter, we have already discussed about FA. Now, in this chapter, we shall learn about RE and some important properties related to RE.

#### 5.1  Basics of Regular Expression

An RE can be defined as a language or string accepted by an FA.

We know that an FA consists of 5 touples {Q, Σ, δ, q0, F}. Among them, an RE is a string on Σ, i.e., it consists of only input alphabets.

In short, a regular expression is written as RE or regex or regexp.

#### 5.1.1  Some Formal Recursive Definitions of RE

1. Any terminals, i.e., the symbols that belong to Σ are RE. The null string(Λ) and the null set(Ø) are also RE.
2. If P and Q are two REs, then the union of the two REs denoted by P+Q is also an RE.
3. If P and Q are two REs, then their concatenation denoted by PQ is also an RE.
4. If P is an RE, then the iteration (repetition or closure) denoted by P* is also an RE.
5. If P is an RE, then (P) is an RE.
6. The expressions got by the repeated application of the rules from (1) to (5) over Σ are also REs.

#### 5.2  Basic Operations on Regular Expression

Three basic operations are union, concatenation, and closure.

• The operation of union on RE: If R1 and R2 are two REs over Σ, then L(R1 ∪ R2) is denoted by

L(R1 + R2).

L(R1 ∪ R2) is a string from R1 or a string from R2.

L(R1 + R2) = L(R1) ∪ L(R2).

• The operation of concatenation on RE: If R1 and R2 are two REs over Σ, then L(R1 ∩ R2) is denoted by L(R1R2).

L(R1 ∩ R2) is a string from R1 followed by a string from R2.

L(R1R2) = L(R1)L(R2)

• The operation of closure on RE: If R1 and R2 are two REs over Σ, then L(R*) is a string obtained by concatenating n elements for n ≥ 0.

#### 5.2.1  Kleene’s Closure

It is defined as a set of all strings including null (Λ) obtained from the alphabets of the set Σ. It is denoted as Σ*.

If Σ = {0}, then Σ* = {Λ, 0, 00, 000, ……..}

If Σ = {0, 1}, then Σ* = {Λ, 0, 1, 01, 00, 11, 010, 011, 100, …….}

#### 5.2.1.1  Difference Between Closure and Kleene’s Closure

• Closure is nothing but the iteration of 0 to ∞ times, but Kleene’s closure is the set including Λ.
• Closure is applied on RE, but Kleene’s closure is applied on Σ.
• If R = 01, then the closure on R denoted by R* are Λ, 01, 0101, 010101……etc., i.e., the iteration of the same string 01.
• If Σ = {0, 1}, then Kleene’s closure is denoted by Σ* = {Λ, 0, 1, 01, 00, 11, 010, 011, 100,…….}, i.e., the set of any combinations of 0 and 1 including Λ.

Precedence of Operator: Among the previously discussed operators, Kleene star has the highest priority. Then, concatenation comes into play and finally union has the lowest priority.

REs can be described in English language. Consider the following examples.

#### Example 5.1

Describe the following REs in English language.

1. 10*1
2. a*b*
3. (ab)*
4. (a* + b*) c*
5. (00)*(11)*1

Solution:

1. The strings that we get from the RE 10*1 are {11, 101, 1001, 10001…..}. From the set, we can conclude that the strings that are generated from the REs are beginning with 1 and ending with 1. In between 1 and 1, there are any numbers of 0.

So, we can describe in English like this:

Set of all strings beginning and ending with 1 with any number of 0 in between the two 1s.

2. The strings that we get from the RE a*b* are a* . b*, i.e.,

{Λ, a, aa, aaa, ……}.{Λ, b, bb, bbb, …….} = {Λ, a, b, aa, bb, ab, abb, aab, ……..}.

From the set, we can conclude that the strings that are generated from the RE contain any number of ‘a’ followed by any number of ‘b’.

So, we can describe in English like this:

Set of all strings of ‘a’ and ‘b’ containing any number of ‘a’ followed by any number of ‘b’.

3. The strings that we get from the RE (ab)* are {Λ, ab, abab, ababab, ………}.

In English, it is described as:

Set of all strings of a and b with equal number of a and b containing ‘ab’ as repetition.

4. (a* + b*) means any combination of a or any combination of b. c* means any combination of c, which is concatenated with (a* + b*).

In English, it can be described as:

Any combination of ‘a’ or any combination of ‘b’ followed by any combination of ‘c’.

5. The strings that we get from the RE (00)*(11)*1 are {1, 001, 111, 00111, ……}. In the string, there are always odd number of 1 and even number of 0.

In English, it can be described as:

Set of all strings of ‘0’ and ‘1’ with even number of 0 followed by odd number of 1.

Alternatively, the RE described in English language can be denoted using alphabet and symbols. The following examples describe this.

#### Example 5.2

Built regular expression of the following

1. An RE consists of any combination of a and b, beginning with a and ending with b.
2. A language of any combination of a and b containing abb as a substring.
3. RE of a and b containing at least 2 ‘a’s.
4. Write an RE for the language L = {anbm | where m + n is even}.
5. An RE of a and b, having exactly one ‘a’.

Solution:

1. The RE consists of any combination of a and b, i.e., (a + b)*. But the RE starts with a and ends with b. In between a and b, any combination of a and b occurs. So, the RE is

L = a(a + b)*b.

2. In the RE, abb occurs as a substring. The RE consists of any combination of a and b. Before the substring, abb may occur at the beginning, at the middle, or at the last. If abb occurs at the beginning, then after abb there is any combination of a and b. If abb occurs at the middle, then before and after abb there are any combination of a and b. If abb occurs at the last, then before abb there is any combination of a and b.

The RE is L = (a + b)*abb(a + b)*.

3. In the RE, there are at least two ‘a’s. The expression consists of a and b. Therefore, any number of ‘b’ can occur before the first ‘a’ and before the second ‘a’, i.e., after the first ‘a’ and after the second ‘a’.

So, the RE will be L = b*ab*ab*.

4. The RE consists of n number of ‘a’ followed by m number of ‘b’. But m + n is always even. This is possible if

(a) m and n both are even or (b) m and n both are odd.

If m and n are both even, then the expression is (aa)*(bb)*.

If m and n are both odd, then the expression is (aa)*a(bb)*b.

By combining these two, the final RE becomes

L = (aa)*(bb)* + (aa)*a(bb)*b.

5. In the RE, there is exactly one ‘a’. Before and after a, there is any combination of b.

Therefore, the RE is L = b*ab*.

#### 5.3  Identities of Regular Expression

An identity is a relation which is tautologically true. In mathematics, an equation which is true for every value of the variable is called an identity equation. As an example, (a + b)2 = a2 + 2ab + b2 is an identity equation. Based on these identities, some other problems can be proved. In the RE also, there are some identities which are true for every RE. In this section, we shall discuss those identities related to RE.

1. Ø + R = R + Ø = R
2. Ø R = R Ø = Ø

Note: In both the previous cases, Ø denotes a null set, which contains nothing.

3. ΛR = RΛ = R
4. Λ* = Λ & Ø* = Λ

Same for Ø *.

5. R + R = R
6. R*R* = R*
7. R*R = RR*
8. (R*)* = R*
9. Λ + RR* = Λ + R*R = R*
10. (PQ)*P = P(QP)*
11. (P + Q)* = (P*Q*)* = (P* + Q*)* (D’Morgan’s theorem)
12. (P + Q)R = PR + QR

#### Example 5.3

From the identities of RE, prove that

(1 + 100*) + (1 + 100*)(0 + 10*)(0 + 10*)* = 10*(0 + 10*)*.

Solution: LHS.

#### Example 5.4

From the identities of RE, prove that the following three are equivalent

1. (011((11)* + (01)*)*)*011
2. 011(((1 + 0)1)*011)*
3. 011(((11)*(01)*)*011)*

Solution: Let P = (11) and Q = (01).

We know (P + Q)* = (P*Q*)* = (P* + Q*)*.

Thus, ((11) + (01))* = ((11)*(01)*)* [in c] = ((11)* + (01)*)*[in a] = ((1 + 0)1)* [in b].

Consider 011 as R and (P + Q)* as S.

We know that (RS)*R = R(SR)*

Thus,

Similarly,

As (i) ≡ (ii) and (i) ≡ (iii), (i) ≡ (ii) ≡ (iii).

#### Example 5.5

From the identities of RE, prove that

10 + (1010)*[∧ + (1010)*] = 10 + (1010)*.

Solution:

#### 5.4  The Arden’s Theorem

The Arden’s theorem is used to construct the RE from a transitional system of FA.

Theorem 5.1: Statement: Let P and Q be two REs over Σ. If P does not contain Λ, then the equation R = Q + RP has a unique (one and only one) solution, R = QP*.

Proof: Now, point out the statements in the Arden’s theorem in general form.

• P and Q are two REs.
• P does not contain the Λ symbol.
• R = Q + RP has a solution, i.e., R = QP*.
• This solution is the one and only solution of the equation.

If R = QP* is a solution of the equation R = Q + RP, then by putting the value of R in the equation, we shall get the value ‘0’.

R = Q + RP
R − Q − RP = 0
LHS R – Q – RP

(Putting the value of R in the LHS, we get)

So, from here it is proved that R = QP* is a solution of the equation R = Q + RP.

Now, we have to prove that R = QP* is the one and only solution of the equation R = Q + RP.

As R = Q + RP, put the value of R again and again in the RHS of the equation.

After several steps, we shall get

R = Q(Λ + P + P2 + P3 + .... Pn) + RPn + 1.

Now let a string w belong to R. If w is a Λ string, then in which part will the Λ belong?

This string will belong to either the Q(Λ + P + P2 + P3 + .... Pn) part or the RPn + 1 part. But, according to point number (ii), P does not contain Λ, and so the string w does not belong to RPn + 1. So, it will obviously belong to Q(Λ + P + P2 + P3 + .... Pn) which is nothing but QP*. [(Λ + P + P2 + P3 + .... Pn) is any combination of P including Λ.]

As this string belongs only in one part, R and QP* represent the same set. That means R = QP* is the unique solution of the equation R = Q + RP.

#### 5.4.1  Process of Constructing Regular Expression from Finite Automata

There are some assumptions:

• In the transitional graph, there must be no Λ-move.
• In the FA, there is only one initial state.

Now, we have to construct equations for all the states. There are n number of equations if there are n-states.

For any FA, these equations are constructed in the following way.

For the beginning state, there is an arrow at the beginning coming from no state. So, a Λ is added with the equation of the beginning state.

Then, these equations have to be solved by the identities of RE. The expression obtained for the final state and consists of only the input symbol (Σ) is the Regular Expression for the Finite Automata.

The following examples describe the process of the construction of an RE from FA.

#### Example 5.6

Construct an RE from the given FA in Fig. 5.1 by the algebraic method using Arden’s theorem

Fig. 5.1

Solution:

For the previously given FA, the equations are

Put the value of q2 in equation no. (2).

This equation is in the form R = Q + RP, where R = q1, Q = q0.0, and P = (1 + 0.0).

So, the solution of the equation is R = QP*, i.e., q1 = q0.0. (1 + 0.0)*.

Putting the value q2 = q1.0 in equation no. (1), we get

q0 = q0.1 + q1.0.1 + Λ.

Again putting the value of q1 in the previous equation, we get

This equation is in the form R = Q + RP, where R = q0, Q = Λ, and P = (1 + 0.(1 + 0.0)*.0.1). So, the solution of the equation is

Putting the value of q0 in the solution of q1, we get

q1 = (1 + 0.(1 + 0.0)*.0.1)*.0. (1 + 0.0)*.

Replacing the value of q1 in equation (3), we get

q2 = (1 + 0.(1 + 0.0)*.0.1)*.0. (1 + 0.0)*.0.

As q2 is the final state of the FA, all the strings will halt on this state. Therefore, the RE is (1 + 0. (1 + 0.0)*.0.1)*.0. (1 + 0.0)*.0, thus accepting the FA.

#### Example 5.7

Construct an RE from the given FA in Fig. 5.2 by the algebraic method using Arden’s theorem.

Fig. 5.2

Solution: For the previously given FA, the equations are

Putting the value of q0 in equation q1, it becomes q1 = 0 + q10. The equation is in the form R = Q + RP, where R = q1, Q = 0, and P = 0. By the Arden’s theorem, the solution of the equation is R = QP*, i.e.,

q1 = 00*.

Putting the value of q1 in the equation (3), we get q2 = 00*1 + q21.

The equation is in the form R = Q + RP, where R = q2, Q = 00*1, and P = 1. By the Arden’s theorem, the solution of the equation is R = QP*, i.e.,

q2 = 00*11*.

Putting the value of q2 and q0 in equation (4), we get

q3 = (00*11*0 + 1) + q3 (0 + 1).

The equation is in the form R = Q + RP, where R = q3, Q = (00*11*0 + 1), and P = (0 + 1). By the Arden’s theorem, the solution of the equation is R = QP*, i.e.,

q3 = (00*11*0 + 1) (0 + 1)*.

As q3 is the final state, the RE generated from the given FA is (00*11*0 + 1) (0 + 1)*.

#### Example 5.8

Construct an RE from the given FA in Fig. 5.3 by the algebraic method using Arden’s theorem

Fig. 5.3

Solution: For the given FA, the equations are

Substituting the value of q2 and q3 in q1, we get

If q1 is treated as R, Λ as Q, and (01 + 10) as P, then the equation becomes R = Q + RP.

The solution for the equation is R = QP*.

As q1 is the only final state, the string accepted by the FA is (01 + 10)*.

(Note: In the automata the transition from q2 and q3 for input 1 is missing. A dead state may be added to complete this FA, but the equation for dead state will be in no need in solving the automata. From here, it is also proved that the dead state is only necessary to prove that a string is not accepted by an FA by totally traversing the string.)

#### Example 5.9

Construct an RE from the given FA in Fig. 5.4 by the algebraic method using Arden’s theorem.

Fig. 5.4

Solution: For the given FA, the equations will be

q3 = q2.a, replace this in equation (2). The equation becomes

The equation is in the format of R = Q + RP, q2 [R] = q2[R](b + a.b)[P] + q1.b [Q].

The solution is R = QP*, i.e., q2 = q1b.(b + ab)*.

Replace the value of q2 in equation (3).

q3 = q1b.(b + ab)*.a.

Replace the value of q3 in equation (1).

The equation is in the format of R = Q + RP, q1[R] = q1[R].(a + b.(b + ab)*.a.a)[P] + Λ[Q]. The solution of the equation is R = QP*, i.e., q1 = Λ.(a + b.(b + ab)*.a.a)*,

q1 is the final state, and so the RE accepting the FA is

(a + b.(b + ab)*.a.a)*.

#### 5.5  Construction of Finite Automata from a Regular Expression

RE is the language format of the FA. Deterministic finite automata (DFA) can be constructed from a given RE in two ways:

1. Conversion of an RE to non-deterministic finite automata (NFA) with ε transition, and then converting it to DFA by ε closure.
2. Direct conversion of an RE to DFA.

#### 5.5.1  Conversion of an RE to NFA with ε transition

For making an FA from an RE, there are two steps as given in the following.

Step I: From the given RE, an equivalent transitional system (FA) with Λ move has to be constructed.

Step II: From the transitional system with Λ moves, a DFA needs to be constructed by the ε closure method.

For constructing step I, again there are two methods (i) top-down approach and (ii) bottom-up approach. In the top-down approach, the FA construction starts from the given RE and ends by reaching a single element for each transition. In the bottom-up approach, the construction starts from the basic element and ends on reaching the RE.

#### 5.5.1.1  Top-Down Approach

This is divided into several steps as given in the following.

• Take two states, one is the beginning state and another is the final state. Make a transition from the beginning state to the final state and place the RE in between the beginning and final states (Fig. 5.5).

Fig. 5.5

• If in the RE there is a + (union) sign, then there are parallel paths between the two states (Fig. 5.6).

Fig. 5.6

• If in the RE there is a .(dot) sign, then one extra state is added between the two states (Fig. 5.7).

Fig. 5.7

• If in the RE there is a ‘*’ (closure) sign, then a new state is added in between. A loop is added on the new state and the label Λ is put between the first to new and new to last (Fig. 5.8)

Fig. 5.8

The following examples make the described process clear for understanding.

#### Example 5.10

Construct an FA equivalent to the RE:

L = (a + b)*(aa + bb)(a + b)*.

Solution:

• Two states are taken: one is the beginning state and another is the final state. The RE is placed between the two states with a transition from the beginning state to the final state as given in Fig. 5.9(a)

Fig. 5.9(a)

• In the RE, there are two concatenations, between (a + b)* and (aa + bb) and between (aa + bb) and (a + b)*. Two extra states are added between q0 and qf as given in Fig. 5.9(b).

Fig. 5.9(b)

• In aa and bb, there is a + (union) sign. So, between q1 and q2, parallel paths are constructed as given in Fig. 5.9(c).

Fig. 5.9(c)

• There are two * (closure) signs between q0 and q1, and q2 and qf . Two new states q3 and q4 are added, respectively, between q0, q1 and q2, qf. The loops with label a + b are added to q3 and q4, making transitions between (q0, q3), (q3, q1), (q2, q4), and (q4, qf) with label Λ as in Fig. 5.9(d).

Fig. 5.9(d)

• Removing + (union) and • (concatenation) of the final FA with Λ transitions is given in Fig. 5.9(e).

Fig. 5.9(e)

• Between q0, q3 and q4, qf, there is a label Λ. It means that from q0 by getting no input it can reach to q3, and the same thing is for q4 and qf. Therefore, q0 and q3 may be treated as the same state, and the same for q4 and qf. The transitions from q3 and qf are added with q0 and q4, respectively, with same label.

By removing the Λ transactions, the FA is given in Fig. 5.9(f)

Fig. 5.9(f)

#### Example 5.11

Construct Finite Automata equivalent to the Regular Expression

L = ab(aa + bb)(a + b)* b.

Solution:

Step I: Take a beginning state q0 and a final state qf. Between the beginning and final state place the regular expression (Fig. 5.10a).

Step II: There are three dots (.) between ab, (aa + bb), (a + b)*, and b. Three extra states are added between q0 and qf (Fig. 5.10b).

Step III: Between ‘a’ and ‘b’ there is a dot (.), so extra state is added (Fig. 5.10c).

Step IV: In aa + bb there is a +, therefore there is parallel edges between q1 and q2. Between q2 and q3 there is (a + b)*. So, extra state q5 is added between q2 and q3. Loop with label a, b is placed on q5 and Λ transition is made between q2, q5 and q5, q3 (Fig. 5.10d).

Step V: In aa and bb there are dots (.). Thus two extra states are added between q1 and q2 (one for aa and another bb). The final finite automata for the given regular expression is given in Fig. 5.10e.

Figs. 5.10(a) to (e) Construction of FA for L = ab(aa + bb)(a + b)*b.

#### Example 5.12

Construct an FA equivalent to the RE:

L = ab + (aa + bb)(a + b)* b.

Solution :

Step I: Take a beginning state q0 and final state qf. Between them, place the RE.

Step II: For ‘ + ’, there will be two parallel edges between q0 and qf.

Step III: For ‘.’ dots between a and b, (aa + bb), and (a + b)* and (a + b)* and b, add three new states.

Step IV: For (aa + bb), two parallel edges with two extra states are added.

Step V: ‘*’ is removed by putting the loop on another extra state with null transitions between q2 to the new state and the new state to q3.

Step VI: Removing the Λ transitions, the final FA is given in Fig. 5.11.

Fig. 5.11 Construction of FA for L = ab + (aa + bb)(a + b)*b.

The bottom-up approach to convert an RE to an FA is proposed by Thomson. This method is known as the Thomson construction. The process is described in the following text.

#### 5.5.1.2  Bottom-up Approach (Thomson Construction)

• For input ε, the transition diagram is constructed as in Fig. 5.12(a).
• For input a ∈ ∑, the transition diagram is as shown in Fig. 5.12(b).
• If NFA(r1) is the NFA for RE r1 and NFA(r2) is the NFA for RE r2, then for the RE r1 + r2 the transition diagram is as shown in Fig. 5.12(c).
• If NFA(r1) is the NFA for RE r1 and NFA(r2) is the NFA for RE r2, then for the RE r1.r2 the transition diagram is as shown in Fig. 5.12(d).
• If NFA(r) is the NFA for RE r, then for the RE r*, the transition diagram is as shown in Fig. 5.12(e).

Figs. 5.12(a)–(e) The Thomson Construction

#### Example 5.13

Construct Finite Automata equivalent to the Regular Expression

L = ab(aa + bb)(a + b)*a.

Solution:

Step I: The terminal symbols in L are ‘a’ and ‘b’. The transition diagrams for ‘a’ and ‘b’ are given in Fig. 5.13(a).

Step II: The transition diagrams for ‘aa’, ‘ab’, ‘bb’ are given in Fig. 5.13(b).

Step III: The transition diagram for (a+b) is given in Fig. 5.13(c)

Step IV: The transition diagram for (a+b)* is given in Fig. 5.13(d).

Step V: For (aa+bb) the transitional diagram is given in Fig. 5.13(e).

Step VI: The constructed transitional diagram for ab(aa+bb) is given in Fig. 5.13(f).

Step VII: The constructed transitional diagram for ab(aa + bb)(a + b)*a is given in Fig. 5.13(g).

This can be simplified to Fig. 5.13(h) by removing ε transitions.

Figs. 5.13 (a – h) Construction of FA from RE by Thomson Construction

Kleene’ s Theorem: If R is an RE, then there exists an NDFA, MNFA with ε transition, which accepts the language generated by R.

#### 5.5.2  Direct Conversion of RE to DFA

Using this method, a given RE is converted to a DFA without the construction of an NFA. Before describing the process, we need to know a few definitions such as ‘firstpos’, ‘followpos’, ‘nullable’, and the construction processes of those.

firstpos: It is the abbreviation of first position. It is defined as the set of the positions of the first symbols of strings generated by the sub-expression rooted by n.

followpos: It is the abbreviation of follow position. It is defined as the set of the positions of the last symbols of strings generated by the sub-expression rooted by n.

Nullable: It gives two results ‘true’ or ‘false’. It is true if the empty string is a member of strings generated by the sub-expression rooted by n and false otherwise.

The construction of firstpos is made according to the following table.

followpos is constructed only for the leaf nodes. It is constructed in the following way.

• If n is a dot (.) node containing the left child c1 and the right child c2, and i is a position in lastpos(c1), then all positions in firstpos(c2) belong to followpos(i).
• If n is a star node, and i is a position in lastpos(n), then all positions in firstpos(n) belong to followpos(i).

The DFA is constructed by the following steps:

Step I: Make the RE R as augmented by placing an end marker #, and making it M#. Generate a parse tree from M#.

Step II: Calculate the firstpos and lastpos for all the internal and leaf nodes. Calculate the followpos for the leaf nodes.

Step III: Take the firstpos(root) as an unmarked state S of the constructing DFA.

Step IV: while (there exists an unmarked state S in the states of DFA)

do

Mark S and construct a transition from S using the following process for each input symbol ‘a’ as an alphabet of R

do

let S contain ‘a’ in position i1,i2 ...,in, then

if (S′ is not empty and have not appeared in the states of the DFA)

put S′ as an unmarked state into the states of the DFA.

#### Example 5.14

Convert the RE (a + b)*abb directly into an DFA.

Solution: The augmented RE is (a + b)*abb#

The parse tree constructed from the augmented RE in given in Fig. 5.14.

Fig. 5.14

The LHS of each node (including the leaf) is the firstpost of that node and the RHS of each node is the lastpos of that node. The firstpos and lastpos are constructed from the table given previously.

[Consider the dot (•) node connecting * node and ‘a’. The dot node is considered as c1 and the * node as c2. The nullable(c1) is true as c1 is a star node. So, the firstpos of the dot node is firstpos(c1) ∪ firstpos(c2).]

Consider the * node. The lastpos of the * node is {1,2}, and thus the followpos(1) and followpos(2) contain {1, 2}, as {1, 2} is the firstpos of the * node.

followpos(1) = {1, 2}
followpos(2) = {1, 2}

Consider the dot node connecting the * node (as c1) and ‘a’ with label ‘3’(as c2). The lastpos of c1 contains {1, 2} and the firstpos of c2 contains {3}. So, the followpos(1) and followpos(2) contain {3}

followpos(1) = {1, 2, 3}
followpos(2) = {1, 2, 3}

Consider the dot node connecting the dot node (as c1) and ‘b’ with label ‘4’(as c2). The lastpos of c1 contains {3} and the firstpos of c2 contains {4}. So, the followpos(3) contains {4}

followpos (3) = {4}

Consider the dot node connecting the dot node (as c1) and ‘b’ with label ‘5’(as c2). The lastpos of c1 contains {4} and the firstpos of c2 contains {5}. Thus, the followpos(4) contains {5}.

followpos(4) = {5}

Consider the dot node connecting the dot node (as c1) and ‘#’ with label ‘6’(as c2). The lastpos of c1 contains {5} and the firstpos of c2 contains {6}. Thus, followpos(5) contains {6}.

followpos(5) = {6}
followpos(6) = { }

a {1,2,3}. Mark it as S1. The RE contains two symbols ‘a’ and ‘b’. ‘a’ exists in positions ‘1’ and ‘3’, and ‘b’ appears in positions ‘2’, ‘4’, and ‘5’.

It is other than S1; therefore, mark it as S2.

δ(S1, b) = followpos(2) = {1, 2, 3} same as S1.

Take         S2 = {1, 2, 3, 4}.

All the states are traversed, and no new state appears. Therefore, it is the final DFA. The transitional diagram of the DFA is given is Fig. 5.15.

Fig. 5.15

#### 5.6  NFA with ε Move and Conversion to DFA by ε-Closure Method

If any FA contains any ε (null) move or transaction, then that FA is called an NFA with ε moves. An example is shown in Fig. 5.16.

Fig. 5.16

The previous FA in Fig. 5.16 contains two null moves. So, the previous FA is an NFA with null move.

From the definition of an NFA, we know that for an NFA from a single state for a single input the machine can go to more than one state, i.e., Q × Σ → 2Q, where 2Q is the power set of Q.

From a state by getting ε input, the machine is confined into that state. An FA with null move must contain at least one ε move. For this type of FA for input ε, the machine can go to more than one state. (One is that same state and the another is the ε-transaction next state). So, an FA with ε move can be called as an NFA.

For the previous FA,

An NFA with ε move can be defined as

MNFA null = {Q, Σ, δ, q0, F},

where Σ: Set of input alphabets including ε and δ is a transitional function mapping Q × Σ → 2Q, where 2Q is the power set of Q.

∈-closure: ε-Closure of a state is defined as the set of all states S, such that it can reach from that state to all the states in S with input ε (i.e., with no input).

#### Example 5.15

Consider the following NFA in Fig. 5.17 with ε moves.

Fig. 5.17

From state q0 by input ε (i.e., with no input), we can reach states q1 and q2. With no input, we can reach the same state, i.e., q0. So, we can write

ε-closure(q0) = {q0, q1, q2}

Similarly, ε-closure(q1) = {q0, q1, q2}.

ε-closure(q2) = {q0, q1, q2} (from q1 with no input we can reach q0, from q2 with no input we can reach q1, and so from q2 with no input we can reach q0).

#### 5.6.1  Conversion of an NFA with ε move to a DFA

In the FA, it is discussed on how to remove ε moves from an FA. The method was the transitional method. In this section, we shall learn the removal of ε moves from an FA using the ε-closure method.

Step I: Find ε-closure of all the states.

Step II: Construct a transitional function δ′ for the beginning state for all inputs. The function δ′ will be calculated similar to the following.

δ′ (Q, i/p) = ε–closure (δ(Q, i/p))

Let ε–closure(A) = {q1, q2, q3} = Q, So, the previous equation will become

Step III: First, find the ε-closure of the initial state. Rename the set of states as a new state. Then, find the function δ′ of that state for all inputs. If δ′ of that state for all inputs is constructed, then the state is known to be marked (fully traversed for all the inputs). If any new combinations of state appear in the process of marking other than the marked state, then rename that as a new state.

Step IV: Repeat step III for all new unmarked states. If no new state appears, then stop the construction.

Step V: In this process, the states which contain the final state as entry are the final states of the new NFA without ε move.

The following examples describe the process in detail.

#### Example 5.16

Convert the following NFA with ε-move in Fig. 5.18 to an equivalent DFA.

Fig. 5.18

Solution:

ε-closure(q0) = {q0, q1, q2}. Let us rename this state as A. Then, construct the δ′ function for the new unmarked state A for inputs 0, 1, and 2.

As it is a new combination of state other than A, rename it as B.

As it is a new combination of state other than A and B, rename it as C.

As δ′ is constructed for all inputs on A, A is marked.

B is unmarked still.

Then, construct the δ′ function for the new unmarked state B for inputs 0, 1, and 2.

As δ′ is constructed for all inputs on B, B is marked.

C is unmarked still.

Then, construct the δ′ function for the new unmarked state C for inputs 0, 1, and 2.

A, B, and C all contain q2, which is the final state; therefore, A, B, and C are final states.

The equivalent DFA is given in Fig. 5.19.

Fig. 5.19

#### Example 5.17

Convert the following NFA with ε-move in Fig. 5.20 to an equivalent DFA.

Fig. 5.20

Solution:

ε-closure(q0) = {q0, q1, q3}. Let us rename this state as A. Then, construct the δ′ function for the new unmarked state A for inputs ‘a’ and ‘b’.

As it is a new combination of state other than A, rename it as B.

As it is a new combination of state other than A and B, rename it as C.

As δ/ is constructed for all inputs on A, A is marked.

B is unmarked still.

Then, construct the δ′ function for the new unmarked state B for inputs ‘a’ and ‘b’.

As it is a new combination of state other than A, B, and C, rename it as D.

As δ′ is constructed for all inputs on B, B is marked.

C is unmarked still.

Then, construct the δ′ function for the new unmarked state C for inputs ‘a’ and ‘b’.

As δ′ is constructed for all inputs on C, C is marked.

D is unmarked still.

Then, construct the δ′ function for the new unmarked state D for inputs ‘a’ and ‘b’.

States A and C are final states as the states contain q3.

The equivalent DFA is given in Fig. 5.21

Fig. 5.21

#### 5.7  Equivalence of Two Finite Automata

Two FA are said to be equivalent if they generate the same (equivalent) RE. If both the FA are minimized, they reveal the equivalence by being minimized to the same number of states and the same transitional functions. In this section, we shall discuss the process of proving the equivalence between two FA.

Let there be two FA, M and M′, where the number of input symbols are the same.

• Make a comparison table with n + 1 columns, where n is the number of input symbols.
• In the first column, there will be a pair of vertices (q, q′) where q ∈ M and q′ ∈ M′. The first pair of vertices will be the initial states of the two machines M and M′. The second column consists of (qa, q′a), where qa is reachable from the initial state of the machine M for the first input, and q′a is reachable from the initial state of the machine M′ for the first input. The other n − 2 columns consist of a pair of vertices from M and M′ for n − 1 inputs, where n = 2, 3,………… n − 1.
• If any new pair of states appear in any of the n − 1 next state columns, which were not taken in the first column, take that pair in the present state column and construct subsequent column elements like the first row.
• If a pair of states (q,q′) appear in any of the n columns for a pair of states in the present state column, where q is the final state of M and q′ is the non-final state of M′ or vice versa, terminate the construction and conclude that M and M′ are not equivalent.
• If no new pair of states appear, which were not taken in the first column, stop the construction and declare that M and M′ are equivalent.

The following examples describe this in detail.

#### Example 5.18

Find whether the two DFAs given in Fig. 5.22 are equivalent or not.

Fig. 5.22

Solution: In the previous two DFAs, the number of input is two 0 and 1. Therefore, the comparison table is of three columns. The beginning state pair is (A, G). From there, for input 0, we are getting (B, H) and for input 1 we are getting (C, H). Both of them are new combination of states. Among them, first take (B, H) in the present state column. The next state pairs for input 0 and 1 are constructed. By this process, the comparison continues.

After the fifth step, further construction is stopped as no new state combinations have appeared. In the whole table, no such type of combination of states appear where one state is the final state of one machine and the other is the non-final state of another machine.

Therefore, the two DFAs are equivalent.

Present State (q, q′)
Next State
For I/P 0 (q0, q′0)
For I/P 1 (q1, q′1)
(A, G)
(B, H)
(C, H)
(B, H)
(C, H)
(D, I)
(C, H)
(C, H)
(E, I)
(E, I)
(E, I)
(E, I)
(D, I)
(D, I)
(D, I)

#### Example 5.19

Find whether the two DFAs given in Fig. 5.23 are equivalent or not.

Fig. 5.23

Solution: In the previous two DFAs, the number of input is two 0 and 1. Therefore, the comparison table is of three columns. The beginning state pair is (A, E). From there, for input 0, we get (B, F) and for input 1 we get (D, H). Both of them are new combination of states. Among them, take (B, F) in the present state column. The next state pairs for input 0 is (C, G) and for input 1 is (C, H). (C, H) is a combination of states where C is the final state of M and H is the non-final state of M′. Further construction stops here declaring that the two DFAs are not equivalent.

Present State (q, q′)
Next State
For I/P 0 (q0, q′0)
For I/P 1 (q1, q′1)
(A, E)
(B, F)
(D, H)
(B, F)
(C, G)
(C, H)

#### 5.8  Equivalence of Two Regular Expressions

For every RE, there is an accepting FA. If the FA constructed from both of the REs are the same, then we can say that two REs are equivalent.

#### Example 5.20

Prove that the following REs are equivalent.

L1 = (a + b)* L2 = a*(b*a)*

Solution: The FA for L1 is constructed in Fig. 5.24.

Fig. 5.24

The FA for L2 is constructed in Fig. 5.25.

Fig. 5.25

We are seeing that the FA generated by the two REs L1 and L2 are the same. So, we can decide that the two FA are equivalent.

#### Example 5.21

Prove that the following REs are equivalent.

L1 = 1*(011)*(1*(011)*)* L2 = (1 + 011)*

Solution: The FA for L1 is constructed in Fig. 5.26.

Fig. 5.26

The FA for L2 is constructed in Fig. 5.27.

Fig. 5.27

The FA generated by the two REs L1 and L2 are the same and, therefore, they are equivalent.

#### 5.9  Construction of Regular Grammar from an RE

We have learnt how to construct a grammar from a language. But in the case of an RE, it is a little difficult to find the exact grammar of it using the conventional methods. The following section describes the process of constructing regular grammar from an RE.

Step I: Construct the equivalent FA for the given RE (eliminate all null moves).

Step II: The number of non-terminals of the grammar will be equal to the number of states of the FA.

Step III: For all transitional functions in the form δ(Q1, a) → Q2, [Q1, Q2 ∈ Q and a ∈ Σ], the production rule is in the form A → aA1. If Q2 is a final state, then for the transitional function δ(Q1, a) → Q2 the production rules are A → aA1 and A → a. The start symbol is the symbol representing the initial state of the FA.

The following examples describe the discussed method in detail.

#### Example 5.22

Construct a regular grammar for the following REs:

1. a*(a + b)b*
2. ab(a + b)*
3. 10 + (0 + 11)0*1

Solution:

1. The FA for the string a*(a + b)b* is constructed in Fig. 5.28.

Fig. 5.28

The number of states of the FA is 2, which means that there are two non-terminals in the grammar for the RE. Let us take them as A (for q0) and B (for q1).

For the transitional function δ(q0, a) → q0, the production rule will be A → aA. For the transitional function δ(q0, a) → q1, the production rule will be A → aB. As q1 is a final state, there will be another production rule A → a.

The start symbol will be A as q0 is the beginning state.

The grammar G for the RE a*(a + b)b* is {VN, Σ, P, S} where

VN = {A, B}
Σ = {a, b}

P : {A → aA/ bB/a/b, B → bB/b}

S :{A}.

Fig. 5.29

2. The DFA for the string ab(a + b)* is given in Fig. 5.29.

There are three states in the DFA. The number of non-terminals for the grammar of the RE will be 3. Let us take them as A (for q0), B (for q1), and C (for q2).

For the transitional function δ(q0, a) → q1, the production rule will be A → aB

The start symbol will be A as q0 is the beginning state.

The grammar G for the RE ab(a + b)* is {VN, Σ, P, S} where

VN = {A, B, C}
Σ = {a, b}

P : {A → aB, B → bC/b, C → aC/bC/a/b}

S: {A}.

3. The NFA without null move for the RE 10 + (0 + 11)0*1 is shown in Fig. 5.30.

The equivalent DFA for the previous NFA is given in Fig. 5.31.

There are four states of the DFA for the RE 10 + (0 + 11)0*1. So, for the grammar of the RE, there will be four non-terminals. Let us take them as A (for Q0), B (for Q1), C (for Q2), and D (for Q3).

Fig. 5.30

Fig. 5.31

Now, we have to construct the production rules of the grammar.

For the transitional function δ(Q0, 0) → Q1, the production rule will be A → 0B.

Similar to δ(Q0, 1) → Q2, the production rule will be A → 1C.

δ(Q1, 0) → Q3, the production rule will be B → 0D and B → 0 (as Q3 is the final state)

δ(Q1, 1) → Q2, the production rule will be B → 1C

δ(Q2, 0) → Q2, the production rule will be C → 0C

δ(Q2, 1) → Q3, the production rule will be C → 1D and C → 1 (As Q3 is the final state).

The start symbol will be A as Q0 is the beginning state.

The grammar G for the RE 10 + (0 + 11)0*1 is {VN, Σ, P, S} where

VN = {A, B, C, D}
Σ = {0, 1}

P : {A → 0B/1C, B → 0D/1C/0, C → 0C/1D/1}

S : {A}.

(N.B: From NFA without ∈-move, the regular grammar can be constructed. We can do this by the construction of the equivalent DFA.)

#### 5.10  Constructing FA from Regular Grammar

FA can directly be constructed from regular grammar. This can be considered as a reverse process of constructing the FA from an RE. The following section describes the process of constructing the FA from an RE.

Step I:

• If the grammar does not produce any null string, then the number of states of the FA is equal to the number of non-terminals of the regular grammar +1. Each state of the FA represents each non-terminal and the extra state is the final state of the FA. If it produces a null string, then the number of states is the same as the number of non-terminals.
• The initial state of the FA is the start symbol of the regular grammar.
• If the language generated by the regular grammar contains a null string, then the initial state is also the final state of the constructing FA.

Step II:

• For a production in the form A → aB, make a δ function δ(A, a) → B.

There is an arc from state A to state B with label a.

• For a production in the form A → a, make a δ function δ(A, a) → final state.
• For a production A → ∈, make a δ function δ(A, ∈) → A, and A is the final state.

Consider the following examples.

#### Example 5.23

Convert the following regular grammar into FA:

Solution: In the grammar, there are three non-terminals, namely S, A, and B. Therefore, the number of states of the FA is four. Let us name the final state as C.

1. For the production S → aA/bB, the transitional diagram is given in Fig. 5.32(a)
2. For the production S → a/b, the transitional diagram including the previous one becomes as Fig. 5.32(b).

Fig. 5.32(a)

Fig. 5.32(b)

3. For the production A → aS/bB/b, the transitional diagram including the previous one looks like Fig. 5.32(c).

Fig. 5.32(c)

For the production B → aA/bS, the transitional diagram including the previous one looks like Fig. 5.32(d)

Fig. 5.32(d)

This is the FA for the given regular grammar.

#### Example 5.24

Convert the following regular grammar into FA:

Solution: In the grammar, there are three non-terminals, namely S, A, and B. So, the number of states of the FA will be four. Let us name the final state as C.

• For the production S → aA/bS, the transitional diagram becomes
• For the production A → bB/a, the transitional diagram including the previous one is
• For the production B → aS/b, the transitional diagram including the previous one looks like

This is the FA for the given regular grammar. The construction process is described in Figs. 5.33(a) to 5.33(c).

Fig. 5.33(a)

Fig. 5.33(b)

Fig. 5.33(c)

#### 5.11  Pumping Lemma for Regular Expression

There is a necessary condition for an input string to belong to a regular set. This necessary condition is the pumping lemma. Pumping means generating. This lemma is called a pumping lemma because it gives a method of generating many input strings from a given string.

As the pumping lemma is a necessary condition for a string to belong to a regular set, it is used to prove that certain sets are not regular. If any set fulfills all the conditions of Pumping Lemma it can not be said confirm that the set is regular.

But the reverse is true, i.e., if any set breaks the conditions of the pumping lemma, it can be said that the set is not regular. So, the pumping lemma is used to prove that certain sets are not regular.

Theorem 5.2: Statement of the pumping lemma: Let M = {Q, Σ, δ, q0, F} be an FA with n number of states. Let L be a regular set accepted by M. Let w be a string that belongs to the set L and | w | ≥ m. If m ≥ n, i.e., the length of the string is greater than or equal to the number of states, then there exists x, y, z such that w = xyz, where | xy | ≤ n, | y | >0 and xyiz ∈ L for each i ≥ 0.

(It needs some clarification, after that we shall go to the proof of it.

Let us take the following FA in Fig. 5.34. Here, from q0, by getting ‘a’ it goes to q1 and from q1 by getting ‘b’ it goes to q2, which is the final state. The FA consists of three states, q0, q1, and q2. The string that is accepted by the FA is ba. The length of the string is 2 which is less than the number of states of the FA.

Fig. 5.34

Here, from a state by getting a single input, it goes to a single distinct state. But, if the length of the string is equal or greater than the number of states of the FA, then from a state by getting a single input it is not possible to get all single distinct states. There must be a repetition of at least a single state. This can be described by the following diagram in Fig. 5.35.

Fig. 5.35

The RE accepted by the automata is ba*b. The expression can be divided into three parts: x, y, z where y is the looping portion, x is the portion before looping, and z is the portion after the looping portion.)

Proof: Let w be a string of length m where m is greater than n, the number of states of the FA accepting the string w as given in Fig. 5.36.

Fig. 5.36

w = a1a2a3……..am | w | = m ≥ n

Starting from the beginning state q0, by getting the string w as input, the machine reaches the final state.

In general,

δ(q0, a1a2a3……..ai) = qi for i = 1,2,3, ……m.

The set of the states followed by the string w in the transitional path from the beginning state to the final state are Qn = {q0, q1, q2, . . . . . qm}. The number of states in the set is m.

But the number of states of the FA accepting the string w is n. As m ≥ n, at least two states in the set Qn must coincide.

Take two integers j and k where 0 ≤ j < k ≤ n. Among the various pairs of repeated states, take a pair qj, qk. The string w can be decomposed into three substrings: a1a2 . . . . . aj, aj + 1 . . . . . ak, and ak + 1 . . . . . am. Let us denote them as x, y, and z, respectively. As k ≤ n, the length of the string a1…ak is ≤ n, i.e., | xy | ≤ n and w = xyz.

The automaton starts from the initial state q0. On applying the string x, it reaches the state qj. On applying the string y, it comes back to qj as qj = qk as denoted in Fig. 5.37. So, on applying the string y, i number of times (where i ≥ 0) on qj, it will be in the same state qj. On applying z on qj, the automaton will reach the final state. Hence, the string xyiz with i ≥ 0 belongs to the language set L., i.e., xyiz ∈ L.

Fig. 5.37

Application of the pumping lemma: The pumping lemma is used to prove that certain sets are not regular. If an expression satisfies the conditions for the pumping lemma to be good, then it cannot be said that the expression is regular. But the opposite is true. If any expression breaks any condition of the pumping lemma, it can be declared that the expression is not good.

This needs certain steps.

Step I: Assume that the set L is regular. Let n be the number of states of the FA accepting L.

Step II: Choose a string w (w ∈ L) such that | w | ≥ n. By using the pumping lemma, we can write w = xyz, with | xy | ≤ n and | y | > 0.

Step III: Find a suitable integer i such that xyiz ∉ L. This will contradict our assumption. From here, L will be declared as not regular.

The following examples help to clear this.

#### Example 5.25

Show that L = {ai2 | i ≥ 1} is not regular.

Solution:

Step I: Assume the set L is regular. Let n be the number of states of the FA accepting the set L.

Step II: Let w = an2. | w | = n2 which is greater than n, the number of states of the FA accepting L. By using the pumping lemma, we can write w = xyz with | xy | ≤ n and | y | > 0

Step III: Take i = 2. So, the string will become xy2z.

From step II, we know | w | = | xyz | = | x | + | y | + | z | = n2.

So, | xy2z | > n2.

Again, | xy2z | = | x | + 2 | y | + | z | = | x | + | y | + | z | + | y | = n2 + | y | .

As | xy | ≤ n, | y | ≤ n.

Therefore, | xy2z | ≤ n2 + n.

From the previous derivations, we can write

n2 < | xy2z | ≤ n2 + n < n2 + n + n + 1
n2 < | xy2z | < (n + 1)2.

Hence, | xy2z | lies between n2 and (n + 1)2. They are the square of two consecutive positive integers. In between the square of two consecutive positive integers, no square of positive integer belongs. But ai2, where i ≥ 1 is a perfect square of an integer. So, the string derived from it, i.e., | xy2z | is also a square of an integer, which lies between the square of two consecutive positive integers. This is not possible.

So, xy2z ∉ L. This is a contradiction.

So, L = L = {ai2 | i ≥ 1} is not regular.

#### Example 5.26

Show that L = {ap | p is prime} is not regular.

Solution:

Step I: Assume that the set L is regular. Let n be the number of states in the FA accepting L.

Step II: Let p be a prime number which is greater than n. Let the string w = ap, w ∈ L. By using the pumping lemma, we can write w = xyz with | xy | ≤ n and | y | > 0. As the string w consists of only ‘a’, x, y, and z are also a string of ‘a’s. Let us assume that y = am for some m with 1≤ m ≤ n.

Step III: Let us take i = p + 1. | xyiz | will be | xyz | + | yi-1 |

p(1 + m) is not a prime number as it has factors p and (1 + m) including 1 and p(1 + m). So, xyiz ∉ L. This is a contradiction.

Therefore, L = {ap | p is prime} is not regular.

#### Example 5.27

Show that L = {ai3 | i ≥ 1} is not regular.

Solution:

Step I: Assume that the set L is regular. Let n be the number of states of the FA accepting the set L.

Step II: Let L = an3. | w | + n3 which is greater than n, the number of states of the FA accepting L. By using the pumping lemma, we can write w = xyz with | xy | ≤ n and | y | > 0.

Step III: Take i = 3. So, the string will become xy3z.

As | xy | ≤ n so, | y | ≤ n.

Therefore, | xy3z | = n3 + 2 | y | ≤ n3 + n < n3 + 3n2 + 3n + 1 = (n + 1)3.

As | y | > 0, | xy3z | = | xyz | + 2 | y | > n3.

We can write

n3 < | xy3z | < (n + 1)3.

xy3z is a perfect cube which lies between n3 and (n + 1)3, i.e., two consecutive perfect cube numbers. In between the cube of two consecutive positive integers, no cube of positive integer belongs. But ai3, where i ≥ 1 is a perfect cube of an integer. So, the string derived from it, i.e., | xy3z | , is also a cube of an integer, which lies between a cube of two consecutive positive integers. This is not possible. So, xy3z ∉ L. This is a contradiction. So, L = {ai3 | i ≥ 1} is not regular.

#### Example 5.28

Show that L = {anbn where n ≥ 1} is not regular.

Solution:

Step I: Let us suppose that the set L is regular. Let n be the number of states of the FA accepting L.

Step II: Let w = anbn, | w | + 2n >n. By the pumping lemma, we can write w = xyz with | xy | ≤ n and | y | > 0.

Step III: We want to find a suitable i so that xyiz ∉ L.

The string y can be of any of the following:

1. y is a string of only ‘a’, i.e., y = ak for some k ≥ 1
2. y is a string of only ‘b’, i.e., y = bk for some k ≥ 1
3. y is a string of both ‘a’ and ‘b’, i.e., y = ak bl for some k, 1 ≥ 1

For case (i), take i = 0. As xyz = anbn, xy0z = xz will be an-kbn.

As k ≥ 1, (n−k) ≠ n. So, xy0z ∉ L.

For case (ii), take i = 0. As xyz = anbn, xy0z = xz will be anbn-k.

As k ≥ 1, (n−k) ≠ n. So, xy0z ∉ L.

For Case (iii), take i = 2. As xyz = anbn, xy2z = xyyz.

We know xyz = anbn = an-kak bl bn-l. So, xyyz will be

an-kak bl ak bl bn-l = anblakbn, which is not in the form anbn. So, xy2z ∉ L.

For all the three cases, we are getting contradiction. So, L is not regular.

#### Example 5.29

Show that L = palindrome over {a, b} is not regular.

Solution:

Step I: Suppose that the set L is regular. Let n be the number of states of the FA accepting L.

Step II: Let w = anban, | w | = (2n + 1) > n. By the pumping lemma, we can write w = xyz with | xy | ≤ n and | y | > 0.

Step III: We want to find a suitable i so that xyiz ∉ L. The string y may consist of only ‘a’, for example, aj where j > 0.

Let us take i = 0. xyz = anban = an-jaj ban. So, xy0z = an-jban.

As j > 0, n – j ≠ n. So, an-jban is not a palindrome.

So, xy0z ∉ L. Hence, it is proved that L is not regular.

#### Example 5.30

Show that L = {ww, where w ∈ (a, b)*} is not regular.

Solution:

Step I: Let us consider that the set L is regular. Let n be the number of states of the FA accepting L.

Step II: Let ww = anbanb ∈ L, and so | ww | = 2(n + 1) > n. By applying the pumping lemma, we can write w = xyz with | xy | ≤ n and | y | > 0.

Step III: We want to find a suitable i so that xyiz ∉ L. The string consists of ‘a’ and ‘b’. So, y can be any of these.

1. y has no ‘b’, i.e., y = ak
2. y has only one ‘b’ (because the string is anbanb)

Let us take i = 0 in case I. If i = 0, then xy0z = xz and it is in the form an-kbanb or anban-kb.

This is not in the form ww, i.e., xz ∉ L. Hence, L is not regular.

In case II, if we take i = 0, then xy0z = xz which contains only one ‘b’, which is not in the form ww, i.e., xz ∉ L. Hence, L is not regular.

#### Example 5.31

Show that L = {anban, where n ≥ 1} is not regular.

Solution:

Step I: Assume that L is regular. Let n be the number of states of the FA accepting L.

Step II: Let w = anban. So, | w | = 2n + 1 > n. According to the pumping lemma, we can write w = xyz, with w = xyz with | xy | ≤ n and | y | > 0.

Step III: We want to find a suitable i so that xyiz ∉ L. The string consists of ‘a’ and ‘b’. So, y can be any of these.

1. y has no ‘b’ i.e. y = ak
2. y has only one ‘b’ (because the string is anban)

For case I, take i = 2.

The string xyz = an–k akban or anbakan–k.

So, xy2z = an–ka2kban = an + kban or anba2kan–k = anban+k.

As k ≠ 0, n+k ≠ n. Therefore, xy2z ∉ L.

This is a contradiction, and so L is not regular.

For Case II, let us take i = k. So, xykz = anbkan. This is not in the form anban. Therefore, xykz ∉ L.

This is a contradiction, and so L is not regular.

#### Example 5.32

Show that L = {anbnabn + 1} is not regular.

Solution:

Step I: Assume that L is regular. Let n be the number of states of the FA accepting L.

Step II: Let w = anbnabn + 1, and so | w | = (3n + 2) > n. By using the pumping lemma, we can write w = xyz, with w = xyz with | xy | ≤ n and | y | > 0.

Step III: We want to find a suitable i so that xyiz ∉ L. The string consists of ‘a’ and ‘b’. So, y can be of any of these.

1. y contains only ‘a’, and let y = ak
2. y contains only ‘b’, and let y = bl
3. y contains both ‘a’ and ‘b’, and let y = bka

For case I, take i = 0.

For case II, take i = 0

For case III, take i = 2

In all the three cases, there are contradictions and, hence, the language is not regular.

#### Example 5.33

Show that the language containing the set of all balanced parenthesis is not regular.

Solution:

Step I: Assume that L is regular. Let n be the number of states of the FA accepting L.

Step II: Let w = (((….()….))) = (n)n. | w | = 2n > n. By using the pumping lemma, we can write w = xyz, with w = xyz with | xy | ≤ n and | y | > 0.

Step III: We want to find a suitable i so that xyiz ∉ L. The string consists of ‘(‘ and ‘)’. So, y can be of any of these

1. y consists of only ‘(‘, and let y = (k
2. y consists of only ‘)’, and let y = )k
3. y consists of both ‘(‘ and ‘)’, and let y = (k)k

For case I, take i = 0.

For case II, take i = 0.

For case III, take i = 2.

In all the three cases, there are contradictions and, hence, the language is not regular.

#### Example 5.34

Show that L = {0n12n, where n ≥ 1} is not regular.

Solution:

Step I: Assume that L is regular. Let n be the number of states of the FA accepting L.

Step II: Let w = 0n12n, where n ≥ 1. | w | = 3n> n. By using the pumping lemma, we can write

w = xyz, with w = xyz with | xy | ≤ n and | y | > 0.

Step III: We want to find a suitable i so that xyiz ∉ L. The string consists of ‘0’ and ‘1’. y can be any of the following forms

1. y consists of only ‘0’, and let y = 0k
2. y consists of only ‘1’, and let y = 1k
3. y consists of both ‘0’ and ‘1’, and let y = 0k1k

For case I, take i = 0.

w = xyz = 0n12n = 0n−k0k12n
xy0z = 0n−k12n, as k ≠ 0, (n−k) ≠ n.

For case II, take i = 0.

w = xyz = 0n12n = 0n1k12n−k
xy0z = 0n12n−k, as k ≠ 0, (2n−k) ≠ n.

For case III, take i = 2.

w = xyz = 0n12n = 0nk0k1k12n−k
xy2z = 0n1k0k12n, which is not in the form 0n12n

For all the three cases, we are getting contradictions and, therefore, L is not regular.

#### 5.12  Closure Properties of Regular Set

A set is closed (under an operation) if and only if the operation on two elements of the set produces another element of the set. If an element outside the set is produced, then the operation is not closed.

Closure is a property which describes when we combine any two elements of the set; the result is also included in the set.

If we multiply two integer numbers, we will get another integer number. Since this process is always true, it is said that the integer numbers are ‘closed under the operation of multiplication’. There is simply no way to escape the set of integer numbers when multiplying.

Let S = {1,2,3,4,5,6,7,8,9,10….} is a set of integer numbers.

1 × 2 = 2

2 × 3 = 6

5 × 2 = 10

All are included in the set of integer numbers.

We can conclude that integer numbers are closed under the operation of multiplication.

Theorem 5.3: Two REs L1 and L2 over Σ are closed under union operation.

Proof: We have to prove that if L1 and L2 are regular over Σ, then their union, i.e., L1∪ L2 will be also regular.

As L1 and L2 are regular over Σ, there must exist FA M1 = (Q1, Σ, δ1, q01, F1) and M2 = (Q2, Σ, δ2, q02, F2) such that L1 ε M1 and L2 ε M2

Assume that there is no common state between Q1 and Q2, i.e., Q1 ∩ Q2 = Ø.

Define another FA, M3 = (Q, Σ, δ, q0, F) where

1. Q = Q1 ∪ Q2∪{q0}, where q0 is a new state ∉ Q1∪Q2
2. F = F1 ∪ F2
3. Transitional function δ is defi ned as δ (q0, ε) → {q01, q02}

and

It is clear from the previous discussion that from q0 we can reach either the initial state q1 of M1 or the initial state q2 of M2.

Transitions for the new FA, M, are similar to the transitions of M1 and M2.

As F = F1 ∪ F2, any string accepted by M1 or M2 will also be accepted by M.

Therefore, L1 ∪ L2 is also regular.

Example:

Let

L1 = a*(a + b)b*
L2 = ab(a + b)*

The FA M1 accepting L1 is as shown in Fig. 5.38(a).

The FA M2 accepting L2 is as shown in Fig. 5.38(b).

The machine M produced by combining M1 and M2 is as shown in Fig. 5.38(c).

Figs. 5.38 (a)–(c)

It accepts L1 ∪ L2.

Theorem 5.4: The complement of an RE is also regular.

If L is regular, we have to prove that LT is also regular.

As L is regular, there must be an FA, M = (Q, Σ, δ, q0, F), accepting L. M is an FA, and so M has a transitional system.

Let us construct another transitional system M′ with the state diagram of M but reversing the direction of the directed edges. M′ can be designed as follows:

1. The set of states of M′ is the same as M
2. The set of input symbols of M′ is the same as M
3. The initial state of M′ is the same as the final state of M (M′ is the reverse direction of M)
4. The final state of M′ is the same as the initial state of M (M′ is the reverse direction of M)

Let a string w belong to L, i.e., w ∈ M. So, there is a path from q0 to F with path value w. By reversing the edges, we get a path from F to q0 (beginning and final state of M′) in M′. The path value is the reverse of w, i.e., wT. So, wT ∈ M′.

So, the reverse of the string w is regular.

Example:

Let L = ab(a + b)*.

The FA M accepting L is as shown in Fig. 5.39(a).

The reverse of the FA M′ accepting Lc is shown in Fig. 5.39(b).

Fig. 5.39 (a)–(b)

M′ accepts (a + b)*ba which is reverse of L.

Theorem 5.5: If L is regular and L is a subset of Σ*, prove that Σ* – L is also a regular set.

As L is regular, there must be an FA, M = (Q, Σ, δ, q0, F), accepting L. Let us construct another DFA M′ = (Q, Σ, δ, q0, F′) where F′ = Q–F. So, the two DFA differ only in their final states. A final state of M is a non-final state of M′ and vice versa.

Let us take a string w which is accepted by M′. So, δ(q0, w) ∈ F′, i.e., δ(q0, w) ∈ (Q – F).

The string w cannot be accepted by M, because δ(q0, w) ∉ F [as F does not belong to (Q – F)].

So, w∉L, i.e., Σ* L ≠ L. But Σ* L is accepted by M′ which is an FA.

Therefore, Σ* L is a regular set.

Theorem 5.6: REs are closed under intersection operation.

Proof: From D’Morgan’s theorem, we know

We know that if L1 and L2 are regular, then are also regular.

As are regular, is also regular (RE is closed under union operation).

As is regular, so complement ()C is also regular.

So, L1∩L2 is also regular, i.e., the regular sets are closed under intersection.

Theorem 5.7: Two DFA are closed under cross product.

Proof: Let D1 = {Q1, Σ, δ1, q01, F1} and D2 = {Q2, Σ, δ2, q02, F2} are two DFA accepting two RE L1 and L2 respectively. Let us construct a new FA, D as follows.

D = {Q, Σ, δ, q0, F}

where

Clearly D is a DFA. Thus DFA are closed under cross product.

#### 5.13  Decision Problems of Regular Expression

Decision problems are the problems which can be answered in ‘yes’ or ‘no’. FA are one type of finite state machines which contain a finite number of memory elements. Thus, it can memorize only finite amount of information. FA can answer to those decision problems which require only a finite amount of memory.

The decision problems related to RE are

1. Whether a string x belongs to a regular expression R? (x ∈ R?)
2. Whether the language set of an FA M is empty? [L(M) = Ø ?]
3. Whether the language set of an FA is finite? [L(M) is finite?]
4. Whether there exists any string accepted by two FA, M1 and M2?
5. Whether the language set of an FA M1 is a subset of the language set of another FA M2? [L(M1) ⊆ L(M2)?]
6. Whether two FA M1 and M2 accept the same language? [L(M1) = L(M2)]
7. Whether two REs R1 and R2 are from the same language set?
8. Whether an FA is the minimum state FA for a language L?

Proof

1. This problem is known as the membership problem. R can be converted to the equivalent FA M (see Section 5.5). x is applied on M. If M reaches its final state upon finishing x; x ∈R else x ∉ R.
2. An FA does not accept any string if it does not have any final state or if the final state is an inaccessible state. Let us calculate the set of state Sk reached from the beginning state q0 upon applying a string of length k.

If k = 0, it reaches the beginning state q0. It reaches SK if it was in state Sk−1 and ‘a’ as input is applied on SK−1.

Compare the set SK.

1. for string length k ≥ 0 whether a final state appears or
2. SK = SK−1 (loop) for k > 0.

For the case (a), L(M) ≠ Ø, and for case (b) L(M) = Ø .

3. In the pumping lemma (Section 4.11), it is discussed that for accepting a string of length ≥ n (the number of states of an FA), it has to traverse at least one loop. The language accepted by an FA is finite if it has length < n. Formulate an algorithm for testing as follows.

Give an input to M, the strings of length n or > n in increasing order. If for a string ‘s’ of length in between n and ≤ 2n, it reaches M then L(M) is infinite; else L(M) is finite.

Infinite means there exists x, y, z such that s = xyz, where | xy | ≤ n, | y | >0 and xyiz ∈ L for each i ≥ 0 (pumping lemma). Here, yi is the looping portion, which can generate an infinite number of strings.

4. For this decidable problem, construct an FA M accepting L(M1) ∩ L(M2). Then apply the decision problem 2 on M.
5. For this decidable problem, construct an FA M accepting L(M1) − L(M2). Then, apply decision problem 2 on M. It is true if L(M1) − L(M2) = Ø. (If L(M1) is a subset of L(M2), L(M1) − L(M2) will produce null.)
6. Two sets A and B are the same if A ⊆ B and B ⊆ A. Construct an FA, M, accepting L(M1) − L(M2) and M′ accepting L(M2) − L(M1) (reducing it to problem v). Problem vi is decidable if L(M1) − L(M2) = Ø and L(M2) − L(M1) = Ø.
7. There exists an algorithm to convert an RE to FA (see Section 5.5). Reduce this problem to problem (vi).
8. Minimize an FA, M, using 3.13 and generate M′. If the number of states of M and M′ are the same, then it is minimized; else it is not. Hence, decidable.

#### 5.14  ‘Grep’ and Regular Expression

REs have been an integral part of Unix since the beginning. Ken Thompson used the RE in the early computer text editor ‘QED’ and the Unix editor ‘ed’ for searching text. ‘grep’ is a find string command in Unix. It is an acronym for ‘globally search a regular expression and print it’. The command searches through files and folders (directories in Unix) and checks which lines in those files match a given RE. ‘grep’ gives output the filenames and the line numbers or the actual lines that matched the RE.

\$ grep RE chapter5

This command searches the string ‘RE’ in the file ‘chapter5’.

\$ grep ‘RE’ −i chapter5

This command searches the strings with case insensitive.

\$ grep h??a chapter5

This command displays all strings of length 4 starting with ‘h’ and ending with ‘a’.

If we enter into the internal operation of ‘grep’, we will see that we are searching for a string that belongs to an RE. Let us take the last grep which searches for four length strings starting with ‘h’ and ending with ‘a’. For this case, the RE is h(ch)+(ch)+a, where ‘ch’ is any symbol that belongs to a file (Generally symbols available in the keyboard).

#### 5.15  Applications of Regular Expression

RE is mainly used in lexical analysis of compiler design. In the programming language, we need to declare a variable or identifier. That identifier must start with a letter followed by any number of letters or digits. The structure of the identifier is represented by RE. The definition of an identifier in a programming language is

The definition of an unsigned number in a programming language is

The other application of RE is pattern searching from a given text.

#### What We Have Learned So Far

1. An RE can be defined as a language or string accepted by an FA.
2. Any terminal symbols, null string (∧), or null set (ɸ) are RE.
3. Union, concatenation, iteration of two REs, or any combination of them are also RE.
4. The Arden’s theorem states that if P and Q are two REs over Σ, and if P does not contain Λ, then the equation R = Q + RP has a unique (one and only one) solution R = QP*.
5. The Arden’s theorem is used to construct an RE from a given FA by the algebraic method.
6. If any FA contains any ∈ (null) move or transaction, then that FA is called NFA with ∈-moves.
7. The ∈-closure of a state is defined as the set of all states S, such that it can reach from that state to all the states in S with input ∈ (i.e., with no input).
8. The pumping lemma for RE is used to prove that certain sets are not regular.
9. A set is closed (under an operation) if and only if the operation on two elements of the set produces another element of the set.
10. Closure is a property which describes when we combine any two elements of the set, the result is also included in the set.
11. REs are closed under union, complementation, and intersection operation.

#### Solved Problems

1. Describe the following REs in English language.
1. a(a+b)*abb
2. (a+b)*aba(a+b)*
3. (0+1)*1(0+1)*0(0+1)*

Solution:

1. The language starts with ‘a’ and ends with ‘abb’. In the middle of ‘a’ and ‘b’, there is any combination of ‘a’ and ‘b’.

Hence, the RE described in English language is

{Set of any combination of ‘a’ and ‘b’ beginning with ‘a’ and ending with ‘abb’}.

2. The expression is divided into three parts: (a+b)*, aba, and (a+b)*. In each element of the language set, there is aba as a substring. In English language, the RE is described as {Set of any combination of ‘a’ and ‘b’ containing ‘aba’ as substring}.
3. The expression is divided into five parts: (0+1)*, 1, (0+1)*, 0 and (0+1)*. In each element of the language set, there is 1 and 0, where 1 appears first. In English language, the RE is described as

{Set of any combination of ‘0’ and ‘1’ containing at least one 1 and one 0 where 1 appears first}

2. Find the RE for the following:
1. The set of language of all strings of 0 and 1 containing exactly two 0’s

[UPTU 2004]

2. The set of languages of any combination of ‘a’ and ‘b’ beginning with ‘a’.
3. The set of all strings of 0 and 1 that do not end with 11.
4. The set of languages of any combination of ‘0’ and ‘1’ containing at least one double symbol.
5. The set of all strings over {a, b} in which the number of occurrences of ‘a’ is divisible by 3.

[UPTU 2004]

6. The set of all strings where the 10th symbol from the right end is a 1.

[JNTU 2007]

7. The set of languages of any combination of ‘0’ and ‘1’ containing no double symbol.

Solution:

1. The language contains exactly two 0’s. But the language consists of 0 and 1. 1 may appear at first or in the middle or at the last. Thus, the language is L = 1*01*01*.
2. The set of any combination of ‘a’ and ‘b’ is denoted by (a+b)*. The RE is

L = a(a+b)*.

3. The strings may end with 00, 01, or 10. Thus, the language is L = (0+1)*(00 + 01+10).
4. The language consists of two symbols ‘0’ and ‘1’. A double (same) symbol means either 00 or 11. The part of the language containing at least one double symbol is denoted by (00 + 11). So, the language of any combination of ‘0’ and ‘1’ containing at least one double symbol is expressed as L = (0+1)* (00+11) (0+1)*.
5. The number of ‘a’ is divisible by 3 means that the number of ‘a’ may be 0, 3, 6, 9, 12, …… The number of ‘b’ may be 0, 1, 2, 3,………. The RE is (b*ab*ab*ab*)*.
6. The 10th symbol from the right hand side should be ‘a’, whereas the other symbols may be ‘a’ or ‘b’. A string of length n has (n−10) symbols from start and the last 9 symbols are of ‘a’ or ‘b’. The RE is (a + b)*a(a + b)10.
7. The language consists of two symbols ‘0’ and ‘1’. A double (same) symbol means either 00 or 11. According to the condition, in the language, 00 or 11 will not appear. The language may start with ‘0’ or start with ‘1’. The language may start with ‘0’ with no double symbol is (01)*. The language may start with ‘1’ with no double symbol is (10)*.

The expression is L = (01)* + (10)*.

3. Construct an RE from the given FA by the algebraic method using Arden’s theorem.

Solution: For the previously given FA, the equations are

The equation 1 is in the form R = Q + RP, where R = A, Q = Λ, and P = b. According to the Arden’s theorem, the solution for the equation is R = QP*.

So,

A = Λb* = b* (as ΛR = R Λ = R).

Putting the value of A in equation 2, we get

B = b*a + Bb.

The equation is in the format R = Q + RP, where R = B, Q = b*a, and P = b. According to the Arden’s theorem, the solution for the equation is R = QP*.

So,

B = b*ab*.

Putting the value of B in equation 3, we get

C = b*ab*a + C(a+b).

The equation is in the format R = Q + RP, where R = C, Q = b*ab*a, and P = (a + b). According to the Arden’s theorem, the solution for the equation is R = QP*.

So, C = b*ab*a(a+b)*

As C is the final state, the RE accepted by the FA is b*ab*a(a+b)*.

4. Construct an RE from the given FA by the algebraic method using Arden’s theorem.

Solution: For the previously given FA, the equations are

The equation 1 is in the form R = Q + RP, where R = A, Q = Λ, and P = a. According to the Arden’s theorem, the solution for the equation is R = QP*.

So,                          A = Λa* = a* (as ΛR = R Λ = R).

Putting the value of A in equation (2), we get

B = a*b.

Putting the value of B in equation (3), we get

C = a*bb+Cb.

This is in the format R = Q + RP, where R = C, Q = a*bb, and P = b. According to the Arden’s theorem, the solution for the equation is R = QP*.

C = a*bbb*.

Putting the value of B and C in equation (4), we get

D = a*ba + a*bbb*a + D(a+b).

This is in the format R = Q + RP, where R = D, Q = a*ba + a*bbb*a, and P = a+b. According to the Arden’s theorem, the solution for the equation is R = QP*.

D = (a*ba + a*bbb*a)(a+b)*.

In the FA, there are two final states B and D. So, the RE accepted by the FA is

L = a*b + (a*ba + a*bbb*a)(a+b)*.

5. Find the RE corresponding to the following FA.

Solution: For the Previously given FA, the equations are

Take equation number (2)

B = bA + bB.

The equation is in the form R = Q + RP, where R = B, Q = bA, and P = b.

The solution of the equation is B = bAb*.

Replacing the value of B in equation (1), we get

A = abAb* + aA + Λ + Λ + A (abb* + a). The equation is in the form R = Q + RP where R = A, Q = Λ, and P = (abb* + a). The solution of the equation is A = Λ(abb* + a)* = (abb* + a)*.

Replacing this value in the solution of B, we get b(abb* + a)*b*. As B is the final state, the string accepted by the FA is b(abb* + a)*b*.

6. Find out the RE of the given diagram.

[UPTU 2003]

For the previously given FA, the equations are

Replacing the value of q3 in (2), we get

This is in the form R = Q + RP. Hence, the solution is

q2 = q1b(b + aa)*.

Replacing the value of q2 in (i), we get

This is in the form R = Q + RP. Hence, the solution is

q3 is the final state and, therefore, the RE accepted by the FA is (a + b(b + aa)*b)* b(b + aa)* b.

7. Find the RE corresponding to the following figure:

[WBUT 2007]

Solution: There is only one initial state. In the FA, there are no null moves.

The equations for q1, q2, q3, and q4 are

Now, we need to solve the equations by using the Arden’s theorem to get the RE.

is in the form R = Q + RP, where R = q2, Q = q1.1, P = (1 + 011).

The solution of the equation is

q2 = q1.1 (1 + 011)*.

From here,

The equation is in the form R = Q + RP, where R = q1, Q = Λ, and P = 1 (1 + 011)* (00 + 010).

By applying Arden’s theorem, the solution of the equation is R = QP*.

q1 = (0 + 1 (1 + 011)* (00 + 010))*

The final state of the automaton is q4 = (0 + 1 (1 + 011)* (00 + 010))*(1 (1 + 011)*01).

The RE corresponding to the given automaton is

(0 + 1 (1 + 011)* (00 + 010))*(1 (1 + 011)*01).

8. Construct an FA equivalent to the RE, L = (a + b) + a(a + b)* (ab + ba).

Solution:

Step I: Take a beginning state q0 and a final state qf. Between the beginning and final states, place the RE.

Step II: Between (a+b) and a(a+b)* (ab+ba), there is a + sign, and so there will be parallel edges between q0 and qf.

Step III: Between (a + b)* and (ab + ba), there is a .(dot) sign, and so an extra state is added between q0 and qf.

Step IV: Between ab and ba, there is + sign, and so there will be parallel edges between q1 and qf.

Step V: Between ‘a’ and b, there is a + sign. So, between q0 and qf there is a parallel edge.

Step VI: Between ‘a’ and ‘b’ and between ‘b’ and ‘a’, there are .(dots). So, two extra states are added between q1 and qf. An extra state is added between q0 and q1 for a(a + b)*.

Step VII: The * between q4 and q1 is removed by adding an extra state with label a, b, and the ∈ transition from q4 to that state and from that state to q1.

Step VIII: Removing ∈, the automata become

9. Construct an FA equivalent to the RE, L = (00 + 11)* 11 (0 + 1)*0.

Solution:

Step I: Take a beginning state q0 and a final state qf. Between the beginning and final state, place the RE.

Step II: There are four .(dots) in between (00 + 11)* and 1, 1 and 1, 1 and (0 + 1)*, and (0 + 1)* and 0. So, the four extra states are added in between q0 and qf.

Step III: The * between q0 and q1 is removed by adding an extra state with label 00 and 11 as loop and ∈ transition from q0 to that state and from that state to q1. The same is applied for the removal of * between q3 and q4.

Step IV: Removing the + sign between 00 and 11, parallel edges are added and for two .(dots) signs (between 0, 0 and 1,1), two extra states are added.

Step V: Use the ∈ removal technique to find the corresponding DFA.

10. Construct an FA for the RE 10 + (0 + 11)0*1.

[UPTU 2004]

Solution:

11. Convert an RE (0 + 1)*(10) + (00)*(11)* to an NFA with ∈ move.

[Gujrat Technological University 2010]

12. Construct an FA equivalent to the regular expression

(01(1(10)* + 111)* + 0)*1.

[Cochin University 2009]

13. Convert the following NFA with ∈-move to an equivalent DFA.

Solution:

As 0 is the beginning state, start with ∈-closure(0) = {0, 1, 2, 4, 7}. Let us rename it as A.

A is still unmarked.

Then, construct a δ′ function for the new unmarked state A for inputs a and b.

This is a new state, so rename it as B.

This is a new state, so rename it as C.

B is still unmarked.

Then, construct a δ′ function for the new unmarked state B for inputs a and b.

This is a new state, so rename it as D.

C is still unmarked.

Then, construct a δ′ function for the new unmarked state C for inputs a and b.

D is still unmarked.

Then, construct a δ′ function for the new unmarked state D for inputs a and b.

This is a new state, so rename it as E.

Here, the final state is E.

The equivalent DFA is

14. Construct a regular grammar for the RE, L = (a + b)*(aa + bb)(a + b)*.

Solution: The NFA for the RE is

There are four states in the FA. So, in the regular grammar, there are four non-terminals.

Let us take them as A (for q0), B (for q1), C (for q2), and D (for q3).

Now, we have to construct the production rules of the grammar.

For the state q0, the production rules are

A → aA, A → bA, A → aB, A → bC.

For the state q1, the production rules are

B → aD, B → a (as D is the final state).

For the state q3, the production rules are

C → b D, C → b (as D is the final state).

For the state q4, the production rules are

D → aD, D → bD, D → a/ b.

The grammar = {VN, Σ, P, S}

15. Consider the given FA and construct the smallest DFA which accepts the same language. Draw an RE and the grammar that generates it.

[UPTU 2002]

Solution: This problem can be solved by the ∈-closure method.

As 1 is the beginning state, start with ∈-closure(1) = {1,4}. Let us rename it as A.

A is still unmarked.

Then, construct a δ′ function for the new unmarked state A for inputs 0 and 1.

It is a new state. Mark it as B.

It is a new state. Mark it as D.

It is a new state. Mark it as E.

It is a new state. Mark it as C.

The beginning state is A, and the final state is E.

The transitional diagram of the DFA is

The RE is constructed using the Arden’s theorem.

Replacing A in B and C, we get

Replacing C in B, we get B = 0 + 01.

Replacing the new value of B, C, and E in D, we get

It is in the format of R = Q + RP, where Q = (0 + (0 + 01) + 11), P = (1 + 10).

The solution is R = QP*.

D = (0 + (0 + 01) + 11) (1 + 10)*.

Replacing the value of D in E, we get E = 0(0 + (0 + 01) + 11) (1 + 10)*.

As E is the final state, the RE accepted by the FA is

0(0 + (0 + 01) + 11) (1 + 10)*.

The regular grammar is constructed as follows taking each state into account.

There are five states in the FA. So, in the regular grammar, there are five non-terminals.

Let us take them as A, B, C, D, and E according to the name of the states.

Now we have to construct the production rules of the grammar.

For the state A, the production rules are

A → 0B, A → 1C.

For the state B, the production rules are

B → 0D.

For the state C, the production rules are

C → 0B C → 1D.

For the state D, the production rules are

D → 1D D → 0E D → 0.

For the state E, the production rules are

E → 1D.

16. Find the RE recognized by the finite state automaton of the following figure.

[GATE 1994]

Solution: The equation for the FA is

Solving the equation (1) using the Arden’s theorem, we get A = ∧0* = 0*.

Putting the value of A in equation (2), we get

B = 10* + 1B.

Using the Arden’s theorem, we get

B = 10*1*.

Both A and B are final states, and thus the string accepted by the FA is

#### Multiple Choice Questions

1. The machine format of regular expression is
1. Finite automata
2. Push down automata
3. Turing machine
4. All of the above
2. The regular expression is accepted by
1. Finite automata
2. Push down automata
3. Turing machine
4. All of the above
3. The language of all words with at least 2 a’s can be described by the regular expression
1. (ab)*a
2. (a + b)*ab*a(a + b)*
3. b*ab*a(a + b)*
4. All of the above
4. The set of all strings of {0, 1} having exactly two 0’s is
1. 1*01*01*
2. {0 + 1)*1
3. {11 + 0}*
4. {00 + 11}*
5. The set of all strings of {0, 1} where 0 is followed by 1 and 1 is followed by 0 is
1. (01)*
2. 0*1*
3. 0*1 + 1*0
4. 0*1* + 1*0*
6. Which of the strings do not belong to the regular expression (ba + baa)*aaba
1. baaaba
2. babaabaaaba
3. babababa
4. baaaaba
7. Which of the following type of language is regular expression?
1. Type 0
2. Type 1
3. Type 2
4. Type 3
8. Which of the following RE over {0,1} denotes the set of all strings not containing 100 as substring?
1. (1 + 0)*0*
2. 0*1010*
3. 0*1*01
4. All of the above
9. ∧ + RR* = ?
1. R
2. R*
3. R +
10. {a/b} denotes the set
1. {ab}
2. {a, b}
3. {a, b}*
4. {ab}*
11. What is the solution for the equation R = Q + RP (If P and Q are RE and P does not contain ∧)?
1. R = QP*
2. R = QP
3. R = PQ*
4. R = P*Q*
12. Which of the languages is accepted by the following FA?
1. b(a + bba*)*a* b)
2. a(a + bba*)*b*
3. a(a + bb*a*)*b*d)
4. None of these
13. Which is true for ∈-closure of a state?
1. All the states reachable from the state with input null excluding the state
2. The state only
3. All the other states
4. All the states reachable from the state with input null
14. The pumping lemma for regular expression is used to prove that
1. Certain sets are regular
2. Certain sets are not regular
3. Certain regular grammar produce RE
4. Certain regular grammar does not produce RE
15. Regular sets are closed under
1. Union
2. Concatenation
3. Kleene closure
4. All of the above

1. a
2. d
3. b
4. a
5. d
6. c
7. c
8. c
9. b
10. b
11. a
12. a
13. d
14. a
15. d

#### GATE Questions

1. Which two of the following four regular expressions are equivalent?
1. (00)*(∈ + 0)
2. (00)*
3. 0*0*
4. 0(00)*
1. (i) and (ii)
2. (i) and (iv)
3. (ii) and (iii)
4. (iii) and (iv)
2. Which of the following alternatives is true for the following three regular expressions?
1. (011((11)* + (01)*)*)*011
2. 011(((1 + 0)1)*011)*
3. 011(((1 + 0)*1*)*011)*
1. (i) and (iii) are equivalent
2. (ii) and (iii) are equivalent
3. (i) and (ii) are equivalent
4. No pairs are equivalent
3. Consider the following statements:
1. Ø* = ∈
2. s(rs + s)*r = (sr + s)*sr
3. (r + s)* = (r* + s*) +

Find out the correct alternatives.

1. All are true
2. Only (ii) is false
3. only (iii) is false
4. Both (ii) and (iii) are false
4. Which regular expression best describes the language accepted by the following non-deterministic automation?
1. (a + b)* a(a + b)b
2. (abb)*
3. (a + b)*a(a + b)*b(a + b)*
4. (a + b)*
5. Which of the following regular expressions describe the language over {0, 1} consisting of strings that contain exactly two 1’s?
1. (0 + 1)*11(0 + 1)*
2. 0*110*
3. 0*10*10*
4. (0 + 1)*1(0 + 1)*1(0 + 1)*
6. Find the false statement if S = (a, b)
1. L = {anbm, m, n ≥ 1} is regular
2. L = {X, where no of a > no of b} is not regular
3. L = {anbn, n ≥ 1} is regular
4. L = {X, where no of a and no of b are equal} is not regular
7. Which of the following regular expression identified is true?
1. r(*) = r*
2. (r*s*) = (r + s)*
3. (r + s)* = r* + s*
4. r*s* = r* + s*
8. In some programming languages, an identifier is permitted to be a letter followed by any number of letters or digits. If L and D denote the sets of letters and digits, respectively, which of the following expressions define an identifier?
1. (L ∪ D)*
2. L(L ∪ D)*
3. (L.D)*
4. L(L.D)*
9. Which two of the following four regular expressions are equivalent?
1. (00)*(∈ + 0)
2. (00)*
3. 0*
4. 0(00)*
1. (i) and (ii)
2. (ii) and (iii)
3. (i) and (iii)
4. (iii) and (iv)
10. Let L ⊆ Σ*, where Σ = {a, b}. Which of the following is true?
1. L = {x | x has equal number of a’s and b’s} is regular
2. L = {anbn | n ≥ 1} is regular
3. L = {x | x has more a’s than b’s} is regular
4. L = {ambn | m ≥ 1, n ≥ 1} is regular
11. The string 1101 does not belong to the set represented by
1. 110*(0 + 1)
2. 1(0 + 1)*101
3. (10)*(01)*(00 + 11)*
4. (00 + (11)*01)*
12. Give a regular expression for the set of binary strings where every 0 is immediately followed by exactly k number of 1’s and preceded by at least k number of 1’s (k is a fixed integer).
1. 1*1k(01k)*1*
2. 1*1k01k
3. 1*(1k01k)*
4. 1*(1k0)* 1*
13. Let S and T be the languages over Σ = {a, b} represented by the regular expressions (a + b*)* and (a + b)*, respectively. Which of the following is true?
1. S ⊂ T
2. T ⊂ S
3. S = T
4. S ∩ T = Ø
14. The regular expression 0*(10*)* denotes the same set as
1. (1*0)*1*
2. 0 + (0 + 10)*
3. (0 + 1)*10(0 + 1)*
4. None of the above
15. Consider the following NFA M

Let the language accepted by M be L. Let L1 be the language accepted by the NFA M1 obtained by changing the accepting states of M to a non-accepting state and by changing the non-accepting states of M to accepting states. Which of the following statements is true?

1. L1 = {0, 1}* − L
2. L1 = {0, 1}*
3. L1 ⊆ L
4. L1 = L
16. Consider the regular language L = (111 + 11111)*. The minimum number of states in any DFA accepting this language is
1. 3
2. 5
3. 8
4. 9
17. Which of the following is true?
1. Every subset of a regular set is regular.
2. Every finite subset of a non-regular set is regular.
3. The union of two non-regular sets is not regular.
4. Infinite union of finite sets is regular.
18. The language accepted by this automaton is given by the regular expression
1. b*ab*ab*ab*
2. (a + b)*
3. b*a(a + b)*
4. b*ab*ab*
19. The minimum state automation equivalent to the previous FSA has the following number of states.
1. 1
2. 2
3. 3
4. 4
20. Which of the following is true for the language {ap | p is prime}?
1. It is not accepted by a Turing machine
2. It is regular but not context-free
3. It is context-free but not regular
4. It is neither regular nor context-free but accepted by a Turing machine.
21. Match the following NFAs with the regular expressions they correspond to.
1. ∈ + 0(01*1 + 00)*01*
2. ∈ + 0(10*1 + 00)*0
3. ∈ + 0(10*1 + 10)*1
4. ∈ + 0(10*1 + 10)*10*
1. P −2, Q −1, R −3, S −4
2. P −1, Q −3, R −2, S −4
3. P −1, Q −2, R −3, S −4
4. P −3, Q −2, R −1, S −4
22. Which of the following are regular sets?
1. {anb2m | n ≥ 0, m ≥ 0}
2. {anbm | n = 2m}
3. {anbm | n ≠ m}
4. {xcy | x, y ∈ {a, b}*}
1. I and IV only
2. I and III only
3. I only
4. IV only
23. Which one of the following languages over the alphabet {0, 1} is described by the regular expression:

(0 + 1)*0(0 + l)*0(0 + 1)* ?

1. The set of all strings containing the substring 00.
2. The set of all strings containing at most two 0’s.
3. The set of all strings containing at least two 0’s.
4. The set of all strings that begin and end with either 0 or 1
24. Let L = {w∈(0 + 1)* | w has an even number of 1s}, i.e., L is the set of all bit strings with even number of 1s. Which one of the given regular expressions represent L?
1. (0*10*1)*
2. 0*(10*10*)*
3. 0*(10*1*)*0*
4. 0*1(10*1)*10*
25. What is the complement of the language accepted by the given NFA? Assume that Σ = {a} and ∈ is the empty string.
1. {∈}
2. a*
3. {a, ∈}
26. Given the language L = {ab, aa, baa}, which of the following strings are in L*?
1. abaabaaabaa
2. aaaabaaaa
3. baaaaabaaaab
4. baaaaabaa
1. 1, 2, and 3
2. 2, 3, and 4
3. 1, 2, and 4
4. 1, 3, and 4
27. Which one of the following regular expressions is not equivalent to the regular expression (a + b + c)*?
1. (a* + b* + c*)*
2. (a*b*c*)*
3. ((ab)* + c*)*
4. (a*b* + c*)
28. Let M = (K, Σ, δ,s, F) be a finite state automaton, where K = {A, B}, = {a, b}, s = A, F = {B}, δ(A, a) = A, δ(A, b) = B, δ(B, a) = B, and δ(B, b) = A.

A grammar to generate the language accepted by M can be specified as G = (V, Σ, R, S), where V = K ∪ Σ and S = A.

Which one of the following set of rules will make L (G) = L(M)?

1. {A → aB, A → bA, B → bA, B → aA, B → ∈}
2. {A → aA, A → bB, B → bb, B → bA, B → ∈}
3. {A → bB, A → aB, B → aA, B → bA, B → ∈}
4. {A → aA, A → bA, B → aB, B → bA, A → ∈}

1. b
2. c
3. d
4. a
5. c
6. c
7. a
8. b
9. c
10. d
11. c
12. a
13. c
14. a
15. b
16. b
17. c
18. c
19. b
20. d
21. c
22. c
23. c
24. a,b,d
25. b
26. c
27. c
28. b

Hints:

2.  See example 5.4.

11.  Both generates odd and even length strings

14.  (a + b*)* = (a* (b*)*)* = (a*b*)*

15.  Both of these generate any combination of 0 and 1

16.  Construct the transitional diagram.

17.

18.  a) (0 + 1)* is regular but 0n1n is not regular.

20.  The simplified automata is

21.  It is not regular (proved by the pumping lemma). Not CFG. But accepted by TM as TM accepts unrestricted language.

25.  Only c may generate odd number of 1.

26.  The language accepted by the FA is a+. The complement of it is Σ* − a+ = ∈.

28.  C will not generate ac or bc but all the others will generate.

#### Fill in the Blanks

1. Regular expression is a string on ______________ among {Q, Σ, δ, q0, F}
2. If R is a regular expression, then Λ + RR* = ______________.
3. According to the Arden’s theorem, the solution of the equation R = Q + RP is ______________.
4. The set of all states that can be reached from that state to all the states with input ∈ is called ______________.
5. For the following NFA with ∈-move

∈-closure of q1 is ____________.

6. A grammar where all productions are in the form A → Bα or A → α is called ______________.
7. A grammar where all productions are in the form A → αB or A → α is called ______________.
8. The pumping lemma for regular expression is used to prove that certain sets are not ______________.
9. The property which describes when we combine any two elements of the set and the result is also included in the set is called ______________.
10. If any finite automata contains any ∈ (null) move or transaction, then that finite automata is called ______________.
11. The machine format of regular expression is ______________.

1. Σ
2. R*
3. R = QP*
4. ∈-closure
5. q0, q1, q2
6. left linear grammar
7. right linear grammar
8. regular
9. closure property
10. NFA with Σ moves
11. Finite automata

#### Exercise

1. Prove that (0 + 011*) = (0 + 011*)(01 + 0100*)(01 + 0100*)* = 01*(010*)*
2. Find the regular expression for the following.
1. Set of all strings over (a, b) containing exactly one a
2. Set of all strings over (a, b) containing at least two a’s
3. Set of all strings over (0, 1) containing exactly two a’s
4. Set of all strings over (a, b) beginning and ending with b
5. Set of all strings over (0, 1) containing 010 as substring
6. Set of all strings over (0, 1) not containing substring 00
3. Describe the following regular expressions in English language
1. b*ab*
2. (a + b)* aba(a + b)*
3. a(a + b)*ab
4. Find the regular expressions corresponding to the automaton generated by the following finite automata.
5. Convert the following NFA with null move to an equivalent DFA by the ∈-closure method.
6. Develop the finite automata for the following regular expressions
1. 10 + (0 + 11)*01
2. (a*ab + ba)*a
3. a(ab + a)*(aa + b)
7. Check whether the following two finite automata are equivalent or not.
8. Check whether the following two RE are equivalent or not

L1 = 10 + (1010)*(1010)* L2 = 10 + (1010)*

9. Construct the regular grammar for the following finite automata.
1. ab + (aa + bb)(a + b)*
2. 01(00 + 11)*11*
3. a*b(a + b)*
4. a*(aa + bb)* + b*(ab + ba)*
10. Using the pumping lemma show that the following sets are not regular.
1. L = {anbm, where 0 < n < m}
2. L = {anbn + 1, where n > 0}
3. L = {aibjck, where i, j, k > 0}