
5.3* Matrix Limits and Markov Chains

In this section, we apply what we have learned thus far in Chapter 5 to study the limit of a sequence of powers $A, A^2, \ldots, A^n, \ldots$, where $A$ is a square matrix with complex entries. Such sequences and their limits have practical applications in the natural and social sciences.

We assume familiarity with limits of sequences of real numbers. The limit of a sequence of complex numbers $\{z_m : m = 1, 2, \ldots\}$ can be defined in terms of the limits of the sequences of the real and imaginary parts: If $z_m = r_m + is_m$, where $r_m$ and $s_m$ are real numbers and $i$ is the imaginary number such that $i^2 = -1$, then

$\lim_{m\to\infty} z_m = \lim_{m\to\infty} r_m + i\lim_{m\to\infty} s_m,$

provided that $\lim_{m\to\infty} r_m$ and $\lim_{m\to\infty} s_m$ exist.

Definition.

Let $L, A_1, A_2, \ldots$ be $n \times p$ matrices having complex entries. The sequence $A_1, A_2, \ldots$ is said to converge to the $n \times p$ matrix $L$, called the limit of the sequence, if

$\lim_{m\to\infty} (A_m)_{ij} = L_{ij}$

for all $1 \le i \le n$ and $1 \le j \le p$. To denote that $L$ is the limit of the sequence, we write

$\lim_{m\to\infty} A_m = L.$

Example 1

If

$A_m = \begin{pmatrix} 1 - \dfrac{1}{m} & \left(\dfrac{3}{4}\right)^m \\[6pt] \dfrac{3m^2}{m^2+1} + i\left(\dfrac{2m+1}{m-1}\right) & \left(\dfrac{i}{2}\right)^m \\[6pt] 2 & \left(1 + \dfrac{1}{m}\right)^m \end{pmatrix},$

then

$\lim_{m\to\infty} A_m = \begin{pmatrix} 1 & 0 \\ 3 + 2i & 0 \\ 2 & e \end{pmatrix},$

where $e$ is the base of the natural logarithm.
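Limits like the one in Example 1 are entrywise, so they can be spot-checked numerically. A minimal sketch, assuming NumPy is available (the choice $m = 10^6$ is arbitrary):

```python
import numpy as np

def A(m):
    # The matrix A_m of Example 1, assembled entry by entry.
    return np.array([
        [1 - 1/m,                                   (3/4)**m],
        [3*m**2/(m**2 + 1) + 1j*(2*m + 1)/(m - 1),  (1j/2)**m],
        [2,                                         (1 + 1/m)**m],
    ], dtype=complex)

L = np.array([[1, 0], [3 + 2j, 0], [2, np.e]], dtype=complex)
print(np.max(np.abs(A(10**6) - L)))  # small, and -> 0 as m grows
```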

A simple, but important, property of matrix limits is contained in the next theorem. Note the analogy with the familiar property of limits of sequences of real numbers that asserts that if $\lim_{m\to\infty} a_m$ exists, then

$\lim_{m\to\infty} ca_m = c\left(\lim_{m\to\infty} a_m\right).$

Theorem 5.11.

Let $A_1, A_2, \ldots$ be a sequence of $n \times p$ matrices with complex entries that converges to the matrix $L$. Then for any $P \in M_{r\times n}(C)$ and $Q \in M_{p\times s}(C)$,

$\lim_{m\to\infty} PA_m = PL \quad \text{and} \quad \lim_{m\to\infty} A_mQ = LQ.$

Proof.

For any $i$ $(1 \le i \le r)$ and $j$ $(1 \le j \le p)$,

$\lim_{m\to\infty} (PA_m)_{ij} = \lim_{m\to\infty} \sum_{k=1}^n P_{ik}(A_m)_{kj} = \sum_{k=1}^n P_{ik} \lim_{m\to\infty} (A_m)_{kj} = \sum_{k=1}^n P_{ik}L_{kj} = (PL)_{ij}.$

Hence $\lim_{m\to\infty} PA_m = PL$. The proof that $\lim_{m\to\infty} A_mQ = LQ$ is similar.

Corollary.

Let $A \in M_{n\times n}(C)$ be such that $\lim_{m\to\infty} A^m = L$. Then for any invertible matrix $Q \in M_{n\times n}(C)$,

$\lim_{m\to\infty} (QAQ^{-1})^m = QLQ^{-1}.$

Proof.

Since

$(QAQ^{-1})^m = (QAQ^{-1})(QAQ^{-1})\cdots(QAQ^{-1}) = QA^mQ^{-1},$

we have

$\lim_{m\to\infty} (QAQ^{-1})^m = \lim_{m\to\infty} QA^mQ^{-1} = Q\left(\lim_{m\to\infty} A^m\right)Q^{-1} = QLQ^{-1}$

by applying Theorem 5.11 twice.

In the discussion that follows, we frequently encounter the set

$S = \{\lambda \in C : |\lambda| < 1 \text{ or } \lambda = 1\}.$

Geometrically, this set consists of the complex number 1 and the interior of the unit disk (the disk of radius 1 centered at the origin). This set is of interest because if $\lambda$ is a complex number, then $\lim_{m\to\infty} \lambda^m$ exists if and only if $\lambda \in S$. This fact, which is obviously true if $\lambda$ is real, can be shown to be true for complex numbers also.

The following important result gives necessary and sufficient conditions for the existence of the type of limit under consideration.

Theorem 5.12.

Let $A$ be a square matrix with complex entries. Then $\lim_{m\to\infty} A^m$ exists if and only if both of the following conditions hold.

  1. (a) Every eigenvalue of $A$ is contained in $S$.

  2. (b) If 1 is an eigenvalue of $A$, then the dimension of the eigenspace corresponding to 1 equals the multiplicity of 1 as an eigenvalue of $A$.

One proof of this theorem, which relies on the theory of Jordan canonical forms (Section 7.2), can be found in Exercise 19 of Section 7.2. A second proof, which makes use of Schur’s theorem (Theorem 6.14 of Section 6.4), can be found in the article by S. H. Friedberg and A. J. Insel, “Convergence of matrix powers,” Int. J. Math. Educ. Sci. Technol., 1992, Vol. 23, no. 5, pp. 765-769.

The necessity of condition (a) is easily justified. For suppose that $\lambda$ is an eigenvalue of $A$ such that $\lambda \notin S$. Let $v$ be an eigenvector of $A$ corresponding to $\lambda$. Regarding $v$ as an $n \times 1$ matrix, we see that

$\lim_{m\to\infty} (A^mv) = \left(\lim_{m\to\infty} A^m\right)v = Lv$

by Theorem 5.11, where $L = \lim_{m\to\infty} A^m$. But $\lim_{m\to\infty} (A^mv) = \lim_{m\to\infty} (\lambda^mv)$ diverges because $\lim_{m\to\infty} \lambda^m$ does not exist. Hence if $\lim_{m\to\infty} A^m$ exists, then condition (a) of Theorem 5.12 must hold.

Although we are unable to prove the necessity of condition (b) here, we consider an example for which this condition fails. Observe that the characteristic polynomial for the matrix

$B = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$

is $(t-1)^2$, and hence $B$ has eigenvalue $\lambda = 1$ with multiplicity 2. It can easily be verified that $\dim(E_\lambda) = 1$, so that condition (b) of Theorem 5.12 is violated. A simple mathematical induction argument can be used to show that

$B^m = \begin{pmatrix} 1 & m \\ 0 & 1 \end{pmatrix},$

and therefore that $\lim_{m\to\infty} B^m$ does not exist. We see in Chapter 7 that if $A$ is a matrix for which condition (b) fails, then $A$ is similar to a matrix whose upper left $2 \times 2$ submatrix is precisely this matrix $B$.

In most of the applications involving matrix limits, the matrix is diagonalizable, and so condition (b) of Theorem 5.12 is automatically satisfied. In this case, Theorem 5.12 reduces to the following theorem, which can be proved using our previous results.

Theorem 5.13.

Let $A \in M_{n\times n}(C)$ satisfy the following two conditions.

  1. (a) Every eigenvalue of $A$ is contained in $S$.

  2. (b) $A$ is diagonalizable.

Then $\lim_{m\to\infty} A^m$ exists.

Proof.

Since $A$ is diagonalizable, there exists an invertible matrix $Q$ such that $Q^{-1}AQ = D$ is a diagonal matrix. Suppose that

$D = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$

Because $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $A$, condition (a) requires that for each $i$, either $\lambda_i = 1$ or $|\lambda_i| < 1$. Thus

$\lim_{m\to\infty} \lambda_i^m = \begin{cases} 1 & \text{if } \lambda_i = 1 \\ 0 & \text{otherwise.} \end{cases}$

But since

$D^m = \begin{pmatrix} \lambda_1^m & 0 & \cdots & 0 \\ 0 & \lambda_2^m & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & \lambda_n^m \end{pmatrix},$

the sequence $D, D^2, \ldots$ converges to a limit $L$. Hence

$\lim_{m\to\infty} A^m = \lim_{m\to\infty} (QDQ^{-1})^m = QLQ^{-1}$

by the corollary to Theorem 5.11.

The technique for computing $\lim_{m\to\infty} A^m$ used in the proof of Theorem 5.13 can be employed in actual computations, as we now illustrate. Let

$A = \begin{pmatrix} \dfrac{7}{4} & \dfrac{9}{4} & -\dfrac{15}{4} \\[6pt] -\dfrac{3}{4} & \dfrac{7}{4} & -\dfrac{3}{4} \\[6pt] \dfrac{3}{4} & \dfrac{9}{4} & -\dfrac{11}{4} \end{pmatrix}.$

Using the methods in Sections 5.1 and 5.2, we obtain

$Q = \begin{pmatrix} 1 & 3 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 1 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{4} \end{pmatrix}$

such that $Q^{-1}AQ = D$. Hence

$\lim_{m\to\infty} A^m = \lim_{m\to\infty} (QDQ^{-1})^m = \lim_{m\to\infty} QD^mQ^{-1} = Q\left(\lim_{m\to\infty} D^m\right)Q^{-1}$

$= \begin{pmatrix} 1 & 3 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 1 \end{pmatrix} \left[\lim_{m\to\infty} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \left(-\frac{1}{2}\right)^m & 0 \\ 0 & 0 & \left(\frac{1}{4}\right)^m \end{pmatrix}\right] \begin{pmatrix} -1 & 0 & 1 \\ -1 & -1 & 2 \\ 5 & 3 & -7 \end{pmatrix}$

$= \begin{pmatrix} 1 & 3 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} -1 & 0 & 1 \\ -1 & -1 & 2 \\ 5 & 3 & -7 \end{pmatrix} = \begin{pmatrix} -1 & 0 & 1 \\ -3 & 0 & 3 \\ -2 & 0 & 2 \end{pmatrix}.$
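This computation is easy to confirm numerically: since the eigenvalues other than 1 have absolute value less than 1, a moderately large power of $A$ already agrees with the limit. A quick sketch assuming NumPy (the exponent 100 is arbitrary):

```python
import numpy as np

A = np.array([[ 7/4,  9/4, -15/4],
              [-3/4,  7/4,  -3/4],
              [ 3/4,  9/4, -11/4]])

# Verify the diagonalization and the limit L = Q diag(1,0,0) Q^{-1}.
Q = np.array([[1, 3, 1], [3, 2, 1], [2, 3, 1]])
D = np.diag([1, -1/2, 1/4])
assert np.allclose(A, Q @ D @ np.linalg.inv(Q))

L = Q @ np.diag([1, 0, 0]) @ np.linalg.inv(Q)
print(np.round(np.linalg.matrix_power(A, 100), 12))  # approximately L
print(L)  # [[-1, 0, 1], [-3, 0, 3], [-2, 0, 2]]
```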

Next, we consider an application that uses the limit of powers of a matrix. Suppose that the population of a certain metropolitan area remains constant but there is a continual movement of people between the city and the suburbs. Specifically, let the entries of the following matrix A represent the probabilities that someone living in the city or in the suburbs on January 1 will be living in each region on January 1 of the next year.

$A = \begin{pmatrix} 0.90 & 0.02 \\ 0.10 & 0.98 \end{pmatrix},$

where the columns correspond to currently living in the city and currently living in the suburbs, respectively, and the rows correspond to living in the city next year and living in the suburbs next year, respectively.

For instance, the probability that someone living in the city (on January 1) will be living in the suburbs next year (on January 1) is 0.10. Notice that since the entries of A are probabilities, they are nonnegative. Moreover, the assumption of a constant population in the metropolitan area requires that the sum of the entries of each column of A be 1.

Any square matrix having these two properties (nonnegative entries and columns that sum to 1) is called a transition matrix or a stochastic matrix. For an arbitrary $n \times n$ transition matrix $M$, the rows and columns correspond to $n$ states, and the entry $M_{ij}$ represents the probability of moving from state $j$ to state $i$ in one stage.

In our example, there are two states (residing in the city and residing in the suburbs). So, for example, $A_{21}$ is the probability of moving from the city to the suburbs in one stage, that is, in one year. We now determine the probability that a city resident will be living in the suburbs after 2 years. There are two different ways in which such a move can be made: remaining in the city for 1 year and then moving to the suburbs, or moving to the suburbs during the first year and remaining there the second year. (See Figure 5.3.) The probability that a city dweller remains in the city for the first year is 0.90, whereas the probability that the city dweller moves to the suburbs during the first year is 0.10. Hence the probability that a city dweller stays in the city for the first year and then moves to the suburbs during the second year is the product $(0.90)(0.10)$. Likewise, the probability that a city dweller moves to the suburbs in the first year and remains in the suburbs during the second year is the product $(0.10)(0.98)$. Thus the probability that a city dweller will be living in the suburbs after 2 years is the sum of these products, $(0.90)(0.10) + (0.10)(0.98) = 0.188$. Observe that this number is obtained by the same calculation as that which produces $(A^2)_{21}$, and hence $(A^2)_{21}$ represents the probability that a city dweller will be living in the suburbs after 2 years. In general, for any transition matrix $M$, the entry $(M^m)_{ij}$ represents the probability of moving from state $j$ to state $i$ in $m$ stages.

Figure 5.3
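The two-year probability computed above is exactly the (2,1) entry of $A^2$. A short check, assuming NumPy (indices are 0-based in the code):

```python
import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])

A2 = A @ A
print(A2[1, 0])        # 0.188: city -> suburbs in two years
print(A2.sum(axis=0))  # each column still sums to 1, so A^2 is a transition matrix
```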

Suppose additionally that in year 2000, 70% of the population of the metropolitan area lived in the city and 30% lived in the suburbs. We record these data as a column vector:

$P = \begin{pmatrix} 0.70 \\ 0.30 \end{pmatrix},$

whose first and second coordinates are the proportions of city dwellers and suburb residents, respectively.

Notice that the rows of P correspond to the states of residing in the city and residing in the suburbs, respectively, and that these states are listed in the same order as the listing in the transition matrix A. Observe also that the column vector P contains nonnegative entries that sum to 1; such a vector is called a probability vector. In this terminology, each column of a transition matrix is a probability vector. It is often convenient to regard the entries of a transition matrix or a probability vector as proportions or percentages instead of probabilities, as we have already done with the probability vector P.

In the vector $AP$, the first coordinate is the sum $(0.90)(0.70) + (0.02)(0.30)$. The first term of this sum, $(0.90)(0.70)$, represents the proportion of the 2000 metropolitan population that remained in the city during the next year, and the second term, $(0.02)(0.30)$, represents the proportion of the 2000 metropolitan population that moved into the city during the next year. Hence the first coordinate of

$AP = \begin{pmatrix} 0.636 \\ 0.364 \end{pmatrix}$

represents the proportion of the metropolitan population that was living in the city in 2001, and similarly the second coordinate represents the proportion of the metropolitan population that was living in the suburbs in 2001. This argument can be easily extended to show that the coordinates of

$A^2P = A(AP) = \begin{pmatrix} 0.57968 \\ 0.42032 \end{pmatrix}$

represent the proportions of the metropolitan population that were living in each location in 2002. In general, the coordinates of $A^mP$ represent the proportions of the metropolitan population that will be living in the city and suburbs, respectively, after $m$ stages ($m$ years after 2000).

Will the city eventually be depleted if this trend continues? In view of the preceding discussion, it is natural to define the eventual proportion of the city dwellers and suburbanites to be the first and second coordinates, respectively, of $\lim_{m\to\infty} A^mP$. We now compute this limit. It is easily shown that $A$ is diagonalizable, and so there is an invertible matrix $Q$ and a diagonal matrix $D$ such that $Q^{-1}AQ = D$. In fact,

$Q = \begin{pmatrix} \dfrac{1}{6} & \dfrac{1}{6} \\[6pt] \dfrac{5}{6} & -\dfrac{1}{6} \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 1 & 0 \\ 0 & 0.88 \end{pmatrix}.$

Therefore

$L = \lim_{m\to\infty} A^m = \lim_{m\to\infty} QD^mQ^{-1} = Q\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}Q^{-1} = \begin{pmatrix} \dfrac{1}{6} & \dfrac{1}{6} \\[6pt] \dfrac{5}{6} & \dfrac{5}{6} \end{pmatrix}.$

Consequently

$\lim_{m\to\infty} A^mP = LP = \begin{pmatrix} \dfrac{1}{6} \\[6pt] \dfrac{5}{6} \end{pmatrix}.$

Thus, eventually, $\frac{1}{6}$ of the population will live in the city and $\frac{5}{6}$ will live in the suburbs each year. Note that the vector $LP$ satisfies $A(LP) = LP$. Hence $LP$ is both a probability vector and an eigenvector of $A$ corresponding to the eigenvalue 1. Since the eigenspace of $A$ corresponding to the eigenvalue 1 is one-dimensional, there is only one such vector, and $LP$ is independent of the initial choice of probability vector $P$. (See Exercise 15.) For example, had the 2000 metropolitan population consisted entirely of city dwellers, the limiting outcome would be the same.
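Both descriptions of this limit, as $\lim_{m\to\infty} A^mP$ and as the probability eigenvector for the eigenvalue 1, can be verified numerically. A sketch assuming NumPy (the power 200 is an arbitrary stand-in for $m \to \infty$):

```python
import numpy as np

A = np.array([[0.90, 0.02],
              [0.10, 0.98]])
P = np.array([0.70, 0.30])

print(np.linalg.matrix_power(A, 200) @ P)  # ~ [1/6, 5/6]

# The same vector, as the eigenvector for eigenvalue 1 scaled to sum to 1.
w, V = np.linalg.eig(A)
v = V[:, np.argmin(np.abs(w - 1))].real
print(v / v.sum())                         # [0.1667, 0.8333] = [1/6, 5/6]
```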

In analyzing the city–suburb problem, we gave probabilistic interpretations of $A^2$ and $AP$, showing that $A^2$ is a transition matrix and $AP$ is a probability vector. In fact, the product of any two transition matrices is a transition matrix, and the product of any transition matrix and probability vector is a probability vector. A proof of these facts is a simple corollary of the next theorem, which characterizes transition matrices and probability vectors.

Theorem 5.14.

Let $M$ be an $n \times n$ matrix having real nonnegative entries, let $v$ be a column vector in $R^n$ having nonnegative coordinates, and let $u \in R^n$ be the column vector in which each coordinate equals 1. Then

  1. (a) $M$ is a transition matrix if and only if $u^tM = u^t$;

  2. (b) $v$ is a probability vector if and only if $u^tv = (1)$.

Proof.

Exercise.

Corollary.

  1. (a) The product of two $n \times n$ transition matrices is an $n \times n$ transition matrix. In particular, any power of a transition matrix is a transition matrix.

  2. (b) The product of a transition matrix and a probability vector is a probability vector.

Proof.

Exercise.

The city–suburb problem is an example of a process in which elements of a set are each classified as being in one of several fixed states that can switch over time. In general, such a process is called a stochastic process. The switching to a particular state is described by a probability, and in general this probability depends on such factors as the state in question, the time in question, some or all of the previous states in which the object has been (including the current state), and the states that other objects are in or have been in.

For instance, the object could be an American voter, and the state of the object could be his or her preference of political party; or the object could be a molecule of $\mathrm{H_2O}$, and the states could be the three physical states in which $\mathrm{H_2O}$ can exist (solid, liquid, and gas). In these examples, all four of the factors mentioned above influence the probability that an object is in a particular state at a particular time.

If, however, the probability that an object in one state changes to a different state in a fixed interval of time depends only on the two states (and not on the time, earlier states, or other factors), then the stochastic process is called a Markov process. If, in addition, the number of possible states is finite, then the Markov process is called a Markov chain. We treated the city–suburb example as a two-state Markov chain. Of course, a Markov process is usually only an idealization of reality because the probabilities involved are almost never constant over time.

With this in mind, we consider another Markov chain. A certain community college would like to obtain information about the likelihood that students in various categories will graduate. The school classifies a student as a sophomore or a freshman depending on the number of credits that the student has earned. Data from the school indicate that, from one fall semester to the next, 40% of the sophomores will graduate, 30% will remain sophomores, and 30% will quit permanently. For freshmen, the data show that 10% will graduate by next fall, 50% will become sophomores, 20% will remain freshmen, and 20% will quit permanently. During the present year, 50% of the students at the school are sophomores and 50% are freshmen. Assuming that the trend indicated by the data continues indefinitely, the school would like to know

  1. the percentage of the present students who will graduate, the percentage who will be sophomores, the percentage who will be freshmen, and the percentage who will quit school permanently by next fall;

  2. the same percentages as in item 1 for the fall semester two years hence; and

  3. the percentage of its present students who will eventually graduate.

The preceding paragraph describes a four-state Markov chain with the following states:

  1. having graduated

  2. being a sophomore

  3. being a freshman

  4. having quit permanently.

The given data provide us with the transition matrix

$A = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}$

of the Markov chain. (Notice that students who have graduated or have quit permanently are assumed to remain indefinitely in those respective states. Thus a freshman who quits the school and returns during a later semester is not regarded as having changed states—the student is assumed to have remained in the state of being a freshman during the time he or she was not enrolled.) Moreover, we are told that the present distribution of students is half in each of states 2 and 3 and none in states 1 and 4. The vector

$P = \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix}$

that describes the initial probability of being in each state is called the initial probability vector for the Markov chain.

To answer question 1, we must determine the probabilities that a present student will be in each state by next fall. As we have seen, these probabilities are the coordinates of the vector

$AP = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.25 \\ 0.40 \\ 0.10 \\ 0.25 \end{pmatrix}.$

Hence by next fall, 25% of the present students will graduate, 40% will be sophomores, 10% will be freshmen, and 25% will quit the school permanently. Similarly,

$A^2P = A(AP) = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix} \begin{pmatrix} 0.25 \\ 0.40 \\ 0.10 \\ 0.25 \end{pmatrix} = \begin{pmatrix} 0.42 \\ 0.17 \\ 0.02 \\ 0.39 \end{pmatrix}$

provides the information needed to answer question 2: within two years 42% of the present students will graduate, 17% will be sophomores, 2% will be freshmen, and 39% will quit school.

Finally, the answer to question 3 is provided by the vector $LP$, where $L = \lim_{m\to\infty} A^m$. For the matrices

$Q = \begin{pmatrix} 1 & -4 & 19 & 0 \\ 0 & 7 & -40 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & -3 & 13 & 1 \end{pmatrix} \quad \text{and} \quad D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0.3 & 0 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$

we have $Q^{-1}AQ = D$. Thus

$L = \lim_{m\to\infty} A^m = Q\left(\lim_{m\to\infty} D^m\right)Q^{-1} = \begin{pmatrix} 1 & -4 & 19 & 0 \\ 0 & 7 & -40 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & -3 & 13 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & \frac{4}{7} & \frac{27}{56} & 0 \\ 0 & \frac{1}{7} & \frac{5}{7} & 0 \\ 0 & 0 & \frac{1}{8} & 0 \\ 0 & \frac{3}{7} & \frac{29}{56} & 1 \end{pmatrix} = \begin{pmatrix} 1 & \frac{4}{7} & \frac{27}{56} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & \frac{3}{7} & \frac{29}{56} & 1 \end{pmatrix}.$

So

$LP = \begin{pmatrix} 1 & \frac{4}{7} & \frac{27}{56} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & \frac{3}{7} & \frac{29}{56} & 1 \end{pmatrix} \begin{pmatrix} 0 \\ 0.5 \\ 0.5 \\ 0 \end{pmatrix} = \begin{pmatrix} \frac{59}{112} \\ 0 \\ 0 \\ \frac{53}{112} \end{pmatrix},$

and hence the probability that one of the present students will graduate is $\frac{59}{112}$.
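Fractions such as $\frac{59}{112}$ are easy to mistype, so a numerical cross-check is worthwhile. A sketch assuming NumPy (the power 500 is arbitrary, chosen so that $0.3^m$ and $0.2^m$ are negligible):

```python
import numpy as np

A = np.array([[1, 0.4, 0.1, 0],
              [0, 0.3, 0.5, 0],
              [0, 0.0, 0.2, 0],
              [0, 0.3, 0.2, 1]])
P = np.array([0, 0.5, 0.5, 0])

print(np.linalg.matrix_power(A, 500) @ P)  # ~ [0.5268, 0, 0, 0.4732]
print(59/112, 53/112)                      # 0.52678..., 0.47321...
```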

In the preceding two examples, we saw that $\lim_{m\to\infty} A^mP$, where $A$ is the transition matrix and $P$ is the initial probability vector of the Markov chain, gives the eventual proportions in each state. In general, however, the limit of powers of a transition matrix need not exist. For example, if

$M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$

then $\lim_{m\to\infty} M^m$ does not exist because odd powers of $M$ equal $M$ and even powers of $M$ equal $I$. The reason that the limit fails to exist is that condition (a) of Theorem 5.12 does not hold for $M$ ($-1$ is an eigenvalue). In fact, it can be shown (see Exercise 20 of Section 7.2) that the only transition matrices $A$ such that $\lim_{m\to\infty} A^m$ does not exist are precisely those matrices for which condition (a) of Theorem 5.12 fails to hold.

But even if the limit of powers of the transition matrix exists, the computation of the limit may be quite difficult. (The reader is encouraged to work Exercise 6 to appreciate the truth of the last sentence.) Fortunately, there is a large and important class of transition matrices for which this limit exists and is easily computed—this is the class of regular transition matrices.

Definition.

A transition matrix is called regular if some power of the matrix contains only nonzero (i.e., positive) entries.

Example 2

The transition matrix

$\begin{pmatrix} 0.90 & 0.02 \\ 0.10 & 0.98 \end{pmatrix}$

of the Markov chain used in the city–suburb problem is clearly regular because each entry is positive. On the other hand, the transition matrix

$A = \begin{pmatrix} 1 & 0.4 & 0.1 & 0 \\ 0 & 0.3 & 0.5 & 0 \\ 0 & 0 & 0.2 & 0 \\ 0 & 0.3 & 0.2 & 1 \end{pmatrix}$

of the Markov chain describing community college enrollments is not regular because the first column of $A^m$ is

$\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}$

for any power $m$.

Observe that a regular transition matrix may contain zero entries. For example,

$M = \begin{pmatrix} 0.9 & 0.5 & 0 \\ 0 & 0.5 & 0.4 \\ 0.1 & 0 & 0.6 \end{pmatrix}$

is regular because every entry of $M^2$ is positive.
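Regularity can be tested directly from the definition by inspecting successive powers. A small sketch assuming NumPy; the cutoff uses Wielandt's bound of $(n-1)^2 + 1$ on the power at which a regular transition matrix must become entrywise positive:

```python
import numpy as np

def is_regular(M, max_power=None):
    """Check whether some power of the transition matrix M is entrywise positive.

    By Wielandt's theorem it suffices to look at powers up to (n-1)^2 + 1.
    """
    n = M.shape[0]
    if max_power is None:
        max_power = (n - 1)**2 + 1
    P = np.eye(n)
    for _ in range(max_power):
        P = P @ M
        if np.all(P > 0):
            return True
    return False

M = np.array([[0.9, 0.5, 0.0],
              [0.0, 0.5, 0.4],
              [0.1, 0.0, 0.6]])
print(is_regular(M))  # True: every entry of M^2 is positive
```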

The remainder of this section is devoted to proving that, for a regular transition matrix A, the limit of the sequence of powers of A exists and has identical columns. From this fact, it is easy to compute this limit. In the course of proving this result, we obtain some interesting bounds for the magnitudes of eigenvalues of any square matrix. These bounds are given in terms of the sum of the absolute values of the rows and columns of the matrix. The necessary terminology is introduced in the definitions that follow.

Definitions.

Let $A \in M_{n\times n}(C)$. For $1 \le i, j \le n$, define $\rho_i(A)$ to be the sum of the absolute values of the entries of row $i$ of $A$, and define $\nu_j(A)$ to be equal to the sum of the absolute values of the entries of column $j$ of $A$. Thus

$\rho_i(A) = \sum_{j=1}^n |A_{ij}| \quad \text{for } i = 1, 2, \ldots, n$

and

$\nu_j(A) = \sum_{i=1}^n |A_{ij}| \quad \text{for } j = 1, 2, \ldots, n.$

The row sum of $A$, denoted $\rho(A)$, and the column sum of $A$, denoted $\nu(A)$, are defined as

$\rho(A) = \max\{\rho_i(A) : 1 \le i \le n\} \quad \text{and} \quad \nu(A) = \max\{\nu_j(A) : 1 \le j \le n\}.$

Example 3

For the matrix

$A = \begin{pmatrix} 1 & i & 3 + 4i \\ 2 + i & 0 & 6 \\ 3 & 2i & 1 \end{pmatrix},$

$\rho_1(A) = 7$, $\rho_2(A) = 6 + \sqrt{5}$, $\rho_3(A) = 6$, $\nu_1(A) = 4 + \sqrt{5}$, $\nu_2(A) = 3$, and $\nu_3(A) = 12$. Hence $\rho(A) = 6 + \sqrt{5}$ and $\nu(A) = 12$.

Our next results show that the smaller of $\rho(A)$ and $\nu(A)$ is an upper bound for the absolute values of eigenvalues of $A$. In the preceding example, for instance, $A$ has no eigenvalue with absolute value greater than $6 + \sqrt{5}$.
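Row and column sums are one-liners in NumPy: sums of the entrywise absolute values along each axis. A check of Example 3:

```python
import numpy as np

A = np.array([[1,      1j, 3 + 4j],
              [2 + 1j, 0,  6     ],
              [3,      2j, 1     ]])

rho = np.abs(A).sum(axis=1).max()  # largest row sum: 6 + sqrt(5)
nu  = np.abs(A).sum(axis=0).max()  # largest column sum: 12
print(rho, 6 + np.sqrt(5))         # 8.236..., 8.236...
print(nu)                          # 12.0
```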

To obtain a geometric view of the following theorem, we introduce some terminology. For an $n \times n$ matrix $A$, we define the $i$th Gershgorin disk $C_i$ to be the disk in the complex plane with center $A_{ii}$ and radius $r_i = \rho_i(A) - |A_{ii}|$; that is,

$C_i = \{z \in C : |z - A_{ii}| \le r_i\}.$

For example, consider the matrix

$A = \begin{pmatrix} 1 + 2i & 1 \\ 2i & 3 \end{pmatrix}.$

For this matrix, $C_1$ is the disk with center $1 + 2i$ and radius 1, and $C_2$ is the disk with center 3 and radius 2. (See Figure 5.4.)

Figure 5.4

Gershgorin’s circle theorem, stated below, tells us that all the eigenvalues of A are located within these two disks. In particular, we see that 0 is not an eigenvalue, and hence by Exercise 9(c) of Section 5.1, A is invertible.
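A quick numerical illustration of this picture, assuming NumPy: compute the centers and radii of the disks and confirm that each eigenvalue lies in at least one of them.

```python
import numpy as np

A = np.array([[1 + 2j, 1],
              [2j,     3]])

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)  # r_i = rho_i(A) - |A_ii|
eigvals = np.linalg.eigvals(A)

for lam in eigvals:
    in_disk = np.abs(lam - centers) <= radii
    print(lam, in_disk.any())  # True: each eigenvalue lies in some Gershgorin disk
```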

Theorem 5.15. (Gershgorin’s Circle Theorem).

Let $A \in M_{n\times n}(C)$. Then every eigenvalue of $A$ is contained in a Gershgorin disk.

Proof.

Let $\lambda$ be an eigenvalue of $A$ with the corresponding eigenvector

$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}.$

Then $v$ satisfies the matrix equation $Av = \lambda v$, which can be written

$\sum_{j=1}^n A_{ij}v_j = \lambda v_i \quad (i = 1, 2, \ldots, n). \qquad (3)$

Suppose that $v_k$ is a coordinate of $v$ having the largest absolute value; note that $v_k \ne 0$ because $v$ is an eigenvector of $A$.

We show that $\lambda$ lies in $C_k$; that is, $|\lambda - A_{kk}| \le r_k$. For $i = k$, it follows from (3) that

$|\lambda v_k - A_{kk}v_k| = \left|\sum_{j=1}^n A_{kj}v_j - A_{kk}v_k\right| = \left|\sum_{j\ne k} A_{kj}v_j\right| \le \sum_{j\ne k} |A_{kj}||v_j| \le \sum_{j\ne k} |A_{kj}||v_k| = |v_k|\sum_{j\ne k} |A_{kj}| = |v_k|r_k.$

Thus

$|v_k||\lambda - A_{kk}| \le |v_k|r_k;$

so

$|\lambda - A_{kk}| \le r_k$

because $|v_k| > 0$.

Corollary 1.

Let $\lambda$ be any eigenvalue of $A \in M_{n\times n}(C)$. Then $|\lambda| \le \rho(A)$.

Proof.

By Gershgorin's circle theorem, $|\lambda - A_{kk}| \le r_k$ for some $k$. Hence

$|\lambda| = |(\lambda - A_{kk}) + A_{kk}| \le |\lambda - A_{kk}| + |A_{kk}| \le r_k + |A_{kk}| = \rho_k(A) \le \rho(A).$

Corollary 2.

Let $\lambda$ be any eigenvalue of $A \in M_{n\times n}(C)$. Then

$|\lambda| \le \min\{\rho(A), \nu(A)\}.$

Proof.

Since $|\lambda| \le \rho(A)$ by Corollary 1, it suffices to show that $|\lambda| \le \nu(A)$. By Exercise 15 of Section 5.1, $\lambda$ is an eigenvalue of $A^t$, and so $|\lambda| \le \rho(A^t)$ by Corollary 1. But the rows of $A^t$ are the columns of $A$; consequently $\rho(A^t) = \nu(A)$. Therefore $|\lambda| \le \nu(A)$.

The next corollary is immediate from Corollary 2.

Corollary 3.

If $\lambda$ is an eigenvalue of a transition matrix, then $|\lambda| \le 1$.

The next result asserts that the upper bound in Corollary 3 is attained.

Theorem 5.16.

Every transition matrix has 1 as an eigenvalue.

Proof.

Let $A$ be an $n \times n$ transition matrix, and let $u \in R^n$ be the column vector in which each coordinate is 1. Then $A^tu = u$ by Theorem 5.14, and hence $u$ is an eigenvector of $A^t$ corresponding to the eigenvalue 1. But since $A$ and $A^t$ have the same eigenvalues, it follows that 1 is also an eigenvalue of $A$.

Suppose that $A$ is a transition matrix for which some eigenvector corresponding to the eigenvalue 1 has only nonnegative coordinates. Then some multiple of this vector is a probability vector $P$ as well as an eigenvector of $A$ corresponding to eigenvalue 1. It is interesting to observe that if $P$ is the initial probability vector of a Markov chain having $A$ as its transition matrix, then the Markov chain is completely static. For in this situation, $A^mP = P$ for every positive integer $m$; hence the probability of being in each state never changes. Consider, for instance, the city–suburb problem with

$P = \begin{pmatrix} \dfrac{1}{6} \\[6pt] \dfrac{5}{6} \end{pmatrix}.$

Theorem 5.17.

Let $A \in M_{n\times n}(C)$ be a matrix in which each entry is a positive real number, and let $\lambda$ be a complex eigenvalue of $A$ such that $|\lambda| = \rho(A)$. Then $\lambda = \rho(A)$ and $\{u\}$ is a basis for $E_\lambda$, where $u \in C^n$ is the column vector in which each coordinate equals 1.

Proof.

Let $v$ be an eigenvector of $A$ corresponding to $\lambda$, with coordinates $v_1, v_2, \ldots, v_n$. Suppose that $v_k$ is the coordinate of $v$ having the largest absolute value, and let $b = |v_k|$. Then

$|\lambda|b = |\lambda||v_k| = |\lambda v_k| = \left|\sum_{j=1}^n A_{kj}v_j\right| \le \sum_{j=1}^n |A_{kj}v_j| = \sum_{j=1}^n |A_{kj}||v_j| \le \sum_{j=1}^n |A_{kj}|b = \rho_k(A)b \le \rho(A)b. \qquad (4)$

Since $|\lambda| = \rho(A)$, the three inequalities in (4) are actually equalities; that is,

  1. (a) $\left|\sum_{j=1}^n A_{kj}v_j\right| = \sum_{j=1}^n |A_{kj}v_j|$,

  2. (b) $\sum_{j=1}^n |A_{kj}||v_j| = \sum_{j=1}^n |A_{kj}|b$, and

  3. (c) $\rho_k(A) = \rho(A)$.

We will see in Exercise 15(b) of Section 6.1 that (a) holds if and only if all the terms $A_{kj}v_j$ $(j = 1, 2, \ldots, n)$ are obtained by multiplying some nonzero complex number $z$ by nonnegative real numbers. Without loss of generality, we assume that $|z| = 1$. Thus there exist nonnegative real numbers $c_1, c_2, \ldots, c_n$ such that

$A_{kj}v_j = c_jz. \qquad (5)$

By (b) and the assumption that $A_{kj} > 0$ for all $k$ and $j$, we have

$|v_j| = b \quad \text{for } j = 1, 2, \ldots, n. \qquad (6)$

Combining (5) and (6), we obtain

$b = |v_j| = \left|\dfrac{c_j}{A_{kj}}z\right| = \dfrac{c_j}{A_{kj}} \quad \text{for } j = 1, 2, \ldots, n,$

and therefore, by (5), we have $v_j = bz$ for all $j$. So

$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} bz \\ bz \\ \vdots \\ bz \end{pmatrix} = bzu,$

and hence $\{u\}$ is a basis for $E_\lambda$.

Finally, observe that all of the entries of $Au$ are positive because the same is true for the entries of both $A$ and $u$. But $Au = \lambda u$, and hence $\lambda > 0$. Therefore, $\lambda = |\lambda| = \rho(A)$.

Corollary 1.

Let $A \in M_{n\times n}(C)$ be a matrix in which each entry is positive, and let $\lambda$ be an eigenvalue of $A$ such that $|\lambda| = \nu(A)$. Then $\lambda = \nu(A)$, and the dimension of $E_\lambda$ equals 1.

Proof.

Exercise.

Corollary 2.

Let $A \in M_{n\times n}(C)$ be a transition matrix in which each entry is positive, and let $\lambda$ be any eigenvalue of $A$ other than 1. Then $|\lambda| < 1$. Moreover, the eigenspace corresponding to the eigenvalue 1 has dimension 1.

Proof.

Exercise.

Our next result extends Corollary 2 to regular transition matrices and thus shows that regular transition matrices satisfy condition (a) of Theorems 5.12 and 5.13.

Theorem 5.18.

Let $A$ be a regular transition matrix, and let $\lambda$ be an eigenvalue of $A$. Then

  1. (a) $|\lambda| \le 1$.

  2. (b) If $|\lambda| = 1$, then $\lambda = 1$, and $\dim(E_\lambda) = 1$.

Proof.

Statement (a) was proved as Corollary 3 to Theorem 5.15.

(b) Since $A$ is regular, there exists a positive integer $s$ such that $A^s$ has only positive entries. Because $A$ is a transition matrix and the entries of $A^s$ are positive, the entries of $A^{s+1} = A^sA$ are positive. Suppose that $|\lambda| = 1$. Then $\lambda^s$ and $\lambda^{s+1}$ are eigenvalues of $A^s$ and $A^{s+1}$, respectively, having absolute value 1. So by Corollary 2 to Theorem 5.17, $\lambda^s = \lambda^{s+1} = 1$. Thus $\lambda = 1$. Let $E_\lambda$ and $E'_\lambda$ denote the eigenspaces of $A$ and $A^s$, respectively, corresponding to $\lambda = 1$. Then $E_\lambda \subseteq E'_\lambda$ and, by Corollary 2 to Theorem 5.17, $\dim(E'_\lambda) = 1$. Hence $E_\lambda = E'_\lambda$, and $\dim(E_\lambda) = 1$.

Corollary.

Let $A$ be a regular transition matrix that is diagonalizable. Then $\lim_{m\to\infty} A^m$ exists.

The preceding corollary, which follows immediately from Theorems 5.18 and 5.13, is not the best possible result. In fact, it can be shown that if $A$ is a regular transition matrix, then the multiplicity of 1 as an eigenvalue of $A$ is 1. Thus, by Theorem 5.7 (p. 264), condition (b) of Theorem 5.12 is satisfied. So if $A$ is a regular transition matrix, $\lim_{m\to\infty} A^m$ exists regardless of whether $A$ is or is not diagonalizable. As with Theorem 5.12, however, the fact that the multiplicity of 1 as an eigenvalue of $A$ is 1 cannot be proved at this time. Nevertheless, we state this result here (leaving the proof until Exercise 21 of Section 7.2) and deduce further facts about $\lim_{m\to\infty} A^m$ when $A$ is a regular transition matrix.

Theorem 5.19.

Let A be a regular transition matrix. Then

  1. (a) The multiplicity of 1 as an eigenvalue of $A$ is 1.

  2. (b) $\lim_{m\to\infty} A^m$ exists.

  3. (c) $L = \lim_{m\to\infty} A^m$ is a transition matrix.

  4. (d) $AL = LA = L$.

  5. (e) The columns of $L$ are identical. In fact, each column of $L$ is equal to the unique probability vector $v$ that is also an eigenvector of $A$ corresponding to the eigenvalue 1.

  6. (f) For any probability vector $w$, $\lim_{m\to\infty} (A^mw) = v$.

Proof.

  1. (a) See Exercise 21 of Section 7.2.

  2. (b) This follows from (a) and Theorems 5.18 and 5.12.

  3. (c) By Theorem 5.14, we must show that $u^tL = u^t$. Now $A^m$ is a transition matrix by the corollary to Theorem 5.14, so

    $u^tL = u^t\lim_{m\to\infty} A^m = \lim_{m\to\infty} u^tA^m = \lim_{m\to\infty} u^t = u^t,$

    and it follows that $L$ is a transition matrix.

  4. (d) By Theorem 5.11,

    $AL = A\lim_{m\to\infty} A^m = \lim_{m\to\infty} AA^m = \lim_{m\to\infty} A^{m+1} = L.$

    Similarly, $LA = L$.

  5. (e) Since $AL = L$ by (d), each column of $L$ is an eigenvector of $A$ corresponding to the eigenvalue 1. Moreover, by (c), each column of $L$ is a probability vector. Thus, by (a), each column of $L$ is equal to the unique probability vector $v$ corresponding to the eigenvalue 1 of $A$.

  6. (f) Let $w$ be any probability vector, and set $y = \lim_{m\to\infty} A^mw = Lw$. Then $y$ is a probability vector by the corollary to Theorem 5.14, and also $Ay = ALw = Lw = y$ by (d). Hence $y$ is also an eigenvector corresponding to the eigenvalue 1 of $A$. So $y = v$ by (e).

Definition.

The vector v in Theorem 5.19(e) is called the fixed probability vector or stationary vector of the regular transition matrix A.

Theorem 5.19 can be used to deduce information about the eventual distribution in each state of a Markov chain having a regular transition matrix.

Example 4

Because of an expected El Niño in the coming fall, the National Weather Service predicted that, during the following winter, the probabilities of above-average, average, and below-average snowfall for a particular region would be 0.5, 0.3, and 0.2, respectively. Historical weather records for this region show that if there is above-average snowfall in a given winter, then the probabilities of above-average, average, and below-average snowfall for the next winter are 0.4, 0.1, and 0.5, respectively. In addition, if there is average snowfall in a given winter, then the probabilities of above-average, average, and below-average snowfall for the next winter are 0.2, 0.7, and 0.1, respectively. Finally, if there is below-average snowfall in a given winter, then the probabilities of above-average, average, and below-average snowfall for the next winter are 0.2, 0.2, and 0.6, respectively.

Assuming that this historical weather trend continues, the situation described in the preceding paragraph is a three-state Markov chain in which the states are above-average, average, and below-average winter snowfall. Here the probability vector that gives the initial probability of being in each state during the coming winter is

$P = \begin{pmatrix} 0.50 \\ 0.30 \\ 0.20 \end{pmatrix},$

and the transition matrix is

$A = \begin{pmatrix} 0.40 & 0.20 & 0.20 \\ 0.10 & 0.70 & 0.20 \\ 0.50 & 0.10 & 0.60 \end{pmatrix}.$

The probabilities of being in each state $m$ winters after the original National Weather Service prediction are the coordinates of the vector $A^mP$. The reader may check that

$AP = \begin{pmatrix} 0.30 \\ 0.30 \\ 0.40 \end{pmatrix}, \quad A^2P = \begin{pmatrix} 0.26 \\ 0.32 \\ 0.42 \end{pmatrix}, \quad A^3P = \begin{pmatrix} 0.252 \\ 0.334 \\ 0.414 \end{pmatrix}, \quad \text{and} \quad A^4P = \begin{pmatrix} 0.2504 \\ 0.3418 \\ 0.4078 \end{pmatrix}.$

Note the apparent convergence of $A^mP$.

Since $A$ is regular, the long-range prediction for the region's winter snowfall can be found by computing the fixed probability vector for $A$. This vector is the unique probability vector $v$ such that $(A - I)v = 0$. Letting

$v = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix},$

we see that the matrix equation $(A - I)v = 0$ yields the following system of linear equations:

$\begin{aligned} -0.60v_1 + 0.20v_2 + 0.20v_3 &= 0 \\ 0.10v_1 - 0.30v_2 + 0.20v_3 &= 0 \\ 0.50v_1 + 0.10v_2 - 0.40v_3 &= 0. \end{aligned}$

It is easily shown that

$\left\{\begin{pmatrix} 5 \\ 7 \\ 8 \end{pmatrix}\right\}$

is a basis for the solution space of this system. Hence the unique fixed probability vector for $A$ is

$\begin{pmatrix} \frac{5}{5+7+8} \\[6pt] \frac{7}{5+7+8} \\[6pt] \frac{8}{5+7+8} \end{pmatrix} = \begin{pmatrix} 0.25 \\ 0.35 \\ 0.40 \end{pmatrix}.$

Therefore, in the long run, the probabilities of above-average, average, and below-average winter snowfall for this region are 0.25, 0.35, and 0.40, respectively.

Note that if

$Q = \begin{pmatrix} 5 & 0 & 3 \\ 7 & 1 & 1 \\ 8 & -1 & -4 \end{pmatrix},$

then

$Q^{-1}AQ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.2 \end{pmatrix}.$

So

$\lim_{m\to\infty} A^m = Q\left[\lim_{m\to\infty} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.5 & 0 \\ 0 & 0 & 0.2 \end{pmatrix}^m\right]Q^{-1} = Q\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}Q^{-1} = \begin{pmatrix} 0.25 & 0.25 & 0.25 \\ 0.35 & 0.35 & 0.35 \\ 0.40 & 0.40 & 0.40 \end{pmatrix}.$
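The fixed probability vector can also be obtained numerically by solving $(A - I)v = 0$ together with the normalization condition. One convenient sketch (assuming NumPy) replaces a redundant equation of the singular system with the requirement that the coordinates sum to 1:

```python
import numpy as np

A = np.array([[0.4, 0.2, 0.2],
              [0.1, 0.7, 0.2],
              [0.5, 0.1, 0.6]])

# Replace the last row of (A - I) with the normalization v1 + v2 + v3 = 1.
M = A - np.eye(3)
M[-1, :] = 1
b = np.array([0, 0, 1])
v = np.linalg.solve(M, b)
print(v)  # [0.25, 0.35, 0.40]
```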

Example 5

Farmers in Lamron plant one crop per year—either corn, soybeans, or wheat. Because they believe in the necessity of rotating their crops, these farmers do not plant the same crop in successive years. In fact, of the total acreage on which a particular crop is planted, exactly half is planted with each of the other two crops during the succeeding year. This year, 300 acres of corn, 200 acres of soybeans, and 100 acres of wheat were planted.

The situation just described is another three-state Markov chain in which the three states correspond to the planting of corn, soybeans, and wheat, respectively. In this problem, however, the amount of land devoted to each crop, rather than the percentage of the total acreage (600 acres), is given. By converting these amounts into fractions of the total acreage, we see that the transition matrix A and the initial probability vector P of the Markov chain are

$A = \begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\[4pt] \frac{1}{2} & 0 & \frac{1}{2} \\[4pt] \frac{1}{2} & \frac{1}{2} & 0 \end{pmatrix} \quad \text{and} \quad P = \begin{pmatrix} \frac{300}{600} \\[4pt] \frac{200}{600} \\[4pt] \frac{100}{600} \end{pmatrix} = \begin{pmatrix} \frac{1}{2} \\[4pt] \frac{1}{3} \\[4pt] \frac{1}{6} \end{pmatrix}.$

The fraction of the total acreage devoted to each crop in $m$ years is given by the coordinates of $A^mP$, and the eventual proportions of the total acreage used for each crop are the coordinates of $\lim_{m\to\infty} A^mP$. Thus the eventual amounts of land devoted to each crop are found by multiplying this limit by the total acreage; that is, the eventual amounts of land used for each crop are the coordinates of $600\lim_{m\to\infty} A^mP$.

Since $A$ is a regular transition matrix, Theorem 5.19 shows that $\lim_{m\to\infty} A^m$ is a matrix $L$ in which each column equals the unique fixed probability vector for $A$. It is easily seen that the fixed probability vector for $A$ is

$\begin{pmatrix} \frac{1}{3} \\[4pt] \frac{1}{3} \\[4pt] \frac{1}{3} \end{pmatrix}.$

Hence

$L = \begin{pmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\[4pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\[4pt] \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{pmatrix};$

so

$600\lim_{m\to\infty} A^mP = 600LP = \begin{pmatrix} 200 \\ 200 \\ 200 \end{pmatrix}.$

Thus, in the long run, we expect 200 acres of each crop to be planted each year. (For a direct computation of $600\lim_{m\to\infty} A^mP$, see Exercise 14.)
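A direct computation shows the oscillating approach to 200 acres predicted by the closed form in Exercise 14. A sketch assuming NumPy:

```python
import numpy as np

A = np.array([[0,   1/2, 1/2],
              [1/2, 0,   1/2],
              [1/2, 1/2, 0  ]])
acres = np.array([300, 200, 100])

for m in (1, 2, 3, 10):
    print(m, np.linalg.matrix_power(A, m) @ acres)
# m=1: [150. 200. 250.], m=2: [225. 200. 175.],
# m=3: [187.5 200. 212.5], ... -> [200. 200. 200.]
```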

In this section, we have concentrated primarily on the theory of regular transition matrices. There is another interesting class of transition matrices that can be represented in the form

$\begin{pmatrix} I & B \\ O & C \end{pmatrix},$

where $I$ is an identity matrix and $O$ is a zero matrix. (Such transition matrices are not regular since the lower left block remains $O$ in any power of the matrix.) The states corresponding to the identity submatrix are called absorbing states because such a state is never left once it is entered. A Markov chain is called an absorbing Markov chain if it is possible to go from an arbitrary state into an absorbing state in a finite number of stages. Observe that the Markov chain that describes the enrollment pattern in a community college is an absorbing Markov chain with states 1 and 4 as its absorbing states. (See page 291.) Readers interested in learning more about absorbing Markov chains are referred to Introduction to Finite Mathematics (third edition) by J. Kemeny, J. Snell, and G. Thompson (Prentice-Hall, Inc., Englewood Cliffs, N.J., 1974) or Discrete Mathematical Models by Fred S. Roberts (Prentice-Hall, Inc., Englewood Cliffs, N.J., 1976).

Applications*

Visit goo.gl/cKTyHi for an application of Markov chains to internet searches.

In species that reproduce sexually, the characteristics of an offspring with respect to a particular genetic trait are determined by a pair of genes, one inherited from each parent. The genes for a particular trait are of two types, which are denoted by G and g. The gene G represents the dominant characteristic, and g represents the recessive characteristic. Offspring with genotypes GG or Gg exhibit the dominant characteristic, whereas offspring with genotype gg exhibit the recessive characteristic. For example, in humans, brown eyes are a dominant characteristic and blue eyes are the corresponding recessive characteristic; thus the offspring with genotypes GG or Gg are brown-eyed, whereas those of type gg are blue-eyed.

Let us consider the probability of offspring of each genotype for a male fruit fly of genotype Gg. (We assume that the population under consideration is large, that mating is random with respect to genotype, and that the distribution of each genotype within the population is independent of sex and life expectancy.) Let

$P = \begin{pmatrix} p \\ q \\ r \end{pmatrix}$

denote the present proportions of the fruit fly population with genotypes GG, Gg, and gg, respectively. This situation can be described by a three-state Markov chain whose transition matrix has columns corresponding to the genotype of the female parent (GG, Gg, and gg, respectively) and rows corresponding to the genotype of the offspring (GG, Gg, and gg, respectively):

$B = \begin{pmatrix} \frac{1}{2} & \frac{1}{4} & 0 \\[4pt] \frac{1}{2} & \frac{1}{2} & \frac{1}{2} \\[4pt] 0 & \frac{1}{4} & \frac{1}{2} \end{pmatrix}.$

It is easily checked that $B^2$ contains only positive entries; so $B$ is regular. Thus, if only males of genotype Gg survive until reproductive maturity, the proportion of offspring in the population having a certain genotype will stabilize at the fixed probability vector for $B$, which is

$\begin{pmatrix} \frac{1}{4} \\[4pt] \frac{1}{2} \\[4pt] \frac{1}{4} \end{pmatrix}.$

Now suppose that similar experiments are to be performed with males of genotypes GG and gg. As already mentioned, these experiments are three-state Markov chains with transition matrices

$A = \begin{pmatrix} 1 & \frac{1}{2} & 0 \\[4pt] 0 & \frac{1}{2} & 1 \\[4pt] 0 & 0 & 0 \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} 0 & 0 & 0 \\[4pt] 1 & \frac{1}{2} & 0 \\[4pt] 0 & \frac{1}{2} & 1 \end{pmatrix},$

respectively. In order to consider the case where all male genotypes are permitted to reproduce, we must form the transition matrix $M = pA + qB + rC$, which is the linear combination of $A$, $B$, and $C$ weighted by the proportion of males of each genotype. Thus

$M = \begin{pmatrix} p + \frac{1}{2}q & \frac{1}{2}p + \frac{1}{4}q & 0 \\[4pt] \frac{1}{2}q + r & \frac{1}{2}p + \frac{1}{2}q + \frac{1}{2}r & p + \frac{1}{2}q \\[4pt] 0 & \frac{1}{4}q + \frac{1}{2}r & \frac{1}{2}q + r \end{pmatrix}.$

To simplify the notation, let $a = p + \frac{1}{2}q$ and $b = \frac{1}{2}q + r$. (The numbers $a$ and $b$ represent the proportions of G and g genes, respectively, in the population.) Then

$M = \begin{pmatrix} a & \frac{1}{2}a & 0 \\[4pt] b & \frac{1}{2} & a \\[4pt] 0 & \frac{1}{2}b & b \end{pmatrix},$

where $a + b = p + q + r = 1$.

Let $p'$, $q'$, and $r'$ denote the proportions of the first-generation offspring having genotypes GG, Gg, and gg, respectively. Then

$\begin{pmatrix} p' \\ q' \\ r' \end{pmatrix} = MP = \begin{pmatrix} ap + \frac{1}{2}aq \\[4pt] bp + \frac{1}{2}q + ar \\[4pt] \frac{1}{2}bq + br \end{pmatrix} = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix}.$

In order to consider the effects of unrestricted matings among the first-generation offspring, a new transition matrix $\widetilde{M}$ must be determined based upon the distribution of first-generation genotypes. As before, we find that

$\widetilde{M} = \begin{pmatrix} p' + \frac{1}{2}q' & \frac{1}{2}p' + \frac{1}{4}q' & 0 \\[4pt] \frac{1}{2}q' + r' & \frac{1}{2}p' + \frac{1}{2}q' + \frac{1}{2}r' & p' + \frac{1}{2}q' \\[4pt] 0 & \frac{1}{4}q' + \frac{1}{2}r' & \frac{1}{2}q' + r' \end{pmatrix} = \begin{pmatrix} a' & \frac{1}{2}a' & 0 \\[4pt] b' & \frac{1}{2} & a' \\[4pt] 0 & \frac{1}{2}b' & b' \end{pmatrix},$

where $a' = p' + \frac{1}{2}q'$ and $b' = \frac{1}{2}q' + r'$. However,

$a' = a^2 + \frac{1}{2}(2ab) = a(a + b) = a \quad \text{and} \quad b' = \frac{1}{2}(2ab) + b^2 = b(a + b) = b.$

Thus $\widetilde{M} = M$; so the distribution of second-generation offspring among the three genotypes is

$\widetilde{M}(MP) = M^2P = \begin{pmatrix} a^3 + a^2b \\ a^2b + ab + ab^2 \\ ab^2 + b^3 \end{pmatrix} = \begin{pmatrix} a^2(a + b) \\ ab(a + 1 + b) \\ b^2(a + b) \end{pmatrix} = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix} = MP,$

the same as the first-generation offspring. In other words, $MP$ is the fixed probability vector for $M$, and genetic equilibrium is achieved in the population after only one generation. (This result is called the Hardy-Weinberg law.) Notice that in the important special case that $a = b$ (or equivalently, that $p = r$), the distribution at equilibrium is

$MP = \begin{pmatrix} a^2 \\ 2ab \\ b^2 \end{pmatrix} = \begin{pmatrix} \frac{1}{4} \\[4pt] \frac{1}{2} \\[4pt] \frac{1}{4} \end{pmatrix}.$
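The one-generation equilibrium is easy to verify numerically for any starting distribution. A sketch assuming NumPy, with an arbitrarily chosen initial $(p, q, r)$:

```python
import numpy as np

p, q, r = 0.2, 0.5, 0.3    # arbitrary initial genotype proportions
a, b = p + q/2, q/2 + r    # gene proportions; a + b = 1

M = np.array([[a, a/2, 0],
              [b, 1/2, a],
              [0, b/2, b]])
P = np.array([p, q, r])

MP = M @ P
print(MP)       # [a^2, 2ab, b^2]
print(M @ MP)   # equals MP: equilibrium after one generation
```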

Exercises

  1. Label the following statements as true or false.

    1. (a) If $A \in M_{n\times n}(C)$ and $\lim_{m\to\infty} A^m = L$, then, for any invertible matrix $Q \in M_{n\times n}(C)$, we have $\lim_{m\to\infty} QA^mQ^{-1} = QLQ^{-1}$.

    2. (b) If 2 is an eigenvalue of $A \in M_{n\times n}(C)$, then $\lim_{m\to\infty} A^m$ does not exist.

    3. (c) Any vector

      $\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in R^n$

      such that $x_1 + x_2 + \cdots + x_n = 1$ is a probability vector.

    4. (d) The sum of the entries of each row of a transition matrix equals 1.

    5. (e) The product of a transition matrix and a probability vector is a probability vector.

    6. (f) Let $z$ be any complex number such that $|z| < 1$. Then the matrix

      $\begin{pmatrix} 1 & z & 1 \\ z & 1 & 1 \\ 1 & 1 & z \end{pmatrix}$

      does not have 3 as an eigenvalue.

    7. (g) Every transition matrix has 1 as an eigenvalue.

    8. (h) No transition matrix can have $-1$ as an eigenvalue.

    9. (i) If $A$ is a transition matrix, then $\lim_{m\to\infty} A^m$ exists.

    10. (j) If $A$ is a regular transition matrix, then $\lim_{m\to\infty} A^m$ exists and has rank 1.

  2. Determine whether $\lim_{m\to\infty} A^m$ exists for each of the following matrices $A$, and compute the limit if it exists.

    1. (a) $\begin{pmatrix} 0.1 & 0.7 \\ 0.7 & 0.1 \end{pmatrix}$

    2. (b) $\begin{pmatrix} -1.4 & 0.8 \\ -2.4 & 1.8 \end{pmatrix}$

    3. (c) $\begin{pmatrix} 0.4 & 0.7 \\ 0.6 & 0.3 \end{pmatrix}$

    4. (d) $\begin{pmatrix} -1.8 & 4.8 \\ -0.8 & 2.2 \end{pmatrix}$

    5. (e) $\begin{pmatrix} -2 & 1 \\ -4 & 3 \end{pmatrix}$

    6. (f) $\begin{pmatrix} 2.0 & -0.5 \\ 3.0 & -0.5 \end{pmatrix}$

    7. (g) $\begin{pmatrix} -1.8 & 0 & -1.4 \\ -5.6 & 1 & -2.8 \\ 2.8 & 0 & 2.4 \end{pmatrix}$

    8. (h) $\begin{pmatrix} 3.4 & 0.2 & 0.8 \\ 3.9 & 1.8 & 1.3 \\ 16.5 & 2.0 & 4.5 \end{pmatrix}$

    9. (i) (122i4i12+5i1+2i3i14i12i4i1+5i)

    10. (j) (26+i3284i3287+2i35+i372i13+6i65+6i63520i6)

  3. Prove that if $A_1, A_2, \ldots$ is a sequence of $n \times p$ matrices with complex entries such that $\lim_{m\to\infty} A_m = L$, then $\lim_{m\to\infty} (A_m)^t = L^t$.

  4. Prove that if $A \in M_{n\times n}(C)$ is diagonalizable and $L = \lim_{m\to\infty} A^m$ exists, then either $L = I_n$ or $\mathrm{rank}(L) < n$.

  5. Find $2 \times 2$ matrices $A$ and $B$ having real entries such that $\lim_{m\to\infty} A^m$, $\lim_{m\to\infty} B^m$, and $\lim_{m\to\infty} (AB)^m$ all exist, but

    $\lim_{m\to\infty} (AB)^m \ne \left(\lim_{m\to\infty} A^m\right)\left(\lim_{m\to\infty} B^m\right).$
  6. In the week beginning June 1, 30% of the patients who arrived by helicopter at a hospital trauma unit were ambulatory and 70% were bedridden. One week after arrival, 60% of the ambulatory patients had been released, 20% remained ambulatory, and 20% had become bedridden. After the same amount of time, 10% of the bedridden patients had been released, 20% had become ambulatory, 50% remained bedridden, and 20% had died. Determine the percentages of helicopter arrivals during the week of June 1 who were in each of the four states one week after arrival. Assuming that the given percentages continue in the future, also determine the percentages of patients who eventually end up in each of the four states.

  7. A player begins a game of chance by placing a marker in box 2, marked Start. (See Figure 5.5.) A die is rolled, and the marker is moved one square to the left if a 1 or a 2 is rolled and one square to the right if a 3, 4, 5, or 6 is rolled. This process continues until the marker lands in square 1, in which case the player wins the game, or in square 4, in which case the player loses the game. What is the probability of winning this game? Hint: Instead of diagonalizing the appropriate transition matrix $A$, it is easier to represent $e_2$ as a linear combination of eigenvectors of $A$ and then apply $A^n$ to the result.

    Figure 5.5

  8. Which of the following transition matrices are regular?

    1. (a) $\begin{pmatrix} 0.2 & 0.3 & 0.5 \\ 0.3 & 0.2 & 0.5 \\ 0.5 & 0.5 & 0 \end{pmatrix}$

    2. (b) $\begin{pmatrix} 0.5 & 0 & 1 \\ 0.5 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$

    3. (c) $\begin{pmatrix} 0.5 & 0 & 0 \\ 0.5 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$

    4. (d) $\begin{pmatrix} 0.5 & 0 & 1 \\ 0.5 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$

    5. (e) $\begin{pmatrix} \frac{1}{3} & 0 & 0 \\ \frac{1}{3} & 1 & 0 \\ \frac{1}{3} & 0 & 1 \end{pmatrix}$

    6. (f) $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.7 & 0.2 \\ 0 & 0.3 & 0.8 \end{pmatrix}$

    7. (g) $\begin{pmatrix} 0 & \frac{1}{2} & 0 & 0 \\ \frac{1}{2} & 0 & 0 & 0 \\ \frac{1}{4} & \frac{1}{4} & 1 & 0 \\ \frac{1}{4} & \frac{1}{4} & 0 & 1 \end{pmatrix}$

    8. (h) $\begin{pmatrix} \frac{1}{4} & \frac{1}{4} & 0 & 0 \\ \frac{1}{4} & \frac{1}{4} & 0 & 0 \\ \frac{1}{4} & \frac{1}{4} & 1 & 0 \\ \frac{1}{4} & \frac{1}{4} & 0 & 1 \end{pmatrix}$

  9. Compute $\lim_{m\to\infty} A^m$, if it exists, for each matrix $A$ in Exercise 8.

  10. Each of the matrices that follow is a regular transition matrix for a three-state Markov chain. In all cases, the initial probability vector is

    $P = \begin{pmatrix} 0.3 \\ 0.3 \\ 0.4 \end{pmatrix}.$

    For each transition matrix, compute the proportions of objects in each state after two stages and the eventual proportions of objects in each state by determining the fixed probability vector.

    1. (a) $\begin{pmatrix} 0.6 & 0.1 & 0.1 \\ 0.1 & 0.9 & 0.2 \\ 0.3 & 0 & 0.7 \end{pmatrix}$

    2. (b) $\begin{pmatrix} 0.8 & 0.1 & 0.2 \\ 0.1 & 0.8 & 0.2 \\ 0.1 & 0.1 & 0.6 \end{pmatrix}$

    3. (c) $\begin{pmatrix} 0.9 & 0.1 & 0.1 \\ 0.1 & 0.6 & 0.1 \\ 0 & 0.3 & 0.8 \end{pmatrix}$

    4. (d) $\begin{pmatrix} 0.4 & 0.2 & 0.2 \\ 0.1 & 0.7 & 0.2 \\ 0.5 & 0.1 & 0.6 \end{pmatrix}$

    5. (e) $\begin{pmatrix} 0.5 & 0.3 & 0.2 \\ 0.2 & 0.5 & 0.3 \\ 0.3 & 0.2 & 0.5 \end{pmatrix}$

    6. (f) $\begin{pmatrix} 0.6 & 0 & 0.4 \\ 0.2 & 0.8 & 0.2 \\ 0.2 & 0.2 & 0.4 \end{pmatrix}$

  11. In 1940, a county land-use survey showed that 10% of the county land was urban, 50% was unused, and 40% was agricultural. Five years later, a follow-up survey revealed that 70% of the urban land had remained urban, 10% had become unused, and 20% had become agricultural. Likewise, 20% of the unused land had become urban, 60% had remained unused, and 20% had become agricultural. Finally, the 1945 survey showed that 20% of the agricultural land had become unused while 80% remained agricultural. Assuming that the trends indicated by the 1945 survey continue, compute the percentages of urban, unused, and agricultural land in the county in 1950 and the corresponding eventual percentages.

  12. A diaper liner is placed in each diaper worn by a baby. If, after a diaper change, the liner is soiled, then it is replaced by a new liner. Otherwise, the liner is washed with the diapers and reused, except that each liner is replaced by a new liner after its second use, even if it has never been soiled. The probability that the baby will soil any diaper liner is one-third. If there are only new diaper liners at first, eventually what proportions of the diaper liners being used will be new, once-used, and twice-used?

  13. In 1975, the automobile industry determined that 40% of American car owners drove large cars, 20% drove intermediate-sized cars, and 40% drove small cars. A second survey in 1985 showed that 70% of the large-car owners in 1975 still owned large cars in 1985, but 30% had changed to an intermediate-sized car. Of those who owned intermediate-sized cars in 1975, 10% had switched to large cars, 70% continued to drive intermediate-sized cars, and 20% had changed to small cars in 1985. Finally, of the small-car owners in 1975, 10% owned intermediate-sized cars and 90% owned small cars in 1985. Assuming that these trends continue, determine the percentages of Americans who own cars of each size in 1995 and the corresponding eventual percentages.

  14. Show that if A and P are as in Example 5, then

    $A^m = \begin{pmatrix} r_m & r_{m+1} & r_{m+1} \\ r_{m+1} & r_m & r_{m+1} \\ r_{m+1} & r_{m+1} & r_m \end{pmatrix},$

    where

    $r_m = \frac{1}{3}\left[1 + \frac{(-1)^m}{2^{m-1}}\right].$

    Deduce that

    $600(A^mP) = A^m\begin{pmatrix} 300 \\ 200 \\ 100 \end{pmatrix} = \begin{pmatrix} 200 + \frac{(-1)^m}{2^m}(100) \\[4pt] 200 \\[4pt] 200 + \frac{(-1)^{m+1}}{2^m}(100) \end{pmatrix}.$
  15. Prove that if a 1-dimensional subspace $W$ of $R^n$ contains a nonzero vector with all nonnegative entries, then $W$ contains a unique probability vector.

  16. Prove Theorem 5.14 and its corollary.

  17. Prove the two corollaries of Theorem 5.17. Visit goo.gl/V5Hsou for a solution.

  18. Prove the corollary of Theorem 5.18.

  19. Suppose that $M$ and $M'$ are $n \times n$ transition matrices.

    1. (a) Prove that if $M$ is regular, $N$ is any $n \times n$ transition matrix, and $c$ is a real number such that $0 < c \le 1$, then $cM + (1 - c)N$ is a regular transition matrix.

    2. (b) Suppose that for all $i, j$, we have that $M'_{ij} > 0$ whenever $M_{ij} > 0$. Prove that there exist a transition matrix $N$ and a real number $c$ with $0 < c \le 1$ such that $M' = cM + (1 - c)N$.

    3. (c) Deduce that if the nonzero entries of $M$ and $M'$ occur in the same positions, then $M$ is regular if and only if $M'$ is regular.

The following definition is used in Exercises 20-24.

Definition.

For $A \in M_{n\times n}(C)$, define $e^A = \lim_{m\to\infty} B_m$, where

$B_m = I + A + \frac{A^2}{2!} + \cdots + \frac{A^m}{m!}.$

This limit exists by Exercise 22 of Section 7.2. Thus $e^A$ is the sum of the infinite series

$I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots,$

and $B_m$ is the $m$th partial sum of this series. (Note the analogy with the power series

$e^a = 1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots,$

which is valid for all complex numbers $a$.)
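A minimal sketch of the partial sums $B_m$, assuming NumPy (the nilpotent test matrix is chosen so that $e^A = I + A$ exactly; a library routine such as scipy.linalg.expm would give the same answer, but the loop below is just the definition):

```python
import numpy as np

def exp_partial_sum(A, m):
    """B_m = I + A + A^2/2! + ... + A^m/m!, the m-th partial sum defining e^A."""
    n = A.shape[0]
    term = np.eye(n)
    total = np.eye(n)
    for k in range(1, m + 1):
        term = term @ A / k    # A^k / k!, built incrementally
        total = total + term
    return total

A = np.array([[0., 1.], [0., 0.]])  # A^2 = O, so e^A = I + A exactly
print(exp_partial_sum(A, 20))       # [[1, 1], [0, 1]]
```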

  20. Compute $e^O$ and $e^I$, where $O$ and $I$ denote the $n \times n$ zero and identity matrices, respectively.

  21. Let $P^{-1}AP = D$ be a diagonal matrix. Prove that $e^A = Pe^DP^{-1}$.

  22. Let $A \in M_{n\times n}(C)$ be diagonalizable. Use the result of Exercise 21 to show that $e^A$ exists. (Exercise 22 of Section 7.2 shows that $e^A$ exists for every $A \in M_{n\times n}(C)$.)

  23. Find $A, B \in M_{2\times 2}(R)$ such that $e^Ae^B \ne e^{A+B}$.

  24. Prove that a differentiable function $x: R \to R^n$ is a solution to the system of differential equations defined in Exercise 16 of Section 5.2 if and only if $x(t) = e^{tA}v$ for some $v \in R^n$, where $A$ is defined in that exercise.