CS 3133, A Term 1998
Foundations of Computer Science
A Note on the Pumping Lemma

This page is located at http://www.cs.wpi.edu/~alvarez/CS3133/pumping.html

Pumping Lemma for Regular Languages

Suppose that L is the language accepted by a deterministic finite automaton with k states. Then every string z in L of length k or greater may be decomposed in the form z = uvw in such a way that the following conditions are satisfied:

1. length(uv) <= k
2. v is not the empty string
3. uv^n w is in L for all n >= 0
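
The decomposition z = uvw comes from a repeated state: a DFA with k states that reads k or more symbols must visit some state twice, and the symbols read between the two visits form v. The following Python sketch illustrates this. The two-state DFA (even number of a's), the dictionary encoding of its transition function, and the helper names pump_decomposition and accepts are all assumptions made for illustration; none of them come from the course materials.

```python
def pump_decomposition(delta, start, z):
    """Split z into (u, v, w) at the first state the DFA visits twice."""
    seen = {start: 0}              # state -> number of symbols read when first visited
    state = start
    for pos, symbol in enumerate(z, start=1):
        state = delta[(state, symbol)]
        if state in seen:          # the run looped while reading z[first:pos]
            first = seen[state]
            return z[:first], z[first:pos], z[pos:]
        seen[state] = pos
    return None                    # no state repeated, so z is too short to pump


def accepts(delta, start, finals, z):
    """Run the DFA on z and report whether it ends in an accepting state."""
    state = start
    for symbol in z:
        state = delta[(state, symbol)]
    return state in finals


# Hypothetical DFA with k = 2 states over {a, b}: accepts an even number of a's.
delta = {("even", "a"): "odd", ("even", "b"): "even",
         ("odd", "a"): "even", ("odd", "b"): "odd"}
k = 2
z = "abab"
u, v, w = pump_decomposition(delta, "even", z)
print(u, v, w)
print([accepts(delta, "even", {"even"}, u + v * n + w) for n in range(5)])
```

Because the first repeat happens within the first k symbols, length(uv) <= k, and because the loop reads at least one symbol, v is nonempty.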

Example

Using the pumping lemma one can show that the language L := {z | z has half as many a's as b's} is not regular. To do so, argue by contradiction: assume tentatively that L is regular, and show that a contradiction results. Since contradictions are not possible, the initial hypothesis that L is regular must then be false.

Here goes. Suppose that the given language L is regular. Then L is accepted by some deterministic finite automaton (DFA), and by the pumping lemma every sufficiently long string z of L may be written in the form stated in the lemma. To reach a contradiction, however, we must choose the string z well. The aim is to choose a string for which pumping is guaranteed to produce strings that can't possibly be in L.

The choice of the string z is related to the number k that appears in the statement of the pumping lemma. If k=7, for example, then the string z=abb is obviously a bad choice because this z has length less than k and so the pumping lemma doesn't say anything about it. But the string z=aaabbbbbb is also a bad choice even though it is longer than k. This is because the pumping lemma doesn't say very much about the specific strings u,v,w entering into the decomposition z = uvw. We know only that v is nonempty and that u and v combined have a total length of at most k, i.e. at most 7 in this case. Several different pairs u, v fit that description: for example the pair u=a, v=a would work, but so would the pair u=aa, v=abb.

If the pumping lemma were to produce the latter pair, then we would have w=bbbb, and pumping would lead to the string uv^n w = aa(abb)^n bbbb. If you count a's and b's in the pumped string, you'll see that there are 2 + n a's and 2n + 4 b's, so there are half as many a's as b's, and therefore the pumped string is indeed in the language L. This is not the outcome we were looking for! We wanted to arrive at a contradiction. But with our choice for the string z we can't guarantee that this noncontradictory outcome will not occur.
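
The counting for the bad choice z=aaabbbbbb can be checked mechanically. The Python sketch below uses a hypothetical helper in_L that tests membership in L by counting letters. It pumps the troublesome pair u=aa, v=abb and, for comparison, the pair u=a, v=a.

```python
def in_L(s):
    """Membership in L: s has exactly half as many a's as b's."""
    return set(s) <= {"a", "b"} and 2 * s.count("a") == s.count("b")


# The troublesome decomposition of z = aaabbbbbb from the text.
u, v, w = "aa", "abb", "bbbb"
assert u + v + w == "aaabbbbbb"

# Every pumped string has 2 + n a's and 2n + 4 b's, so it stays in L.
pumped = [u + v * n + w for n in range(6)]
all_in_L = all(in_L(s) for s in pumped)
print(all_in_L)

# The other pair from the text, u = a and v = a, does leave L when pumped.
other = "a" + "a" * 2 + "abbbbbb"
print(in_L(other))
```

Since at least one legal decomposition never leaves L, this z cannot produce the contradiction we need.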

In order to reach a contradiction as desired we have to choose z better than we did above. Here's a good choice if k=7: z = aaaaaaaabbbbbbbbbbbbbbbb = a^8 b^16. Since in the decomposition z = uvw produced by the pumping lemma we have length(uv) <= 7, it's guaranteed that both u and v will be strings of a's - no b's will appear in either u or v. Furthermore, since by the pumping lemma v can't be the empty string, we know that v will be a string of 1 or more a's (u on the other hand might be the empty string). When we pump to get uv^n w, the 1 or more a's in v are repeated n times while the number of b's does not change. For any n other than 1 this upsets the balance between a's and b's and guarantees that we get a string that can't be in the language L.
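
For k=7 the claim can be confirmed by brute force, because there are only finitely many ways to split z = a^8 b^16 with length(uv) <= 7 and v nonempty. The Python sketch below enumerates them all. The helper names in_L and decompositions are assumptions introduced for this illustration.

```python
def in_L(s):
    """Membership in L: s has exactly half as many a's as b's."""
    return set(s) <= {"a", "b"} and 2 * s.count("a") == s.count("b")


def decompositions(z, k):
    """Yield every (u, v, w) with z = uvw, length(uv) <= k, and v nonempty."""
    for end in range(1, min(k, len(z)) + 1):   # end = length(uv)
        for start in range(end):               # start = length(u), so v is nonempty
            yield z[:start], z[start:end], z[end:]


k = 7
z = "a" * 8 + "b" * 16
splits = list(decompositions(z, k))

# In every allowed split, u and v contain only a's.
only_as = all(set(u + v) <= {"a"} for u, v, w in splits)

# Pumping with n = 2 (or n = 0) takes every split out of L.
fails_n2 = all(not in_L(u + v * 2 + w) for u, v, w in splits)
fails_n0 = all(not in_L(u + w) for u, v, w in splits)
print(len(splits), only_as, fails_n2, fails_n0)
```

Unlike the check for z=aaabbbbbb, no decomposition survives here, which is what makes this z a good choice.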

The above sketches out the idea of the argument. Here is a cleaned-up version like the one you are expected to produce in your own homework / exam solutions.

Let L be defined as above. Suppose that L is a regular language. Then there is a DFA, call it M, such that L(M)=L. Let k be the number of states in M. Consider the string z = a^k b^{2k}. This z is in L since it has half as many a's as b's, and its length 3k is at least k. Since L is (supposedly) regular, by the pumping lemma we can write z = uvw with length(uv)<=k, v nonempty, and uv^n w in L for all n>=0. But since length(uv)<=k we must have u=a^i, v=a^j for some i>=0, j>=1 (j>=1 because v can't be the empty string) with i + j <= k, and therefore w=a^{k-i-j} b^{2k}. Pumping produces uv^n w = a^i a^{jn} a^{k-i-j} b^{2k}. The number of a's in the pumped string is i + jn + k - i - j = j(n-1) + k, and the number of b's is 2k. Thus, the number of a's in the pumped string is not half the number of b's unless j(n-1)=0. But we know that j>=1, so the only way that j(n-1) can be 0 is for n-1 to be 0, i.e. n=1. But the pumping lemma states that any n>=0 is allowed. Thus, by choosing any n greater than 1, for example n=2, we see that the corresponding pumped string is not in L. This contradicts the pumping lemma, which states (among other things) that all such pumped strings must be in L. Thus, we conclude that our initial assumption, that L is a regular language, must be false.
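
The count j(n-1) + k can be sanity-checked for small values of k. The Python sketch below builds the pumped string for every admissible i and j and confirms that it lies in L exactly when n=1. The helpers in_L, pumped and formula_holds are hypothetical names, and a finite check like this only illustrates the proof; it does not replace it.

```python
def in_L(s):
    """Membership in L: s has exactly half as many a's as b's."""
    return set(s) <= {"a", "b"} and 2 * s.count("a") == s.count("b")


def pumped(k, i, j, n):
    """Build uv^n w for u = a^i, v = a^j, w = a^(k-i-j) b^(2k)."""
    return "a" * i + "a" * (j * n) + "a" * (k - i - j) + "b" * (2 * k)


def formula_holds(k, max_n=4):
    """Check the a-count j(n-1) + k, and membership iff n == 1, for all i, j."""
    for j in range(1, k + 1):              # v is nonempty
        for i in range(0, k - j + 1):      # i + j <= k
            for n in range(max_n + 1):
                s = pumped(k, i, j, n)
                if s.count("a") != j * (n - 1) + k:
                    return False
                if in_L(s) != (n == 1):
                    return False
    return True


results = {k: formula_holds(k) for k in range(1, 9)}
print(results)
```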