Foundations of Computer Science

A Note on the Pumping Lemma

This page is located at http://www.cs.wpi.edu/~alvarez/CS3133/pumping.html

Suppose that L is the language accepted by a deterministic finite automaton with k states. Then every string z in L of length k or greater may be decomposed in the form z = uvw in such a way that the following conditions are satisfied:

- length(uv) <= k,
- v is nonempty, and
- uv^{n}w is in L for all n >= 0 ("v can be pumped").
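One way to see why such a decomposition exists is the standard pigeonhole argument. While reading the first k symbols of z, the DFA passes through k+1 states, so some state must repeat. The symbols read between the two visits to that state form v. The Python sketch below illustrates this argument under some illustrative assumptions. The DFA is encoded as a dict `delta` that maps (state, symbol) pairs to states. The names `pumping_decomposition` and `accepts`, and the 3-state example automaton, are all hypothetical.

```python
# Hypothetical DFA encoding (an assumption for illustration):
# states are ints and delta maps (state, symbol) -> next state.

def pumping_decomposition(delta, start, z, k):
    """Split z into (u, v, w) at the first repeated state among the
    first k+1 states visited while reading z. Requires len(z) >= k,
    where k is the number of states of the DFA."""
    assert len(z) >= k
    seen = {start: 0}              # state -> position where it was first reached
    state = start
    for pos in range(1, k + 1):
        state = delta[(state, z[pos - 1])]
        if state in seen:          # pigeonhole: k+1 visits, only k states
            i = seen[state]
            return z[:i], z[i:pos], z[pos:]
        seen[state] = pos
    raise RuntimeError("no repeated state: DFA has more than k states")

def accepts(delta, start, finals, s):
    """Run the DFA on s and report whether it ends in a final state."""
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in finals

# Example DFA (an assumption): 3 states tracking (number of a's) mod 3;
# it accepts strings whose number of a's is a multiple of 3.
delta = {(s, c): ((s + 1) % 3 if c == "a" else s) for s in range(3) for c in "ab"}
u, v, w = pumping_decomposition(delta, 0, "abaab", 3)
for n in range(5):
    assert accepts(delta, 0, {0}, u + v * n + w)
print(u, v, w)  # a b aab
```

The loop stops at the first repeated state. So the returned split always has length(uv) <= k and a nonempty v, matching the first two conditions of the lemma.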

Here goes. Suppose that the given language L is regular.
Then L is accepted by some deterministic finite automaton (DFA),
and by the pumping lemma every sufficiently long string z of L
may be written in the form stated in the lemma. To reach a
contradiction, however, we must choose the string z well.
The aim is to choose a string for which pumping will be
*guaranteed* to produce strings that can't possibly be in L.
The choice of the string z is related to the number k
that appears in the statement of the pumping lemma.
If k=7, for example, then the string z=abb is obviously
a bad choice because this z has length less than k and
so the pumping lemma doesn't say anything about it.
But the string z=aaabbbbbb is also a bad choice even
though it is longer than k. This is because the pumping
lemma doesn't say very much about the specific strings
u,v,w entering into the decomposition z = uvw.
We know that u and v combined will have a total length
at most k, i.e. at most 7 in this case. But this
could happen for several different strings u and v:
for example, the pair u=a, v=a would work, but so would
the pair u=aa, v=abb. If the pumping lemma were to
produce the latter pair, then we would have w=bbbb, and
pumping would lead to the string
uv^{n}w = aa(abb)^{n}bbbb.
If you count a's and b's in the pumped string,
you'll see that there are 2 + n a's and 2n + 4 = 2(2 + n) b's,
so there are half as many a's as b's, and therefore
the pumped string is indeed in the language L.
This is *not* the outcome we were looking for!
We wanted to arrive at a contradiction.
But with our choice for the string z we can't
guarantee that this noncontradictory outcome will not occur.
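The bad choice above can be checked mechanically. The short Python sketch below makes one assumption, based only on the counting in this paragraph: a string is in L exactly when it has half as many a's as b's. The helper name `in_L` is hypothetical. The sketch confirms that pumping the split u=aa, v=abb, w=bbbb never leaves L.

```python
def in_L(s: str) -> bool:
    # Assumed membership test: exactly half as many a's as b's.
    return s.count("b") == 2 * s.count("a")

z = "aaabbbbbb"
u, v, w = "aa", "abb", "bbbb"
# This split satisfies the lemma's conditions for k = 7.
assert u + v + w == z and len(u + v) <= 7 and v != ""
for n in range(6):
    pumped = u + v * n + w
    # 2 + n a's and 2n + 4 b's, so the pumped string stays in L
    assert pumped.count("a") == 2 + n and pumped.count("b") == 2 * n + 4
    print(n, pumped, in_L(pumped))
```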

In order to reach a contradiction as desired we have to choose
z better than we did above.
Here's a good choice if k=7: z = aaaaaaaabbbbbbbbbbbbbbbb
= a^{8} b^{16}.
Since in the decomposition z = uvw produced by the
pumping lemma we have length(uv) <= 7, it's guaranteed
that both u and v will be strings of a's - no b's will
appear in either u or v. Furthermore, since by the
pumping lemma v can't be the empty string, we know
that v will be a string of 1 or more a's (u on the
other hand might be the empty string). When we pump
to get uv^{n}w, the block v of 1 or more a's is repeated
n times (and disappears entirely when n=0), while the
number of b's does not change. For any n other than 1,
this upsets the balance between a's and b's and
guarantees that we get a string that can't be in
the language L.
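For the good choice z = a^{8} b^{16} with k=7, the guarantee can be checked exhaustively. There are only finitely many splits with length(uv) <= 7 and v nonempty. The Python sketch below enumerates every allowed split and confirms that pumping with n=2 or n=0 always produces a string outside L. It assumes that membership in L means having half as many a's as b's. `decompositions` and `in_L` are hypothetical helpers.

```python
def decompositions(z: str, k: int):
    """All (u, v, w) with z == u + v + w, len(u + v) <= k and v nonempty."""
    for end in range(1, min(k, len(z)) + 1):   # end = len(uv)
        for start in range(end):                # start = len(u) < end
            yield z[:start], z[start:end], z[end:]

def in_L(s: str) -> bool:
    # Assumed membership test: exactly half as many a's as b's.
    return s.count("b") == 2 * s.count("a")

k = 7
z = "a" * 8 + "b" * 16
assert in_L(z)
for u, v, w in decompositions(z, k):
    assert set(u + v) == {"a"}        # u and v contain only a's
    assert not in_L(u + v * 2 + w)    # pumping up (n = 2) leaves L
    assert not in_L(u + w)            # pumping down (n = 0) leaves L too
print(sum(1 for _ in decompositions(z, k)), "decompositions, all refuted")
```

The enumeration covers 1 + 2 + ... + 7 splits, one for each choice of length(uv) and length(u).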

The above sketches out the idea of the argument.
Here is a cleaned-up version like the one you are
expected to produce in your own homework / exam solutions.
Let L be defined as above. Suppose that L is a regular
language. Then there is a DFA, call it M, such that
L(M)=L. Let k be the number of states in M.
Consider the string z = a^{k} b^{2k}.
This z is in L since it has half as many a's as b's.
Since L is (supposedly) regular, by the pumping lemma
we can write z = uvw with length(uv)<=k, v nonempty,
and uv^{n}w in L for all n>=0. But since
length(uv)<=k we must have u=a^{i},
v=a^{j} for some i>=0, j>=1 (the condition j>=1
reflects the lemma's guarantee that v is not the empty string)
with i + j <= k, and therefore w=a^{k-i-j}b^{2k}.
Pumping produces uv^{n}w =
a^{i} a^{jn} a^{k-i-j}b^{2k}.
The number of a's in the pumped string is i + jn + k - i - j = j(n-1) + k,
and the number of b's is 2k. Thus, the number of a's in the
pumped string is *not* half the number of b's unless j(n-1)=0.
But we know that j>=1, so the only way that j(n-1) can be 0 is
for n-1 to be 0, i.e. n=1. But the pumping lemma states that
any n>=0 is allowed. Thus, by choosing any n other than 1,
for example n=0 or n=2, we see that the corresponding pumped string
is not in L. This contradicts the pumping lemma, which states
(among other things) that all such pumped strings must be in L.
Thus, we conclude that our initial assumption, that L is
a regular language, must be false.
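The sketch below is a sanity check on the algebra, not a substitute for the proof, which covers every k. It tests the argument for k = 1 to 29. It assumes, as the proof does, that a string is in L exactly when it has half as many a's as b's. `check_argument` is a hypothetical name. For each split u=a^{i}, v=a^{j}, it confirms two things: the pumped string has j(n-1) + k a's, and n = 0, 2, 3 all give strings outside L.

```python
def in_L(s: str) -> bool:
    # Assumed membership test: exactly half as many a's as b's.
    return s.count("b") == 2 * s.count("a")

def check_argument(k: int) -> bool:
    """Finite check of the proof for one k: every split z = uvw of
    z = a^k b^(2k) with len(uv) <= k and v nonempty is refuted."""
    z = "a" * k + "b" * (2 * k)
    for i in range(k):                   # i = len(u)
        for j in range(1, k - i + 1):    # j = len(v) >= 1, i + j <= k
            u, v, w = z[:i], z[i:i + j], z[i + j:]
            for n in (0, 2, 3):
                pumped = u + v * n + w
                # a-count matches the formula j(n-1) + k from the proof
                assert pumped.count("a") == j * (n - 1) + k
                if in_L(pumped):
                    return False
    return True

print(all(check_argument(k) for k in range(1, 30)))  # True
```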