CS 2223 Dec 04 2015
Expected reading: pp. 538-542, 548-556
Daily Exercise:
The shortest distance between two people is a smile.
Victor Borge
1 Breadth First Search
1.1 Details
Breadth First Search works by maintaining the state of the search in a queue. In the example from yesterday (and indeed in the book) we only marked a vertex as being already seen. This is sufficient for the code to work, but it doesn’t help explain why the computed path is guaranteed to be shortest.
Here is a brief code snippet showing Breadth First Search. It has nearly the same structure as Depth First Search, but it also stores a distTo[] value for each vertex. Initially each entry is set to Integer.MAX_VALUE, which stands in for positive infinity, because there may in fact be no path from s to a given vertex.
void bfs(Graph G, int s) {
    marked = new boolean[G.V()];
    edgeTo = new int[G.V()];
    distTo = new int[G.V()];    // allocate before filling in the values
    Queue<Integer> q = new Queue<Integer>();

    // Integer.MAX_VALUE stands in for "no known path from s"
    for (int v = 0; v < G.V(); v++)
        distTo[v] = Integer.MAX_VALUE;

    distTo[s] = 0;
    marked[s] = true;
    q.enqueue(s);

    while (!q.isEmpty()) {
        int v = q.dequeue();
        for (int w : G.adj(v)) {
            if (!marked[w]) {
                edgeTo[w] = v;               // w was first reached from v
                distTo[w] = distTo[v] + 1;   // one edge farther from s than v
                marked[w] = true;
                q.enqueue(w);
            }
        }
    }
}
In addition to the marked vector, which tracks the vertices already seen, the code maintains a distTo vector that records the shortest distance found from s to each vertex.
Every entry must be initialized to positive infinity (here Integer.MAX_VALUE) before the search begins, so that a vertex the search never reaches still reports that no path exists.
Consider some properties of the algorithm:
The queue only contains vertices that have been marked.
Once a vertex is marked, its distTo entry records its shortest distance from s.
What about the vertices in the queue? Do they all have the same distTo value?
1.2 Proof of correctness
Breadth First Search will locate a shortest path from s to every reachable vertex (there may be several paths with the same shortest length), and it will do so in time proportional to V + E.
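To see how the recorded values turn into an actual path, here is a minimal sketch that walks edgeTo backwards from a target vertex. It assumes plain java.util adjacency lists in place of the book's Graph and Queue classes, and the names BfsPathSketch, pathTo and graph are chosen here for illustration only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BfsPathSketch {
    final int[] distTo;   // Integer.MAX_VALUE means "not reachable from s"
    final int[] edgeTo;   // previous vertex on the path from s; -1 for s and unreachable vertices

    BfsPathSketch(List<List<Integer>> adj, int s) {
        distTo = new int[adj.size()];
        edgeTo = new int[adj.size()];
        Arrays.fill(distTo, Integer.MAX_VALUE);
        Arrays.fill(edgeTo, -1);
        distTo[s] = 0;
        ArrayDeque<Integer> q = new ArrayDeque<>();
        q.add(s);
        while (!q.isEmpty()) {
            int v = q.remove();
            for (int w : adj.get(v)) {
                if (distTo[w] == Integer.MAX_VALUE) {   // plays the role of !marked[w]
                    edgeTo[w] = v;
                    distTo[w] = distTo[v] + 1;
                    q.add(w);
                }
            }
        }
    }

    // Walk edgeTo backwards from t, then reverse to get the path s ... t.
    List<Integer> pathTo(int t) {
        List<Integer> path = new ArrayList<>();
        if (distTo[t] == Integer.MAX_VALUE) return path;   // no path: empty list
        for (int x = t; x != -1; x = edgeTo[x]) path.add(x);
        Collections.reverse(path);
        return path;
    }

    // Build undirected adjacency lists from an array of edges (illustrative helper).
    static List<List<Integer>> graph(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1}, {0, 2}, {1, 3}, {2, 3}, {3, 4} };   // vertex 5 is isolated
        BfsPathSketch bfs = new BfsPathSketch(graph(6, edges), 0);
        System.out.println(bfs.pathTo(4));   // [0, 1, 3, 4]
        System.out.println(bfs.pathTo(5));   // []
    }
}
```

In this small graph 0-2-3-4 is just as short as 0-1-3-4; the search reports whichever one it meets first, which depends on the order of the adjacency lists.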
1.2.1 Timing Considerations
While there is a nested loop, consider first that the outer while loop executes as long as the queue is not empty. Since only unmarked vertices are added to the queue, and each one is marked as it is enqueued, no vertex enters the queue twice. The loop therefore never executes more than V times; it may stop much earlier if the graph is not connected.
What about the inner for loop? It iterates based on the degree of each v, that is, on the number of edges incident to v. This loop runs once for every vertex reachable from s, and each edge is incident to two vertices, so the degrees sum to at most 2E. The if (!marked[w]) test therefore executes no more than 2E times; it may execute far fewer if the graph is not connected.
Thus there will be no more than V enqueue/dequeue operations and the if statement will execute no more than 2E times.
Do not be confused and think that the time is V * 2E! Rather, this is additive: the running time is directly proportional to V + 2E, which we simplify, as shown earlier, to V + E since the multiplicative constant 2 doesn’t matter in the long run.
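The additive bound can be checked by simply counting. The sketch below is an illustration, not the course code: it uses java.util collections rather than the book's classes, and BfsCountSketch, count and graph are made-up names. It counts how many vertices are dequeued and how many times the if test runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

class BfsCountSketch {
    // Returns { number of dequeues, number of times the if (!marked[w]) test runs }.
    static int[] count(List<List<Integer>> adj, int s) {
        boolean[] marked = new boolean[adj.size()];
        ArrayDeque<Integer> q = new ArrayDeque<>();
        int dequeues = 0;
        int tests = 0;
        marked[s] = true;
        q.add(s);
        while (!q.isEmpty()) {
            int v = q.remove();
            dequeues++;
            for (int w : adj.get(v)) {
                tests++;                     // one test per entry in v's adjacency list
                if (!marked[w]) {
                    marked[w] = true;
                    q.add(w);
                }
            }
        }
        return new int[] { dequeues, tests };
    }

    // Build undirected adjacency lists from an array of edges (illustrative helper).
    static List<List<Integer>> graph(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        // V = 6 and E = 5; vertex 5 is isolated, so the graph is not connected.
        int[][] edges = { {0, 1}, {0, 2}, {1, 3}, {2, 3}, {3, 4} };
        int[] c = count(graph(6, edges), 0);
        System.out.println(c[0] + " dequeues, " + c[1] + " tests");   // 5 dequeues, 10 tests
    }
}
```

Here the search performs 5 dequeues (fewer than V = 6, because vertex 5 is never reached) and 10 tests (exactly 2E), for a total of 15 rather than 6 * 10.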
1.2.2 Correctness
How can we claim that the Breadth First Search computes a shortest path between the source vertex s and any vertex t in the graph?
We can proceed inductively by setting N to the number of marked vertices.
In the base case, N=1 and the source vertex, s, is marked and has been enqueued. distTo[s] is set to 0, which is the correct value for the shortest distance from s to s.
Assume (by induction) that when N vertices are marked, each of them has the correct distTo value. Now consider the step that marks vertex N+1. The only way to mark a new vertex w is to encounter the edge (u, w) during the inner for loop while processing a dequeued vertex u. At this point the previously unmarked vertex w is marked, and we can rely on the fact that distTo[u] is correct (by our Inductive Assumption). The shortest distance from s to w is then the shortest distance from s to u plus 1 for the edge (u, w). Thus, once marked, distTo[w] is the length of the shortest path from s to w.
But, I hear you argue, what if the search later reaches a previously marked vertex in fewer steps from s?
Well, to defuse that argument, consider the state of the vertices in the queue. They are all marked. But what of the distTo values associated with each vertex in the queue? Observe that each vertex enqueued has a distTo value exactly one greater than the distTo value of the vertex that was just dequeued. If you think about this for a moment, you can recognize that the distTo values of the vertices in the queue appear (from left to right) in non-decreasing order.
That is, the queue may contain multiple vertices with the same distTo value, but at no point will a vertex with a higher distTo value appear to the left of a vertex with a lower distTo value. Vertices are therefore dequeued in order of increasing distance from s, so any later visit to an already marked vertex arrives by a route at least as long as the one already recorded.
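The ordering claim about the queue can also be tested directly. The following is a rough sketch under the same assumptions as before (java.util collections instead of the book's classes; BfsQueueOrderSketch and its method names are invented for this illustration): before every dequeue it scans the queue from head to tail and confirms that the distTo values never decrease.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BfsQueueOrderSketch {
    // Runs BFS from s and returns true if the queue was in non-decreasing
    // distTo order every time a vertex was about to be dequeued.
    static boolean queueStaysSorted(List<List<Integer>> adj, int s) {
        int[] distTo = new int[adj.size()];
        Arrays.fill(distTo, Integer.MAX_VALUE);
        ArrayDeque<Integer> q = new ArrayDeque<>();
        distTo[s] = 0;
        q.add(s);
        while (!q.isEmpty()) {
            if (!nonDecreasing(q, distTo)) return false;
            int v = q.remove();
            for (int w : adj.get(v)) {
                if (distTo[w] == Integer.MAX_VALUE) {   // plays the role of !marked[w]
                    distTo[w] = distTo[v] + 1;
                    q.add(w);
                }
            }
        }
        return true;
    }

    // ArrayDeque iterates from the head (next to be dequeued) to the tail.
    static boolean nonDecreasing(ArrayDeque<Integer> q, int[] distTo) {
        int prev = -1;
        for (int v : q) {
            if (distTo[v] < prev) return false;
            prev = distTo[v];
        }
        return true;
    }

    // Build undirected adjacency lists from an array of edges (illustrative helper).
    static List<List<Integer>> graph(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        int[][] edges = { {0, 1}, {0, 2}, {1, 3}, {2, 3}, {3, 4} };
        System.out.println(queueStaysSorted(graph(5, edges), 0));   // true
    }
}
```

A check like this on one graph is evidence, not a proof; the argument above is what guarantees the property for every graph.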
1.3 In Class exercise
You will now perform a side by side comparison of DFS and BFS on a sample graph. I showed this example yesterday for DFS, and now you are to conduct both a BFS and a DFS on it from vertex 0. You are to stop the search once you mark vertex 7.
Along the way, you must draw a representation of the stack (for DFS) and queue (for BFS). Use the following graph:
1.4 Demonstration
Run the Shell multiple times to launch three sample applications:
algs.days.day21.BFSSearchAnimation 2000
algs.days.day21.DFSSearchAnimation 2000
algs.days.day21.GuidedSearchAnimation 2000
What does the number represent? It becomes the seed for the random number generator which ensures that all three programs will execute over the exact same graphs to provide accurate side by side comparison.
The GuidedSearchAnimation attempts to improve the search process by selecting the marked vertex that is closest to the target point.
Note that the guidance here is flawed because distance between vertices is just part of the equation; you can only make progress by traversing edges. Nonetheless, this gives you a brief glimpse into more advanced search strategies that you will likely encounter in an AI class.
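To make the flaw concrete, here is one possible reading of a guided search, sketched as a greedy best-first search. This is an assumption for illustration and not the actual GuidedSearchAnimation code: the class GuidedSketch, the pos[][] coordinate array, and the small graph are all invented here. Among the marked vertices still waiting to be expanded, it always picks the one whose straight-line distance to the target is smallest.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class GuidedSketch {
    // Greedy best-first search: returns the number of edges on the path found
    // from s to t, or -1 if t is never reached. pos[v] holds the (x, y) of v.
    static int guidedHops(List<List<Integer>> adj, double[][] pos, int s, int t) {
        boolean[] marked = new boolean[adj.size()];
        int[] edgeTo = new int[adj.size()];
        Arrays.fill(edgeTo, -1);
        // Order waiting vertices by straight-line distance to the target.
        PriorityQueue<Integer> pq = new PriorityQueue<>(Comparator.comparingDouble(
            (Integer v) -> Math.hypot(pos[v][0] - pos[t][0], pos[v][1] - pos[t][1])));
        marked[s] = true;
        pq.add(s);
        while (!pq.isEmpty()) {
            int v = pq.remove();
            if (v == t) break;
            for (int w : adj.get(v)) {
                if (!marked[w]) {
                    marked[w] = true;
                    edgeTo[w] = v;
                    pq.add(w);
                }
            }
        }
        if (!marked[t]) return -1;
        int hops = 0;
        for (int x = t; x != s; x = edgeTo[x]) hops++;
        return hops;
    }

    // Plain BFS distance (in edges) from s to t, or -1 if unreachable.
    static int bfsHops(List<List<Integer>> adj, int s, int t) {
        int[] distTo = new int[adj.size()];
        Arrays.fill(distTo, -1);
        ArrayDeque<Integer> q = new ArrayDeque<>();
        distTo[s] = 0;
        q.add(s);
        while (!q.isEmpty()) {
            int v = q.remove();
            for (int w : adj.get(v)) {
                if (distTo[w] == -1) {
                    distTo[w] = distTo[v] + 1;
                    q.add(w);
                }
            }
        }
        return distTo[t];
    }

    // Build undirected adjacency lists from an array of edges (illustrative helper).
    static List<List<Integer>> graph(int n, int[][] edges) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }
        return adj;
    }

    public static void main(String[] args) {
        // Vertex 1 lies close to the target (vertex 3) but is two edges away from it;
        // vertex 2 lies far from the target but is directly connected to it.
        double[][] pos = { {0, 0}, {3, 1}, {0, 3}, {4, 0}, {3, -1} };
        int[][] edges = { {0, 1}, {0, 2}, {2, 3}, {1, 4}, {4, 3} };
        List<List<Integer>> adj = graph(5, edges);
        System.out.println(guidedHops(adj, pos, 0, 3));   // 3
        System.out.println(bfsHops(adj, 0, 3));           // 2
    }
}
```

In this made-up layout the guided search is lured toward vertex 1 and reaches the target in 3 edges, while BFS finds the 2-edge path through vertex 2. Closeness on the plane says nothing about how many edges remain.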
1.5 Version : 2015/12/05
(c) 2015, George Heineman