Figure 11.1 as motivation
Lamport's paper
Fort Collins, CO has atomic clock built on vibratiions of Cesium 133 atom.
Universal Coordinated Time (introduction of leap seconds). Can have a receiver on the net tuned to WWV.
Want to define a constant such that the maximum drift rate
of the clock (
) is bounded.
Want maximum drift between two clocks to be
, must be resampled
every
(each clock can drift amount in opposite
directions)
Cristian's algorithm. Send a message to the time server and get back a reply. Adjust clock to the time. Fig. 11-6. Considerations:
Berkeley algorithm. Time server periodically computes a network time and sends it out to everyone else. Does not synchronize to an external time.
Averaging algorithms. Broadcast time--decentralized. Could also pick a random node to send to, less overhead, but slower convergence.
Multiple External Time Sources. Intersect them, average, and throw out any outlyers. Fig. 11-8 for example of OSF's DCE approach. Universal Coordinated Time (UTC).
Network Time Protocol (NTP). Standardized time protocol. Use a hierarchy of servers with those at the top receiving UTC directly (Fig 10.3) Three modes:
Protocol uses symmetric mode to calculate offset
and delay
between
two clocks (Fig 10.4).
Have talked about Lamport's algorithm.
Look at centralized algorithm. Fig. 11-9.
Distributed algorithms. Ricart-Agrawala have an updated version of Lamport's algorithm. Merge Release and Reply messages.
Algorithm:
Create a ring (logical or physical) and pass a token between the nodes. If the node needs the critical section then it grabs the token.
Suzuki-Kasami's Broadcast algorithm--keep a vector of current state and broadcast requests. Each machine broadcasts a request for the token when it is needed.
Singhal has a heuristic improvement to only send to the sites it thinks may have the token.
Look at Fig 11-12.
A process holds an election to elect a leader. The election is held as follows:
Similar to bully algorithm in trying to elect the highest numbered node.
Any node can start and marks itself as a participant (vs. non-participant) in an election. It puts its identifier in the election message. Successive nodes mark themselves as participants and if they have a greater value then they substitute their id in the elected message, otherwise they just pass the value on.
When the receiver gets its id back then it is the coordinator and sends an elected message. The other nodes use this message to mark themselves as non-participants.
Notion of participation is used to quelch other elections. At worst the
protocol could take
messages (
to find leader,
before
leader sees its id again and
to send around elected message).
Show a picture with 3-9-15-6-12-18 (start with 3).
One processor initiates the process to try and get all processors to agree on a value (in the possible presense of faulty processors).
Outcome: all non-faulty processors agree on the same value. If the source processor is non-faulty then the common agreed upon value by all non-faulty processors should be the initial value of the source.
Also referred to as the Byzantine Generals Problem where distributed generals are trying to reach agreement on an attack plan. They communicate only through messengers. Some generals are traitors and try and mislead the others (faulty processors).
Lamport showed that with n processors, it is impossible to reach agreement
with more than
processors being faulty. For
example, with three processors there cannot be any faulty processors.
With
faulty processors, must have
total processors to reach
agreement.
Oral Message algorithm for
processors with
faulty processors
OM(m).
For
, the source processor simply sends its value to every processor,
with a default value of 0 if none received.
Algorithm OM(m)
:
Complexity is
Must know maximum number of faulty processors ahead of time.
Assume that all processors respond in a finite amount of time. If messages get lost then the problem cannot be solved--like blue army/white army problem.
skip for now
skip for now
How to know when a distributed computation (election, deadlock detection, distributed query) terminates??
Huang, 1989.
Use a weighting algorithm with a controlling agent. Initially the controlling agent has a weight of one and all other nodes have a weight of zero. Show picture.
Invariant: sum of weights in the system is always one.