Student Background, Course Syllabus
Overview
The impact of moving from a single machine to a networked set of machines has been tremendous.
Distributed System--``Autonomous machines linked by a network with software designed to produce an integrated computing facility.''--CDK.
``A distributed system is a collection of independent computers that appears to its users as a single coherent system.''--Tanenbaum
``A collection of loosely coupled processors interconnected by a communication network.''--SGG
Historical Background. Look at Fig 1.9 and 1.10.
Begin with the notion of a distributed system (what is the interconnection hardware?).
Computer system classification:
tightly-coupled vs. loosely-coupled. Look at Fig 1-6.
Caching is vital to get reasonable performance. For example, caches on a shared memory multiprocessor.
Want to maintain cache coherency. Write-through cache--any change to the cache is also written through to memory. Combine with the other processors' caches watching the bus for those writes (snooping or snoopy cache).
Can also have a write-back cache--changed contents are written back to memory only when needed, e.g., when another processor requests the block or the cache line is replaced.
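A minimal sketch of the write-through plus snooping idea, in Python. The class names (`Bus`, `WriteThroughCache`) and the invalidate-on-snoop policy are assumptions made for illustration; real hardware works on cache lines with a coherence protocol, and write-back is not shown.

```python
class Bus:
    """Shared bus: owns main memory and shows every write to all attached caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, addr):
        return self.memory.get(addr, 0)

    def write(self, writer, addr, value):
        self.memory[addr] = value
        # Every other cache "snoops" the write as it crosses the bus.
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)


class WriteThroughCache:
    """Per-processor cache; every write goes to memory immediately."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:  # miss: fetch from memory
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)  # write-through

    def snoop_write(self, addr):
        # Assumed policy: invalidate the local copy so the next read misses.
        self.lines.pop(addr, None)


bus = Bus()
cpu0, cpu1 = WriteThroughCache(bus), WriteThroughCache(bus)
cpu0.write(0x10, 1)
first = cpu1.read(0x10)   # cpu1 caches the value written by cpu0
cpu0.write(0x10, 2)       # cpu1 snoops this write and drops its stale copy
second = cpu1.read(0x10)  # miss, refetched from memory
```

Without the snoop, `cpu1` would keep returning its stale copy after the second write, which is the coherency problem the notes refer to.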
The hard part is to make the system work. Approaches:
All accomplished with servers on the remote machine. Processes waiting to handle requests. Systems are heterogeneous and autonomous (make own decisions).
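A small sketch of "a server process waiting to handle requests", using Python's standard `socketserver` module. The upper-casing service, the handler name `UpperHandler`, and the helper `request` are hypothetical, chosen only to keep the example short; the server runs on the loopback interface so the sketch is self-contained.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.StreamRequestHandler):
    """Hypothetical service: read one line, reply with it upper-cased."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def request(address, text):
    """Client side: connect, send one line, return the one-line reply."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall(text.encode() + b"\n")
        return sock.makefile("rb").readline().decode().strip()


# Port 0 asks the operating system for any free port.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

reply = request(server.server_address, "hello")

server.shutdown()
server.server_close()
```

The server thread does nothing until a request arrives; the client needs only the server's address and the agreed message format.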
See Tanenbaum Fig 1-24 for comparison.
Fundamental use of networked computers (why network them otherwise?). Share files, information, and work (computer-supported cooperative work, CSCW). Need policies and mechanisms for sharing the resources. Have clients and servers of information.
Can also be done with the object model.
Can the system be extended?
Need to design it into the system and publish the interfaces. Unix was an early open system. Look at DCE, CORBA and Jini as standards for creating open systems.
If one machine goes down, there is a high probability that other machines are still available. But distributed systems often have dependencies on one or a few machines.
Leslie Lamport on a distributed system ``One on which I cannot get any work done because some machine I have never heard of has crashed.''
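A rough worked example of both sides of this point, under the assumption (not made in the notes) that machines fail independently and each is up a fraction `avail` of the time. The function names are hypothetical.

```python
def any_replica_up(avail, n):
    """Service is usable if at least one of n interchangeable machines is up."""
    return 1 - (1 - avail) ** n


def all_machines_up(avail, n):
    """Service is usable only if every one of n machines it depends on is up."""
    return avail ** n


replicated = any_replica_up(0.9, 3)   # roughly 0.999
dependent = all_machines_up(0.9, 3)   # roughly 0.729
```

With three machines that are each up 90% of the time, replication raises availability to about 99.9%, while a chain of dependencies lowers it to about 72.9%. The latter is the situation Lamport's remark describes.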
Fault tolerance (ability to recover from faults)--hardware and software solutions.
Can we detect errors--what's the difference between a communication error and a computation that takes a long time to complete?
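A sketch of why the question is hard: a caller that uses a timeout can only suspect a failure. Here the "remote" computation is simulated by a local thread, and `slow_but_healthy` and `call_with_timeout` are hypothetical names for illustration.

```python
import concurrent.futures
import time


def slow_but_healthy(delay):
    """Stands in for a remote computation that is merely slow, not failed."""
    time.sleep(delay)
    return "done"


def call_with_timeout(fn, arg, timeout):
    """Return ("ok", result), or ("suspected-failure", None) if no reply arrives in time."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, arg)
    try:
        return ("ok", future.result(timeout=timeout))
    except concurrent.futures.TimeoutError:
        # From the caller's side, a lost message, a crashed server, and a
        # long-running computation all look the same: no answer yet.
        return ("suspected-failure", None)
    finally:
        pool.shutdown(wait=False)


fast = call_with_timeout(slow_but_healthy, 0.0, 2.0)
slow = call_with_timeout(slow_but_healthy, 0.3, 0.05)
```

The second call is reported as a suspected failure even though the computation is healthy and finishes shortly afterward. Choosing the timeout trades false suspicions against slow detection.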
Issue of availability (how much of the time is the system usable?). Did a workstation or a server machine crash?
Different measurements. Actual applications versus low-level benchmarks.
How big in terms of machines and distance? Affects resource management and location. A distributed file system vs. the Internet as an example. Affects issues of caching and replication.
Should be able to extend any resource.
Need an identifier for an entity. Often two levels of names: those used by users and those used by the system.
Structured or flat?
Name service to resolve names to their underlying value.
Names have contexts
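The naming points above can be sketched as a tiny name service in Python. The `Context` class, the `/` separator, and the sample bindings are all assumptions for illustration. A structured name is resolved one component at a time, each in its own context; a flat name space is the special case of a single context.

```python
class Context:
    """A naming context: binds user-level names to system-level values or to other contexts."""

    def __init__(self):
        self.bindings = {}

    def bind(self, name, value):
        self.bindings[name] = value

    def resolve(self, path):
        """Resolve a structured name such as "users/alice", one component at a time."""
        head, _, rest = path.partition("/")
        target = self.bindings[head]  # KeyError if the name is unbound
        if not rest:
            return target
        if not isinstance(target, Context):
            raise KeyError(path)  # remaining components need a context
        return target.resolve(rest)


root = Context()
users = Context()
root.bind("users", users)
users.bind("alice", 1001)          # hypothetical user name -> system identifier
root.bind("printer", "10.0.0.7")   # hypothetical flat name in the root context

uid = root.resolve("users/alice")
addr = root.resolve("printer")
```

The same name can mean different things in different contexts: `alice` is only meaningful relative to the `users` context, not to `root`.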
Paradigms:
Approaches: