Overview
The impact of moving from a single machine to a networked set of machines has been tremendous. See Fig 9-1 for a comparison of centralized vs. distributed systems, Fig 9-2 for a comparison with isolated computers, and Fig 9-3 for the disadvantages.
Distributed System-- ``Autonomous machines linked by a network with software designed to produce an integrated computing facility.''--CDK.
``A collection of independent computers that appears to its users as a single coherent system.''--Tanenbaum
``A collection of loosely coupled processors interconnected by a communication network.''--SGG
Different types of hardware in which to build systems:
Can develop an operating system for either type of environment.
Two types of bus-based multiprocessor organization:
Caching is vital to get reasonable performance. For example, caches on a shared memory multiprocessor.
Want to maintain cache coherency. Write-through cache--any changes to the cache are written through to memory. Combine with the other processors' caches watching the bus for writes (snooping or snoopy cache) so that they can update or invalidate their own copies.
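A minimal sketch of a write-through cache combined with snooping, assuming a single shared bus and an invalidate-on-write policy. The class names (Bus, WriteThroughCache) and the dictionary-based memory are hypothetical illustrations, not taken from any of the referenced texts.

```python
class Bus:
    """Shared bus: holds main memory and makes every write visible to all caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def write(self, source, addr, value):
        self.memory[addr] = value          # write-through: memory is always current
        for cache in self.caches:
            if cache is not source:
                cache.snoop(addr)          # the other caches watch the bus


class WriteThroughCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                    # addr -> cached value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:         # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)  # every write goes to memory

    def snoop(self, addr):
        self.lines.pop(addr, None)         # invalidate a copy that is now stale


bus = Bus()
cpu0, cpu1 = WriteThroughCache(bus), WriteThroughCache(bus)
cpu1.read(0x10)          # cpu1 caches the initial value 0
cpu0.write(0x10, 42)     # goes through to memory; cpu1 drops its copy
print(cpu1.read(0x10))   # 42, re-fetched from memory after the invalidation
```

Invalidating on a snooped write is one of two common choices; the other is to update the cached copy in place.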
Can also have a write-back cache--the changed contents are written back to memory only when needed, for example when another processor requests that block or the cache line is evicted.
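A minimal sketch of the write-back idea, assuming a dirty line is flushed only when another cache asks the bus for that address. The names (Bus, WriteBackCache, memory_writes) are hypothetical, and the sketch deliberately omits eviction and the invalidation of other copies that a full coherence protocol would need.

```python
class Bus:
    def __init__(self):
        self.memory = {}
        self.caches = []
        self.memory_writes = 0             # counts how often memory is written

    def fetch(self, requester, addr):
        # Another request is made: a cache holding a dirty copy flushes it first.
        for cache in self.caches:
            if cache is not requester:
                cache.flush_if_dirty(addr)
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1


class WriteBackCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}                    # addr -> cached value
        self.dirty = set()                 # addresses changed but not yet in memory
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.fetch(self, addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)               # memory is now stale; no bus traffic yet

    def flush_if_dirty(self, addr):
        if addr in self.dirty:
            self.bus.store(addr, self.lines[addr])
            self.dirty.discard(addr)


bus = Bus()
cpu0, cpu1 = WriteBackCache(bus), WriteBackCache(bus)
for v in range(3):
    cpu0.write(0x10, v)                    # three writes, none reach memory
writes_before = bus.memory_writes
value = cpu1.read(0x10)                    # forces cpu0 to write back once
```

Compared with write-through, repeated writes to the same address cost a single memory write here, at the price of memory being temporarily out of date.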
A multicomputer has a non-uniform memory access model as well, but does not have a common address space. A distributed shared memory system at least layers a common address space on top.
The hard part is making the system work. Approaches:
All accomplished with servers on the remote machine. Processes waiting to handle requests. Systems are heterogeneous and autonomous (make own decisions).
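A minimal sketch of a server process waiting to handle requests, using Python's standard socketserver module on the loopback interface. The upper-casing "service", the handler name and the helper functions are hypothetical; a real remote service would listen on another machine's address.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.StreamRequestHandler):
    """Handles one request: read a line, reply with it in upper case."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def start_server():
    # Port 0 asks the OS for any free port; the server waits in a background thread.
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def request(address, text):
    """Client side: connect, send one line, wait for the one-line reply."""
    with socket.create_connection(address, timeout=5) as conn:
        conn.sendall(text.encode() + b"\n")
        return conn.makefile("rb").readline().decode().strip()


server = start_server()
reply = request(server.server_address, "hello")
print(reply)  # HELLO
server.shutdown()
server.server_close()
```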
See Tanenbaum Fig 1-24 for comparison.
Fundamental use of networked computers (why network them otherwise?). Share files, information, and work (computer-supported cooperative work (CSCW)). Need policies and mechanisms for sharing the resources. Have clients and servers of information.
Can also be done with the object model.
Can the system be extended?
Need to design it into the system and publish the interfaces. Unix was an early open system. Look at DCE, CORBA and Jini as standards for creating open systems.
If one machine goes down, there is a high probability that other machines are still available. But distributed systems often have dependencies on one or a few machines.
Leslie Lamport on a distributed system ``One on which I cannot get any work done because some machine I have never heard of has crashed.''
Fault tolerance (ability to recover from faults)--hardware and software solutions.
Can we detect errors--what's the difference between a communication error and a computation that takes a long time to complete?
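A small sketch of why a timeout cannot answer this question, assuming the client's only evidence is whether a reply arrived before the timeout. The function name, the scenario labels and the delay values are hypothetical.

```python
from typing import Optional


def observe(reply_delay: Optional[float], timeout: float) -> str:
    """What the client can conclude. reply_delay is None if no reply ever arrives."""
    if reply_delay is not None and reply_delay <= timeout:
        return "alive"
    return "suspected"      # no reply in time: the cause is unknown to the client


scenarios = {
    "fast reply": 0.1,
    "slow computation": 30.0,          # the server is fine, just busy
    "lost message or crash": None,     # nothing will ever come back
}
results = {name: observe(delay, timeout=5.0) for name, delay in scenarios.items()}
for name, verdict in results.items():
    print(f"{name}: {verdict}")
```

The slow computation and the lost message both come out as "suspected": from the client's side they look identical, so a longer timeout only trades false suspicions for slower detection.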
Issue of availability (what fraction of the time the system is usable). Did a workstation or a server machine crash?
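A rough sketch of how availability might be quantified. It assumes the common MTTF / (MTTF + MTTR) formulation and independent machine failures, neither of which comes from these notes; the function names and numbers are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time one machine is usable (mean time to failure vs. repair)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def all_required(avail: float, n: int) -> float:
    """The system needs every one of n machines (dependencies on each of them)."""
    return avail ** n


def any_sufficient(avail: float, n: int) -> float:
    """The system works if at least one of n replicas is up."""
    return 1 - (1 - avail) ** n


a = availability(990, 10)        # one machine: 0.99
depends_on_3 = all_required(a, 3)
replicated_3 = any_sufficient(a, 3)
print(a, depends_on_3, replicated_3)
```

Under these assumptions, depending on three machines lowers availability below that of a single machine, while replicating across three raises it, which is why dependencies on a few machines matter so much.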
Different measurements. Actual applications versus low-level benchmarks.
How big in terms of machines and distance? Affects resource management and location. Compare a distributed file system with the Internet as an example. Affects issues of caching and replication.
Should be able to extend any resource.
Need an identifier for an entity. Often two levels of names: those used by users and those used by the system.
Structured or flat?
Name service to resolve names to their underlying value.
Names have contexts
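A minimal sketch of a name service with contexts, assuming structured names written as "/"-separated paths and a string as the system-level identifier. The Context class and the example bindings are hypothetical.

```python
class Context:
    """A set of bindings from user-level names to values or to other contexts."""

    def __init__(self):
        self.bindings = {}

    def bind(self, name, value):
        self.bindings[name] = value

    def resolve(self, path):
        """Resolve a structured name one component at a time."""
        head, _, rest = path.partition("/")
        target = self.bindings[head]       # KeyError if the name is not bound
        if not rest:
            return target
        if not isinstance(target, Context):
            raise KeyError(f"{head!r} is not a context")
        return target.resolve(rest)        # continue in the next context


root = Context()
users = Context()
root.bind("users", users)
users.bind("alice", "uid:1001")            # user-level name -> system-level identifier

print(root.resolve("users/alice"))         # uid:1001
print(users.resolve("alice"))              # uid:1001, same entity from another context
```

The same entity is reached by different names depending on the context in which resolution starts; a flat name space would correspond to a single context with no nesting.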
Paradigms:
Approaches: