From Andrew File System work:
- most files are small--transfer files rather than disk blocks?
- reading more common than writing
- most access is sequential
- most files have a short lifetime--many applications (such as
compilers) generate temporary files.
- file sharing is unusual--argues for client caching
- processes use few files
- files can be divided into classes--handle ``system'' files and
``user'' files differently.
We primarily look at three systems as we examine the issues.
1. File Transfer Protocol (FTP). Motivation is to provide file sharing
(not a distributed file system). 1970s.
Connect to a remote machine and
interactively send or fetch an arbitrary file. FTP deals with
authentication, listing directory contents, ASCII or binary transfers, etc.
Typically, a user connecting to an FTP server must specify an account
and password. Often, it is convenient to set up a special account for
which no password is needed. Such systems provide a service
called anonymous FTP, where the userid is ``anonymous'' and the password
is typically the user's email address.
2. Sun's Network File System (NFS). Motivated by wanting to extend a
Unix file system to a distributed environment. Easy file sharing and
compatibility with existing systems. Mid-1980s.
Stateless in that servers do not maintain state about clients. RPC calls
supported:
- searching for a file within a directory
- reading a set of directory entries
- manipulating links and directories
- accessing file attributes
- reading/writing file data
3. Andrew File System (AFS). Research project at CMU in the 1980s,
later commercialized by a company called Transarc. Primary motivation
was to build a scalable distributed file system. Look at pictures.
Other file systems:
1. CODA: AFS spin-off at CMU. Focuses on disconnection and fault recovery.
2. Sprite: research project at UCB in the 1980s. Goal was to build a
distributed Unix system.
3. Echo: Digital SRC.
4. Amoeba Bullet File Server: Tanenbaum research project.
5. NTFS.
How are files named? Is the name access independent? Is the name
location independent?
- FTP. Location and access dependent.
- NFS. Location dependent through client mount points. Largely
transparent for ordinary users, but the same remote file system could be
mounted differently on different machines. Access independent. See Fig
9-3. Has an automount feature that mounts file systems on demand. All
clients could be configured to have the same naming structure.
- AFS. Location independent. Each client has the same view within a
cell. Have a cell at each site. See Fig 13-15.
NTFS uses the uniform naming convention
\\server_name\share_name\x\y\z. For example, in the CS department a
Unix file system is mounted under NT as \\Rav2\cew\tmp\file.doc. Access
transparent, but not location transparent.
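To see why such a name is not location transparent, a minimal sketch can split it into its parts. The helper name `split_unc` is hypothetical and not part of any Windows API; it only assumes the \\server_name\share_name\x\y\z layout given above.

```python
def split_unc(path):
    """Split \\\\server\\share\\x\\y\\z into (server, share, [x, y, z]).

    Hypothetical helper for illustration only.
    """
    if not path.startswith("\\\\"):
        raise ValueError("not a UNC-style name: %r" % path)
    parts = path[2:].split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("missing server or share name: %r" % path)
    return parts[0], parts[1], parts[2:]

# The example name from the notes (raw string keeps the backslashes literal).
server, share, rest = split_unc(r"\\Rav2\cew\tmp\file.doc")
print(server, share, rest)
```

The server name is the first component of every name, so moving the share to another machine changes the name that clients must use.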
Can files be migrated between file server machines? What must clients be
aware of?
- FTP. Sure, but end-user must be aware.
- NFS. Must change mount points on the client machines.
- AFS. On a per-volume basis (a volume is a collection of files managed
as a single unit).
Are directories and files handled with the same or a different mechanism?
- FTP. Directory listing handled as remote command.
- NFS. Unix-like.
- AFS. Unix-like.
Amoeba has separate mechanisms for directories and files.
What type of file sharing semantics are supported if two processes
are accessing the same file?
Possibilities are shown in Fig. 13-5.
- FTP. User-level copies. No support.
- NFS. Mostly Unix semantics.
- AFS. Session semantics.
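The difference between Unix semantics and session semantics can be sketched with a toy model. This is only an illustration under simplifying assumptions, not how AFS is implemented: the classes `Server` and `Session` are hypothetical, a session fetches the whole file on open, works on a private copy, and writes it back on close.

```python
class Server:
    """Toy file server: maps file names to contents."""
    def __init__(self):
        self.files = {}

class Session:
    """One open..close session on a client, working on a private copy."""
    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.data = server.files.get(name, b"")  # fetch whole file on open

    def read(self):
        return self.data

    def write(self, data):
        self.data = data  # visible only inside this session

    def close(self):
        # Changes become visible to sessions opened after this point.
        self.server.files[self.name] = self.data

server = Server()
server.files["f"] = b"old"
a = Session(server, "f")
b = Session(server, "f")
a.write(b"new")
a.close()
c = Session(server, "f")      # opened after a closed
print(b.read(), c.read())     # b still has its own copy; c sees a's update
```

Under Unix semantics, by contrast, `b` would see the write as soon as it happened. In this toy model, if both `a` and `b` write, whichever closes last overwrites the other.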
What, if any, file caching is supported?
Different options are shown in Fig 13-11.
Immutable files in Amoeba.
Does the system support locking of files?
- FTP. N/A.
- NFS. Has a locking mechanism, but it is external to NFS.
- AFS. Supported.
Is file replication/reliability supported and how?
- FTP. No.
- NFS. Unknown.
- AFS. For read-only volumes within a cell. For example binaries and
system libraries.
Look at Fig 13-13 for a more theoretical approach, based on version
numbers and voting.
With N replicas, a read quorum of Nr and a write quorum of Nw, require
Nr + Nw > N, so that every read quorum overlaps every write quorum.
Usually keep Nr small and Nw close to N. Must
assemble a read quorum or a write quorum to read or write a
file.
Can have ghosts stand in for down machines to cast write votes; when the
down machine comes up it immediately gets the new version.
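The voting scheme can be sketched in a few lines. This is a minimal in-memory model, assuming each replica stores a version number next to the data; the names `Replica`, `quorum_read` and `quorum_write` are hypothetical, and ghosts and failure handling are left out.

```python
class Replica:
    """One copy of the file, tagged with a version number."""
    def __init__(self):
        self.version = 0
        self.data = b""

def quorums_overlap(n, nr, nw):
    """The condition from the notes: Nr + Nw > N."""
    return nr + nw > n

def quorum_read(contacted, nr):
    """Read from at least nr replicas and return the newest (version, data)."""
    if len(contacted) < nr:
        raise RuntimeError("could not assemble a read quorum")
    newest = max(contacted, key=lambda r: r.version)
    return newest.version, newest.data

def quorum_write(contacted, nw, data):
    """Install a new version on at least nw replicas."""
    if len(contacted) < nw:
        raise RuntimeError("could not assemble a write quorum")
    # Assumes the write quorum contains an up-to-date replica, which holds
    # here because any two write quorums of 4 out of 5 overlap.
    new_version = max(r.version for r in contacted) + 1
    for r in contacted:
        r.version = new_version
        r.data = data
    return new_version

N, NR, NW = 5, 2, 4
replicas = [Replica() for _ in range(N)]
assert quorums_overlap(N, NR, NW)
quorum_write(replicas[:NW], NW, b"v1")   # replica 4 is not updated
print(quorum_read(replicas[3:], NR))     # quorum of replicas 3 and 4
```

The read quorum here contains one stale replica (version 0) and one current replica (version 1), and the version number lets the reader pick the current one. With Nr = 2 and Nw = 3 the condition fails, and a read quorum could consist entirely of stale replicas.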
Is the system scalable?
- FTP. Yes. Millions of users.
- NFS. Not so much: tens to hundreds.
- AFS. Better than NFS because it keeps traffic away from file servers:
thousands.
Is hardware/software homogeneity required?
- FTP. No.
- NFS. No.
- AFS. No.
Is the application interface compatible with Unix or is another
interface used?
- FTP. Separate.
- NFS. The same.
- AFS. The same.
What security and protection features are available to control access?
- FTP. Account/password authorization.
- NFS. RPC Unix authentication. Someone who is root on a client machine
can create an arbitrary login/userid and gain access to that user's files
on the file server. Could also gain access as root unless account mapping
is used to map root to ``nobody''.
- AFS. Unix permissions for files, access control lists for directories.
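The root-to-``nobody'' mapping mentioned for NFS can be sketched as a one-line check on the server side. This is an illustration, not NFS server code: the function name `map_uid` is hypothetical and the numeric id chosen for ``nobody'' is an assumption (the value differs between systems).

```python
NOBODY_UID = 65534  # assumption: a commonly used id for "nobody"

def map_uid(claimed_uid, squash_root=True):
    """Return the uid the server acts as for a request claiming claimed_uid."""
    if squash_root and claimed_uid == 0:
        return NOBODY_UID      # remote root is demoted to "nobody"
    return claimed_uid         # any other claimed uid is taken at face value

print(map_uid(0), map_uid(1000))
```

The mapping only closes the root hole. A client that claims some other userid is still believed, which is the first weakness listed above.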
Do file system servers maintain state about clients? Look at Fig 13-8.
- FTP. No.
- NFS. No.
- AFS. Yes.
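What it means for a server to keep state about clients can be sketched by comparing two ways of serving a read. This is a generic illustration with hypothetical names (`stateless_read`, `StatefulServer`); the per-client state shown, an open-file table with a position, is just one kind of state and is not claimed to be what AFS keeps.

```python
FILES = {"fh1": b"hello distributed world"}  # file handle -> contents

def stateless_read(handle, offset, count):
    """Each request carries everything needed; no per-client table exists."""
    return FILES[handle][offset:offset + count]

class StatefulServer:
    """Keeps an open-file table, so requests can be shorter."""
    def __init__(self):
        self.open_table = {}   # descriptor -> [handle, position]
        self.next_fd = 0

    def open(self, handle):
        fd = self.next_fd
        self.next_fd += 1
        self.open_table[fd] = [handle, 0]
        return fd

    def read(self, fd, count):
        handle, pos = self.open_table[fd]
        data = FILES[handle][pos:pos + count]
        self.open_table[fd][1] = pos + len(data)  # server remembers position
        return data

srv = StatefulServer()
fd = srv.open("fh1")
print(srv.read(fd, 5), stateless_read("fh1", 6, 11))

srv = StatefulServer()            # simulate a server crash and restart
print(fd in srv.open_table)       # the old descriptor is gone
```

After the simulated restart, the stateful client must reopen the file, while the stateless request can simply be retried unchanged.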
Principles to think about for file systems and other large distributed
systems:
- Workstations have cycles to burn. Make clients do work whenever
possible.
- Cache whenever possible.
- Exploit file usage properties. Understand them.
One-third of Unix files are temporary.
- Minimize system-wide knowledge and change. Do not hardwire
locations.
- Trust the fewest possible entities. Do not trust workstations.
- Batch if possible to group operations.