Cryptography has its roots in the military, where low-level clerks (who could not be trusted) carried secret messages between commanders. The problem there was exchanging secret messages between high-level officers, when those messages were carried across hostile territories by low-level (possibly untrustable) clerks, with the inherent possibility that messages might fall into enemy hands. Today:
In networks, several types of problems:
The only general solution to security issues is to encrypt a user's plaintext messages into ciphertext messages. Ciphertext is a ``jumbled'' form of the message that can only be used if converted back into its original cleartext. It is the ciphertext that is sent across the network, and the intended recipient decrypts the ciphertext back into the original plaintext.
One of the fundamental rules of computer security is that an intruder knows a lot, if not everything, about the security system in use. ``Security through obscurity'' does not work! If a system has a vulnerability, one must assume that the intruder is aware of that vulnerability.
If encryption is used, for instance, one should assume that the intruder has a general idea of the how the encryption algorithm works.
The art of breaking ciphers is called cryptanalysis. The art of devising ciphers is called cryptography. Cryptology refers collectively to the task of making and breaking ciphers.
One problem with encryption is that once an intruder has determined the encryption algorithm, it must be changed immediately. How? Changing an algorithm may be impractical.
A common solution is to break the encryption process into two parts: Use a general encryption algorithm that uses a (short) string of characters called a key to select one of many encryptions to use. Now the code for the encryption algorithm changes rarely, but the key can be changed as often as needed.
Functions are used to encrypt and decrypt messages:
and
In the above terminology,
means the message is encrypted
using a key of K, while
denotes the decryption function
using a key of K.
The properties of encryption functions depend on the specific function in use. For example, some functions are designed in such a way that the same key encrypts and decrypts, but that the encryption and decryption functions differ. For other functions, the same function may perform both encryption and decryption (e.g., encrypt and decrypt are inverses of each other).
From a cryptanalyst's point of view, the immediate goal is to guess the keys being used in so that messages can be decrypted (or bogus messages can be encrypted). The cryptanalysis problem has four variations:
How are encryption algorithms broken? Brute force. We can always try all possible keys, in an attempt to find the one that works. The amount of effort needed to break encryption algorithms using brute force depends on two factors: 1) the amount of time needed to encrypt (decrypt) a message using a given key, and 2) the total number of possible keys that must be tried in the worst case. If it takes a long time (days to years) to try all keys, it is unlikely that anyone can guess a key using brute force.
In practice, brute force is not always needed. Why? Brute force is needed only if all combinations of bits (in keys) are equally likely to be present. If keys are user-chosen passwords, then it may well be a word in a dictionary. In addition, most messages contain patterns. The more messages one has available, the easier it becomes to recognize patterns. If messages contain english text, for instance, certain letters and phrases appear more often than others. Moreover, if the same key is used to encrypt subsequent messages, patterns in the original message will translate into patterns in the ciphertext. Recognizing patterns reduces greatly the number of combinations that need to be tried in guessing original plaintext.
Note: The ciphertext only problem is the most difficult to solve. However, it is naive to assume that a cipher that can withstand a ciphertext only attack is secure. In many cases, a cryptanalyst can make educated guesses as to some the plaintext. For instance, remote login sessions usually start with a message such as ``Please Login:''. Thus, most present-day commercial and government agencies desire that a cryptosystem be able to withstand a chosen plaintext attack.
In substitution ciphers, each symbol (or letter) is replaced by another symbol(or letter). The following examples give some simple ciphers:
All of these substitution codes, including the last, are relatively easy to crack. Why? While monoalphabetic substitution using just 26 letters produces 26! different tables, the table can be guessed using statistical techniques.
Assuming English text, for instance, the following facts are useful:
Another possibility is to assume that the message contains certain key words or phrases. Example? If we assume the occurrence of the word ``password'', we look for an 8-character sequence in which the 3rd and 4th letters are the same, but no others are repeated.
To thwart attacks based on statistical properties, we can use a polyalphabetic cipher, one in which the encrypting phase uses several translation tables.
The idea is to map a given symbol (e.g., ``a'') into more than one symbol (at different times).
The Vigenére cipher, for example, uses a key and 26 Caeser alphabets. The key is a string that specifies which particular alphabet to use when translating a symbol. For example:
Using a key of ``bad'', we translate the plaintext ``hereslookingatyoukid'' as follows:
Although better than monoalphabetic ciphers, polyalphabetic ciphers can still be broken. The trick to breaking polyalphabetic is guessing the key length and then using the same approaches as before in each table.
How can we prevent statistical attacks? Never reuse the encryption key.
Unfortunately, all current encryption methods have problems. Consider the one-time pad:
The data encryption standard (DES), developed by the NBS, has become one popular encryption method:
However, critics argue that both of these assumptions are false given today's technology. In particular:
Both DES and one-time pad are considered conventional methods because they use the same key for both encryption and decryption. Conventional methods have the problems that keys must be distributed to both the sender and recipient.
How can we use the network to distribute keys in the first place?
We can extend the hierarchy to as many levels as we need, with the keys at the lower levels changing more quickly than those near the top.
We can't simply give out part of the key, because we want any two presidents to have enough information to determine the key.
One solution is to ``hide'' the key as the
term of a polynomial
of degree (K-1), where K is the number of principles needed to use the
key:
where
,
, and
are chosen at random.
Each manager is given one data point on the curve (thus it takes
4 managers to determine
term), each VP two (2 VPs
determine the function/key), and the president four.
Until now, we have assumed that hiding keys is of the utmost importance. However, this leads to the key distribution problem. Another approach, called Public Key Encryption, uses two keys: a public key and a private key. The public key is given to everyone, while the private key is kept secret. It requires:
Under rules 2 and 3, Encrypt can be made public and distributed to everyone else. Decrypt remains private.
The RSA algorithm is an encryption algorithm that satisfies the above requirements. It is based on the factoring of large prime numbers, which no one knows how to do efficiently. Thus, RSA encryption is considered quite strong. RSA encryption uses a public and private key. The public key is made available to everyone, but the private key is kept secret. RSA encryption takes place as follows:
, where
denotes encryption using K's public key, and
denotes
encryption using K's private key.
The RSA algorithm is an encryption algorithm that satisfies the above requirements. The algorithm is as follows:
.
(Note that the message does not increase in length.)
.
How do we choose the keys (e.g., N, E, and D)?
That is,
.
.
Note: The encryption-decryption functions are inverses of each other,
because
.
How can we break the code? All known methods of attack are at least as difficult as finding the prime factors of N. The factoring problem has been studied for over 300 years, and there are no known algorithms that factor 200 digit numbers in a reasonable amount of time. Thus, although not ``provably'' secure, the method appears quite secure.
Note: Internet standards and prototype implementations are in the works for RSA encryption of mail.
Security implies that only the recipient of a message may decrypt it.
Now, when A wants to send a message to B, A encrypts messages for B using B's public key and sends the cipher to B, which B decrypts with its private key.
Can anyone else decrypt messages intended for B? No, one must have B's private key to decrypt the message, which only B has.
Unfortunately, although only B can decrypt the message, B cannot be sure that A actually sent the message.
Authentication is concerned with verifying that a message supposedly from B, actually did originate from B.
The key to authentication is having client A provide information that only A knows. Examples? For example, a password. The drawback of a password is an intruder might see the password (if it is unencrypted) and would then be able to gain unauthorized access to the system. Even better would be the encryption of a number using a key that only A knows.
For authentication, it is convenient to assume that:
That is, that a user's decryption key can also encrypt messages. The RSA algorithm discussed above has this property.
How does this give us authentication? Assume A wants to authenticate itself to B:
Problem? We still do not yet have secrecy. Although B can verify that a message came from A, anyone else who obtains a copy of the message can decrypt it too! We can combine the two ideas to create messages that are both secret and authenticated. A sends the following message to B:
which B processes via:
Note:
Protection against replay attacks (where someone replays old message in the hopes of forcing a server to (improperly) re-perform a transaction) can be handled as follows:
In the real world, signatures are legally binding. If someone signs a contract, the signature is proof that a party did agree to the terms of the contract. How can we have the equivalent of signatures with computers?
The problem of sending a ``signed'' message from one party to another has two parts:
Digital signatures can be achieved by using the public key encryption. Suppose B is concerned that A will later disavow having sent a message:
In some cases, users want authentication and protection against tampering, but don't need to keep messages secret, and may not want to pay the high cost of encrypting all messages. For lower cost, a trap-door function (or one-way checksum) can be used. These functions have the property that it is easy to calculate the checksum from plaintext, but not vice versa. They are frequently used for computing passwords:
Is it reasonable to assume that a private key never becomes public? Not necessarily. Unfortunately, a company might find it useful to leak a private key, so that they can later claim that they really didn't send a particular message.
One solution to this problem is to use a central authority called an authentication server (AS) for signatures and keys. The authentication server generates session keys and digital signatures, and is operated by a trusted third party.
Every user has its own private key for communicating with the AS, which is hand carried to the authentication server. No keys are made public.
When A communicates with B, the following takes place:
The authentication server can also provide digital signatures by using a special key X, that only it knows. To send a signed message from A to B:
Later, if A denies having sent the message, B produces signature
, which the authentication server verifies.
As long as only the authentication server knows X, and is trusted
not to forge messages from A, signatures cannot be forged.
The problem of identifying users when they log in is called user authentication. Most authentication methods are based on:
The most widespread form of authentication requires the user to enter a password. In the first implementation of Unix, a password file contained the cleartext password for each user. This approach had the following problems:
Another possible approach is use the password as a key that encrypts a constant. The encrypted value is then stored in a file.
How do we verify passwords now? When a user logs in, she provides a password that is used as the key to encrypt the constant. The encrypted value is then compared with the encrypted version in the password file. If the two match, the log in succeeds.
Of course, the encryption function should by a one-way (trap-door) function so that it is hard to invert, allowing the password file to be read by any programs. Now the encrypted passwords no longer even need not be protected. Anyone can read them.
How well do brute-force attacks work? One approach to penetrating this scheme is to guess a key, encrypt it, and compare the result with the encrypted password, repeating the process until successful. It turns out that guessing keys works because many people use poorly chosen passwords.
In one study of 3,289 passwords:
On a PDP-11/70, the system uses 1.25 milliseconds to encrypt a potential password. The time required to try combinations of characters is as follows:
Obviously, it pays to chose passwords carefully! Do we need to try all combinations of letters to guess a password? The search can be further speeded up by first trying:
Morris and Thompson, authors of the Unix password scheme, compiled a list of likely passwords using the above heuristics, encrypted each of them using the known encryption algorithm, and compared them with a list of encrypted password available at their site. Result: Over 86% of all passwords turned up in their list!
Unix now uses the DES encryption algorithm to encrypt passwords. The current algorithm is:
In addition, the passwd program that users invoked to change a password was changed to urge the use of harder passwords.
Consider an intruder attempting to gain access to as many accounts on as many systems as possible. For each encrypted password he precomputes (from a list of good candidates of course), he can check the entries for all users. That is,
build a table that maps cleartext passwords into their encrypted form. Once built, the same table can be used to map encrypted passwords back into cleartext!
The technique of salted passwords renders such attacks useless. Unix modifies the previous algorithm as follows:
Benefits? An intruder can no longer build a table containing a
one-to-one mapping of encrypted passwords back into their cleartext.
Each password has
possible encryptions! Precomputing the
table would require a lot of storage!
In password protected systems, the main idea is that authorized users have a certain piece of information that they present to the system. A generalization of this idea is to have the computer keep a large amount of information that only an authorized user knows. Rather than ask for a password, the system can ask the user a series of questions:
Finally, the use of a sophisticated password protection scheme has its limits.
Should the computer authenticate itself to the user? The following example shows why it may be important to identify the computer to the user:
One clever method for obtaining passwords uses the following strategy: A user leaves running on the terminal a program that simulates the login behavior of the system: it prompts the user for a login name and password. When the trusting user enters the two items, the program writes the information in a file (accessible to the program's owner), prints the message ``Invalid Login'', and executes the standard system user authentication program. The user, assuming that he mistyped his password, retypes his password, logs in, and never realizes what happened.
More recently, a ``bogus'' ATM machine was placed in a shopping mall. Customers would insert their cards, enter their PIN, and the machine would spit out the card saying the network was down. Customers assumed the machine was broken and walked away. In fact, the ATM machine collected their PIN and card numbers in order to get into their accounts. Over $50,000 was withdrawn from accounts before the fraud was discovered.