Cryptography

Cryptography has its roots in the military, where low-level clerks (who could not be trusted) carried secret messages between commanders. The problem there was exchanging secret messages between high-level officers, when those messages were carried across hostile territories by low-level (possibly untrustable) clerks, with the inherent possibility that messages might fall into enemy hands. Today:

In networks, several types of problems:

Secrecy:
Keeping information out of unauthorized users hands.
Authentication:
Problem of impersonation where an intruder might masquerade as a legitimate user. Such as obtaining message copies through wiretapping and then using a replay attack. An impersonator could ask for information that they are not authorized to see, or generate messages causing incorrect (and certainly unauthorized) transactions to take place (e.g., transferring money from one account to another). Note that an impersonator could intercept and modify messages, or create new ones.
Nonrepudiation:
Digitial signatures so a user cannot disclaim an agreement.
Integrity Control
: Message was the one really sent.

The only general solution to security issues is to encrypt a user's plaintext messages into ciphertext messages. Ciphertext is a ``jumbled'' form of the message that can only be used if converted back into its original cleartext. It is the ciphertext that is sent across the network, and the intended recipient decrypts the ciphertext back into the original plaintext.

One of the fundamental rules of computer security is that an intruder knows a lot, if not everything, about the security system in use. ``Security through obscurity'' does not work! If a system has a vulnerability, one must assume that the intruder is aware of that vulnerability.

If encryption is used, for instance, one should assume that the intruder has a general idea of the how the encryption algorithm works.

The art of breaking ciphers is called cryptanalysis. The art of devising ciphers is called cryptography. Cryptology refers collectively to the task of making and breaking ciphers.

One problem with encryption is that once an intruder has determined the encryption algorithm, it must be changed immediately. How? Changing an algorithm may be impractical.

A common solution is to break the encryption process into two parts: Use a general encryption algorithm that uses a (short) string of characters called a key to select one of many encryptions to use. Now the code for the encryption algorithm changes rarely, but the key can be changed as often as needed.

Functions are used to encrypt and decrypt messages:

tex2html_wrap_inline377 and

tex2html_wrap_inline379 In the above terminology, tex2html_wrap_inline381 means the message is encrypted using a key of K, while tex2html_wrap_inline385 denotes the decryption function using a key of K.

The properties of encryption functions depend on the specific function in use. For example, some functions are designed in such a way that the same key encrypts and decrypts, but that the encryption and decryption functions differ. For other functions, the same function may perform both encryption and decryption (e.g., encrypt and decrypt are inverses of each other).

From a cryptanalyst's point of view, the immediate goal is to guess the keys being used in so that messages can be decrypted (or bogus messages can be encrypted). The cryptanalysis problem has four variations:

Ciphertext Only:
Ciphertext but no plaintext is present. Given only the ciphertext, determine the key and the original text. The ciphertext only problem is particularly difficult to solve because we may not even know when we have guessed the correct key because we don't know what the unencrypted data looks like.
Verifiable Plaintext:
We don't have any actual plaintext, but given ciphertext, we can verify that we have guessed the correct key. This is weaker than the above, in that even though we don't have plaintext, we can (somehow) determine whether or not we have guessed the correct key. Example? A pair of (encrypted) request-response messages may carry the same sequence number, so that the client can match response messages with the original request. When decrypting the messages, we know we have guessed the correct key if we obtain the same sequence number when decrypting both messages. Thus, although we don't have any plaintext, we can verify that we have obtained the plaintext from the ciphertext. Another example is decrypting messages that we know contain ascii text. We have know idea what the text is, but we can certainly recognize whether its ascii or not.
Known Plaintext:
Both plaintext and its encrypted ciphertext are available. In this case, verifying that we have guessed the key is straightforward, since we can test our key on the plaintext.
Chosen Plaintext:
Here, we have the ability to encrypt pieces of plaintext of our own choosing.

How are encryption algorithms broken? Brute force. We can always try all possible keys, in an attempt to find the one that works. The amount of effort needed to break encryption algorithms using brute force depends on two factors: 1) the amount of time needed to encrypt (decrypt) a message using a given key, and 2) the total number of possible keys that must be tried in the worst case. If it takes a long time (days to years) to try all keys, it is unlikely that anyone can guess a key using brute force.

In practice, brute force is not always needed. Why? Brute force is needed only if all combinations of bits (in keys) are equally likely to be present. If keys are user-chosen passwords, then it may well be a word in a dictionary. In addition, most messages contain patterns. The more messages one has available, the easier it becomes to recognize patterns. If messages contain english text, for instance, certain letters and phrases appear more often than others. Moreover, if the same key is used to encrypt subsequent messages, patterns in the original message will translate into patterns in the ciphertext. Recognizing patterns reduces greatly the number of combinations that need to be tried in guessing original plaintext.

Note: The ciphertext only problem is the most difficult to solve. However, it is naive to assume that a cipher that can withstand a ciphertext only attack is secure. In many cases, a cryptanalyst can make educated guesses as to some the plaintext. For instance, remote login sessions usually start with a message such as ``Please Login:''. Thus, most present-day commercial and government agencies desire that a cryptosystem be able to withstand a chosen plaintext attack.

Substitutions Ciphers

In substitution ciphers, each symbol (or letter) is replaced by another symbol(or letter). The following examples give some simple ciphers:

Caesar cipher:
Each letter is shifted to the right by 3. ``a'' becomes ``d'', ``b'' becomes ``e'', ``c'' becomes ``f'', etc. The Unix utility tr(1) implements this trivially.
K shift cipher:
Instead of shifting by 3, shift each letter K positions. Here, K is a key for the generic shift algorithm.
Monoalphabetic substitution:
Table of codes; the symbol's (plaintext) indexes into the table to produce its cipher counterpart.

All of these substitution codes, including the last, are relatively easy to crack. Why? While monoalphabetic substitution using just 26 letters produces 26! different tables, the table can be guessed using statistical techniques.

Assuming English text, for instance, the following facts are useful:

Another possibility is to assume that the message contains certain key words or phrases. Example? If we assume the occurrence of the word ``password'', we look for an 8-character sequence in which the 3rd and 4th letters are the same, but no others are repeated.

To thwart attacks based on statistical properties, we can use a polyalphabetic cipher, one in which the encrypting phase uses several translation tables.

The idea is to map a given symbol (e.g., ``a'') into more than one symbol (at different times).

The Vigenére cipher, for example, uses a key and 26 Caeser alphabets. The key is a string that specifies which particular alphabet to use when translating a symbol. For example:

Using a key of ``bad'', we translate the plaintext ``hereslookingatyoukid'' as follows:

Although better than monoalphabetic ciphers, polyalphabetic ciphers can still be broken. The trick to breaking polyalphabetic is guessing the key length and then using the same approaches as before in each table.

One-Time Pad

How can we prevent statistical attacks? Never reuse the encryption key.

Unfortunately, all current encryption methods have problems. Consider the one-time pad:

  1. It uses a random key that has the same length as the message being encrypted. Both the encryptor and decryptor use the same key.

  2. The message is exclusive-or'd with the key to produce the ciphertext.

  3. The receiver decrypts the ciphertext using the same algorithm and key. ((A XOR K) XOR K) = A.

  4. Have we solved our original problem? No! Because the sender and receive use the same key, the key must be transferred to the receiver. Exchanging keys is the very same problem we are trying to solve!

  5. The key can never be reused (or it becomes easier to guess). That is, reusing keys allows statistical based attacks on the key.

  6. Choosing keys presents difficulties; in particular, ``randomly generated'' keys are not very random at all. That is, computer generated random numbers are actually well defined sequences. Although the generated numbers are uniformly distributed, they are not random.

  7. It is the only provably secure method known at this time.

Data Encryption Standard (DES)

The data encryption standard (DES), developed by the NBS, has become one popular encryption method:

  1. The calculation can be performed efficiently using special hardware, but far less efficiently by normal programs.

  2. Keys are 56 bits in size, large enough that they are difficult to guess.

However, critics argue that both of these assumptions are false given today's technology. In particular:

  1. Are 56-bit keys too small? Many people say yes. Indeed, IBM's original design used 128-bit keys.

  2. Why aren't keys bigger? The key size was reduced at the request of the National Security Agency (NSA), who has never given a reason for the change.

  3. IBM has never given a technical reason for the specific design of its algorithm. Why? Again, this was done at NSA's request.

  4. Without knowing the design principles, it is difficult to know whether the code can be broken easily. Indeed, some critics suggest that the NSA would feel very uncomfortable with an encryption standard that not even they can break.

Both DES and one-time pad are considered conventional methods because they use the same key for both encryption and decryption. Conventional methods have the problems that keys must be distributed to both the sender and recipient.

The Key Distribution Problem

How can we use the network to distribute keys in the first place?

Hierarchies:
We could have a set of master keys used only to distribute session keys. Session keys are used only for a short period of time (e.g., for the duration of a session, such as a connection). By using master keys less frequently than session keys, we need change them less often, making it feasible to exchange them by a more cumbersome method, such as by hand.

We can extend the hierarchy to as many levels as we need, with the keys at the lower levels changing more quickly than those near the top.

Puzzles:
Have the client-server use the following protocol:
Server:
I am going to send 20,000 puzzles, each in its own message. Each puzzle starts with 128 zero bits, followed by a 16-bit puzzle number, and a (random) 56-bit key.
All puzzles (messages) have been encrypted using DES with a key whose final 22 bits are zero. Thus, there are only tex2html_wrap_inline395 possible keys. Different keys are used for each encryption.
Client:
Pick one puzzle at random (they arrive in a random order), and use brute force to break it. How? We simply try all tex2html_wrap_inline397 keys, stopping when the decrypted message starts with 128 zeros.
Client:
Send (plaintext) message to server containing the puzzle number we cracked. Now both the sender and receiver know what puzzle was broken, and subsequent communication uses the 56-bit key in the original message. Note that because the original puzzles were sent in random order, having the client send the puzzle number containing the chosen key in cleartext doesn't help the intruder. The intruder doesn't know which message contained that puzzle number unless it decrypts all puzzles. Since the puzzles were all encrypted using different keys, all must be decrypted, a process that could takes years. However, it takes only a few hours to break any single puzzle.

Key protection:
Sometimes one also wants to hide the key from any one user. In a company, for instance, suppose that we want only the following combinations of persons to open the vault:

  1. The president.

  2. Any two (of twenty) senior vice presidents.

  3. Any four (of 200) managers.

We can't simply give out part of the key, because we want any two presidents to have enough information to determine the key.

One solution is to ``hide'' the key as the tex2html_wrap_inline399 term of a polynomial of degree (K-1), where K is the number of principles needed to use the key:

tex2html_wrap_inline401

where tex2html_wrap_inline403 , tex2html_wrap_inline405 , and tex2html_wrap_inline407 are chosen at random.

Each manager is given one data point on the curve (thus it takes 4 managers to determine tex2html_wrap_inline399 term), each VP two (2 VPs determine the function/key), and the president four.

Public Key Cryptography

Until now, we have assumed that hiding keys is of the utmost importance. However, this leads to the key distribution problem. Another approach, called Public Key Encryption, uses two keys: a public key and a private key. The public key is given to everyone, while the private key is kept secret. It requires:

  1. Plaintext = Decrypt(Encrypt(Plaintext)). (E.g., the encrypt and decrypt operations are inverses of each other.)

  2. It must be extremely difficult to deduce Decrypt given Encrypt. That is, the encryption function is made available to everyone, so it shouldn't be possible for someone to deduce the decryption function given the encryption function.

  3. Decrypt cannot be broken by a chosen plaintext attack. Since everyone has the encryption function, anyone can present their own chosen plaintext to the encryption algorithm.

Under rules 2 and 3, Encrypt can be made public and distributed to everyone else. Decrypt remains private.

The RSA algorithm is an encryption algorithm that satisfies the above requirements. It is based on the factoring of large prime numbers, which no one knows how to do efficiently. Thus, RSA encryption is considered quite strong. RSA encryption uses a public and private key. The public key is made available to everyone, but the private key is kept secret. RSA encryption takes place as follows:

tex2html_wrap_inline423 , where tex2html_wrap_inline425 denotes encryption using K's public key, and tex2html_wrap_inline429 denotes encryption using K's private key.

RSA Details

The RSA algorithm is an encryption algorithm that satisfies the above requirements. The algorithm is as follows:

  1. Break the plaintext message into sets of bit string blocks with each block's integer value between 0-(n-1).

  2. The (public) encryption key consists of a pair (E, N) of positive integers, and encrypt the message via:

    tex2html_wrap_inline437 .

    (Note that the message does not increase in length.)

  3. The (private) decrypt key consists of a pair of numbers (D, N), and decrypt the message via:

    tex2html_wrap_inline441 .

How do we choose the keys (e.g., N, E, and D)?

  1. Choose two large primes, P and Q, each tex2html_wrap_inline447 , where tex2html_wrap_inline449 .

  2. Pick a large, random integer D that is relatively prime to tex2html_wrap_inline451 .

    That is, tex2html_wrap_inline453 .

  3. Finally, compute E to be the multiplicative inverse of D, modulo tex2html_wrap_inline455 . That is:

    tex2html_wrap_inline457 .

Note: The encryption-decryption functions are inverses of each other, because tex2html_wrap_inline459 .

How can we break the code? All known methods of attack are at least as difficult as finding the prime factors of N. The factoring problem has been studied for over 300 years, and there are no known algorithms that factor 200 digit numbers in a reasonable amount of time. Thus, although not ``provably'' secure, the method appears quite secure.

Note: Internet standards and prototype implementations are in the works for RSA encryption of mail.

Security

Security implies that only the recipient of a message may decrypt it.

Now, when A wants to send a message to B, A encrypts messages for B using B's public key and sends the cipher to B, which B decrypts with its private key.

Can anyone else decrypt messages intended for B? No, one must have B's private key to decrypt the message, which only B has.

Unfortunately, although only B can decrypt the message, B cannot be sure that A actually sent the message.

Authentication

Authentication is concerned with verifying that a message supposedly from B, actually did originate from B.

The key to authentication is having client A provide information that only A knows. Examples? For example, a password. The drawback of a password is an intruder might see the password (if it is unencrypted) and would then be able to gain unauthorized access to the system. Even better would be the encryption of a number using a key that only A knows.

For authentication, it is convenient to assume that:

That is, that a user's decryption key can also encrypt messages. The RSA algorithm discussed above has this property.

How does this give us authentication? Assume A wants to authenticate itself to B:

  1. B selects a random number X and sends tex2html_wrap_inline477 to A.

  2. A computes tex2html_wrap_inline479 . That is, it decrypts the received message using its private key.

  3. A returns tex2html_wrap_inline481 , which B decrypts. B knows it is communicating with A if it gets the expected answer of X+1. That is, to compute X+1, A must have been able to decrypt X, which means it must have A's private key.

Problem? We still do not yet have secrecy. Although B can verify that a message came from A, anyone else who obtains a copy of the message can decrypt it too! We can combine the two ideas to create messages that are both secret and authenticated. A sends the following message to B:

which B processes via:

Note:

  1. B is sure that A sent the message because only A knows tex2html_wrap_inline489 . This is how we get authentication -- forcing A to produce some information that only A can have.

  2. A is sure that only B can read the message because only B knows its private key (secure)

  3. Unfortunately, each message transaction involves four iterations of the functions, adding substantial cost.

Replay Attacks

Protection against replay attacks (where someone replays old message in the hopes of forcing a server to (improperly) re-perform a transaction) can be handled as follows:

  1. Client sends request to server.

  2. Server returns to client a random number R (in an encrypted message).

  3. Client returns (R+1) to server.

  4. Have each transaction uses a different random number. This effectively prevents replays because the returned (R+1) will only match the old transaction, not the one being performed now. That is, if the same transaction were done twice (legitimately), different transaction numbers would be used. Thus, replay attacks are not possible.
The key here is that the server forces the client to prove that it knows its private key before doing a transaction. Thus, and intruder cannot record a transaction and play it back later -- the server would generate a new random number that the client could not decrypt.

Digital Signatures

In the real world, signatures are legally binding. If someone signs a contract, the signature is proof that a party did agree to the terms of the contract. How can we have the equivalent of signatures with computers?

The problem of sending a ``signed'' message from one party to another has two parts:

Authentication:
Having the receiver verify the claimed identity of the sender (e.g., ``Are you really cew@cs.wpi.edu?''). We have already discussed how authentication can be done.
Digital signature:
Preventing the sender from later repudiating the message (e.g., ``Hey, I never said that!'').

Digital signatures can be achieved by using the public key encryption. Suppose B is concerned that A will later disavow having sent a message:

  1. A sends to B: tex2html_wrap_inline497 .

  2. B decrypts the message using its private key, but saves the signature tex2html_wrap_inline499 in archival storage.

  3. B then decrypts the message using A's public key and processes the transaction.
Later, when A denies ever having sent the message, B produces signature. Why can't A disavow the signature? Only A's private key could have produced it.

In some cases, users want authentication and protection against tampering, but don't need to keep messages secret, and may not want to pay the high cost of encrypting all messages. For lower cost, a trap-door function (or one-way checksum) can be used. These functions have the property that it is easy to calculate the checksum from plaintext, but not vice versa. They are frequently used for computing passwords:

  1. Before sending a message, A computes cksm=Checksum(Message), applies its private key to the checksum, and sends Message (in cleartext), the encrypted checksum tex2html_wrap_inline517 , and its identity A.

  2. When B gets the message, it uses A's public key to decrypt cksm, computes the checksum over the sent message, and compares the two results. If they differ, B knows that the message has been tampered with. Note that checksums can be computed much faster than doing full blown encryptions.

  3. As a bonus, later, when A denies having sent the message, B has tex2html_wrap_inline517 as proof to the contrary. (That is, only A has the private key needed to perform the encryption.)

Is it reasonable to assume that a private key never becomes public? Not necessarily. Unfortunately, a company might find it useful to leak a private key, so that they can later claim that they really didn't send a particular message.

One solution to this problem is to use a central authority called an authentication server (AS) for signatures and keys. The authentication server generates session keys and digital signatures, and is operated by a trusted third party.

Every user has its own private key for communicating with the AS, which is hand carried to the authentication server. No keys are made public.

When A communicates with B, the following takes place:

  1. A requests a session key from the AS. The server chooses a key, and returns two copies (one encrypted with A's key, the other with B's key): tex2html_wrap_inline545 and tex2html_wrap_inline547 .

  2. A uses tex2html_wrap_inline551 to decrypt its copy and sends the second copy to B, instructing it to decrypt it using tex2html_wrap_inline555 .

  3. Communication proceeds with both sides using the same session key, and neither party ever need know the other's key.

The authentication server can also provide digital signatures by using a special key X, that only it knows. To send a signed message from A to B:

  1. A sends tex2html_wrap_inline545 to the authentication server. ( tex2html_wrap_inline551 is known only to A and the authentication server.)

  2. The AS decrypts tex2html_wrap_inline545 and constructs a new message consisting of: the original message Message, the current date D, and A's address A.

  3. The AS then returns tex2html_wrap_inline579 to A. (That is, it encrypts the message using its secret key tex2html_wrap_inline581 .)

  4. A sends tex2html_wrap_inline579 to B.

  5. B sends tex2html_wrap_inline579 to the AS, requesting tex2html_wrap_inline593 in return.

Later, if A denies having sent the message, B produces signature tex2html_wrap_inline579 , which the authentication server verifies. As long as only the authentication server knows X, and is trusted not to forge messages from A, signatures cannot be forged.

User-Authentication

The problem of identifying users when they log in is called user authentication. Most authentication methods are based on:

  1. What the user knows (e.g., passwords).

  2. What the user has (e.g., a plastic card).

  3. Physical characteristics of the user (e.g., fingerprints).

Passwords

The most widespread form of authentication requires the user to enter a password. In the first implementation of Unix, a password file contained the cleartext password for each user. This approach had the following problems:

  1. There is no way to prevent the making of copies by privileged users. Do you trust computer operators with your secret files?

  2. Software (or human) errors could cause the contents of the file to become available to others. This happened at MIT once; the ``message of the day'' file, whose contents is printed when a user logs in, was exchanged with the password file (through a software error). For several minutes, users logging in saw everyone's password.

  3. The contents of the password file is saved on backup tapes. Anyone with physical access to the tapes can extract the passwords.

Another possible approach is use the password as a key that encrypts a constant. The encrypted value is then stored in a file.

How do we verify passwords now? When a user logs in, she provides a password that is used as the key to encrypt the constant. The encrypted value is then compared with the encrypted version in the password file. If the two match, the log in succeeds.

Of course, the encryption function should by a one-way (trap-door) function so that it is hard to invert, allowing the password file to be read by any programs. Now the encrypted passwords no longer even need not be protected. Anyone can read them.

Attacks on Encrypted Passwords

How well do brute-force attacks work? One approach to penetrating this scheme is to guess a key, encrypt it, and compare the result with the encrypted password, repeating the process until successful. It turns out that guessing keys works because many people use poorly chosen passwords.

In one study of 3,289 passwords:

On a PDP-11/70, the system uses 1.25 milliseconds to encrypt a potential password. The time required to try combinations of characters is as follows:

Obviously, it pays to chose passwords carefully! Do we need to try all combinations of letters to guess a password? The search can be further speeded up by first trying:

  1. The 250,000 words in a dictionary (spelled forwards and backwards).

  2. A list of first names, last names, street names, and city names.

  3. All valid license plates in the state.

  4. Room numbers, social security numbers, telephone numbers, etc.

Morris and Thompson, authors of the Unix password scheme, compiled a list of likely passwords using the above heuristics, encrypted each of them using the known encryption algorithm, and compared them with a list of encrypted password available at their site. Result: Over 86% of all passwords turned up in their list!

Unix now uses the DES encryption algorithm to encrypt passwords. The current algorithm is:

  1. The first eight characters of a password are used to encrypt the constant 0. Note: longer passwords are truncated!

  2. The DES algorithm is iterated 25 times and the resulting 64 bits are repacked to become a string of 11 printable characters.

In addition, the passwd program that users invoked to change a password was changed to urge the use of harder passwords.

Salted Passwords

Consider an intruder attempting to gain access to as many accounts on as many systems as possible. For each encrypted password he precomputes (from a list of good candidates of course), he can check the entries for all users. That is,

build a table that maps cleartext passwords into their encrypted form. Once built, the same table can be used to map encrypted passwords back into cleartext!

The technique of salted passwords renders such attacks useless. Unix modifies the previous algorithm as follows:

  1. When a new password is being entered, the password program obtains a 12-bit random number (by reading the real-time clock) and appends it to the password entered by the user. This 12-bit number is called a salt.

  2. The concatenated string of the 12-bit salt and the first eight characters of the password are used as a key to encrypt the constant 0.

  3. Both the encrypted password and the 12-bit salt are stored in the password file.

  4. How are passwords validated? When the user subsequently attempts to log in, the 12-bit salt is extracted from the file and combined with the typed password for use as a key. The password is used as a key to encrypt the constant 0, and the result is compared with that stored in the password file.

Benefits? An intruder can no longer build a table containing a one-to-one mapping of encrypted passwords back into their cleartext. Each password has tex2html_wrap_inline607 possible encryptions! Precomputing the table would require a lot of storage!

Other Methods

In password protected systems, the main idea is that authorized users have a certain piece of information that they present to the system. A generalization of this idea is to have the computer keep a large amount of information that only an authorized user knows. Rather than ask for a password, the system can ask the user a series of questions:

At login time, the system asks a series of (random) questions that the user is expected to know. Because the set of questions changes for each login attempt, an intruder can't gain access by looking over a user's shoulder when he enters his password.

Computer Authentication

Finally, the use of a sophisticated password protection scheme has its limits.

Should the computer authenticate itself to the user? The following example shows why it may be important to identify the computer to the user:

One clever method for obtaining passwords uses the following strategy: A user leaves running on the terminal a program that simulates the login behavior of the system: it prompts the user for a login name and password. When the trusting user enters the two items, the program writes the information in a file (accessible to the program's owner), prints the message ``Invalid Login'', and executes the standard system user authentication program. The user, assuming that he mistyped his password, retypes his password, logs in, and never realizes what happened.

More recently, a ``bogus'' ATM machine was placed in a shopping mall. Customers would insert their cards, enter their PIN, and the machine would spit out the card saying the network was down. Customers assumed the machine was broken and walked away. In fact, the ATM machine collected their PIN and card numbers in order to get into their accounts. Over $50,000 was withdrawn from accounts before the fraud was discovered.