Objectives To Do List Session Notes Review Questions

Session Objectives

  1. Learn about Claude Shannon, and how his work contributed to cryptology.
  2. Understand the concept of entropy.
  3. Learn how substitution and transposition are used together to encrypt plaintext.
  4. Study the implementation of the Digital Encryption Standard (DES) and the Rijndael algorithm.
  5. Learn why one-time pads lead to unbreakable ciphers.
  6. Understand the reason for learning cryptanalysis.
  7. See how crypto machines operate.

To Do List

Tasks

  • Read the session notes (below), including any embedded URL links.
  • Participate in the Conference discussion topic, which will require completion of a cryptography exercise and a response to at least one of your classmates.

Readings

  • Pfleeger, Security in Computing, Chapter 2

Session Notes

Preliminaries

This week we will back up in our textbook to Chapter 2, Elementary Cryptography. We'll start by looking at cryptography from its theoretical basis, which was developed by Claude Shannon over 50 years ago. After examining the theory of entropy, we'll look at how encryption is accomplished using the techniques of substitution and transposition. After introducing those concepts, we'll see how they are implemented in the DES, 3DES, and AES encryption algorithms. After coming to an appreciation of the complexity of those and similar encryption algorithms, we'll be in a position to understand the requirement for machines (both mechanical and electronic) to facilitate the encryption process.

I am indebted to Dr. David Madison, former Program Chair at UMUC (and still an adjunct faculty member teaching INFA640, Cryptology and Data Protection), for much of the technical historical background on encryption found in these Session Notes.

Entropy 101

The intent of cryptography is to provide secrecy to messages and data. As articulated by Claude Shannon, there are three general types of secrecy systems:

  • Concealment systems ("invisible" ink, for example, as well as the techniques of steganography)
  • Privacy systems (for example, a virtual private network (VPN), where access is limited to those who have a need to know)
  • 'True' secrecy systems (i.e., ciphers)

Claude Shannon is often called the father of information theory. His seminal work, A Mathematical Theory of Communication, was first published in two parts in the July and October, 1948, editions of the Bell System Technical Journal [Shannon, C.E. (1948, July). A Mathematical Theory of Communication. Bell System Technical Journal, 27, 379-423.] and [Shannon, C.E. (1948, October). A Mathematical Theory of Communication. Bell System Technical Journal, 27, 623-656.]. The paper has appeared in a number of republications since:

  • The original 1948 version was reproduced in the collection Key Papers in the Development of Information Theory [Slepian, D. (Ed.) (1974). Key Papers in the Development of Information Theory. New York: IEEE Press.]. The paper also appears in Claude Elwood Shannon: Collected Papers [Sloane, N.J.A. & Wyner, A.D. (Eds.) (1993). Claude Elwood Shannon: Collected Papers, New York: IEEE Press.]. The text of the latter is a reproduction from the Bell Telephone System Technical Publications, a series of monographs by engineers and scientists of the Bell System published in the BSTJ and elsewhere. This version has correct section numbering (the BSTJ version has two sections numbered 21), and this is probably the only difference from the BSTJ version.
  • Prefaced by Warren Weaver's introduction, "Recent contributions to the mathematical theory of communication," the paper was included in The Mathematical Theory of Communication, published by the University of Illinois Press in 1949 [Weaver, W. & Shannon, C.E. (1949). The Mathematical Theory of Communication. Urbana, Illinois: University of Illinois Press.]. The text in this book differs from the original mainly in the following points:
    • The title is changed to "The Mathematical Theory of Communication" and some sections have new headings;
    • Appendix 4 is rewritten; and
    • The references to unpublished material have been updated to refer to the published material.

It is apparent from the wide distribution of Shannon's paper that it has become a modern-day classic. For our class, we'll be using the version based on the BSTJ version with a number of corrections; please find this classic paper at Shannon.

Shannon was the father of information theory in the sense that he demonstrated mathematical methods of treating communication channels, bandwidth, and the effects of random noise on signals. One of his major contributions was to introduce the notion of entropy of a signal. The formula for calculating the entropy of a signal is given in Theorem 2 of Shannon's paper (top of p.11):

H = -K [Sum (i=1..n)] pi log pi
Where H is the entropy, K is a positive constant (we'll assume it's equal to 1), pi is the probability of a given message (or piece of information), and n is the number of possible messages (or pieces of information). Note that the logarithm is taken with the base 2, e.g., log21024 = 10.

Let's examine this formula through the use of three particular examples:

  • Example 1: Suppose there is only one possible signal (i.e., n = 1, and p1 = 1). Under these conditions,
  • H = -K x 1 x log 1
    however, log 1 = 0, resulting in H = 0. In words: in this situation, there is only one possible message that has a probability of 1. Since there is no uncertainty, the entropy in this case is zero.
  • Example 2: There are only two possible, equally probable, messages. For this situation,
  • H = -K(0.5 log (0.5) + 0.5 log(0.5)) = -K ( 0.5(-1)+0.5 (-1))
    If the constant K= 1, then H = 1. In words: in this example, there are two possible equally probable messages, and the uncertainty (entropy) is 1 bit (one bit can specify two possible conditions, i.e., 0 or 1)
  • Example 3: There are 1024 (= 210) possible signals, all of equal probability (pi = 2-10). For this situation,
  • H = -K(2+10 x 2-10 log(2-10)) = 10
    In words: in this example, there are 1024 equally probably possible messages, and the uncertainty (entropy) is 10 bits.

We can draw some general conclusions from this:

  • When there is little doubt about what signal (or message) has been sent, the entropy is low. If there is no uncertainty (we're sure what a message is), the entropy in zero.
  • As the number of possible messages becomes large, and if the probabilities are equal, then the entropy (uncertainty) becomes high.

How does all of this relate to cryptography? Consider a simple plaintext message: This class is INFA610. This message is certainly true; hence, its entropy is zero (there's no uncertainty about it). Now suppose we encrypt this message using one of 1024 possible, and equally probable, keys, producing ciphertext such as

KCSAQ EKGZI DGGQR DDTXP KGBFG AGHDH
Since someone without the key does not know which key has been used, the entropy of the message has been increased from zero to 1024. We have increased the uncertainty in the message for those who do not know the key.

Entropy, as defined by Shannon, gives an indication of the complexity, or randomness, of a message or a data set. Generally, signals or data sets with high entropy

  • Have a greater chance of a data transmission error
  • Require greater bandwidth to transmit
  • Have smaller capacity for compression
  • Appear to have a greater degree of "disorder" (randomness).

It should be apparent that the English language (and most other human languages) have a relatively low entropy due to the frequency of certain characters, e.g., the letters 'e' and 't'. Images have similar characteristics: the value of a given pixel is highly correlated to neighboring pixels. Both types of information (English text and images) can be readily compressed using algorithms such as ZIP, LZW, or JPEG. These algorithms "squeeze out" the redundancies in a message, making the compressed version much smaller, and consequently, much more random. (If a compressed message still had inherent order or redundancy, it could be further compressed. As you may know, compressing a file twice doesn't help to reduce the size.) Hence, compressing data or images increases the entropy. The increased entropy (which equates to increased randomness) makes the compressed data or image more difficult to decode. This is exactly what we're looking for in a good cipher.

Claude Shannon worked on classified projects during World War II, and realized that his work in communication theory was directly applicable to cryptology. He prepared a classified report (now declassified), Communication Theory of Secrecy Systems.

One of the basic concepts presented in this paper is that of a priori ("known beforehand") probability of possible message keys. Consider a simple English alphabet substitution code where there are 26! (4.03x1026) possible substitutions. Having a sample encrypted message, due to the frequency of various alphabet characters, can result in an a posteriori ("after the fact") probability that is near 1 (i.e., nearly a certainty) as to what key was used in the encryption.

As mentioned in Shannon's paper, with a perfect cipher "all keys are essentially equivalent – they all lead to the same set of a posteriori probabilities;" that is, having an encrypted sample won't help the cryptanalyst do his or her job. Another important point is that an encrypted message is similar to a signal that is buried in noise; the higher the noise level, the more difficult it is to extract the message. This is the important concept to take away from Shannon's paper: a good cipher will make a message look like noise. In terms of implementation, algorithms used to encrypt messages should "scramble" the original message to the maximum possible extent. One way to do this is to develop algorithms that take a message through a sequence of substitutions and transpositions. Or, to put it in Shannon's terms, encrypting a message will intentionally increase the message's entropy. Next, we'll look at specific ways to achieve that objective.

Putting Theory to Practice

So, the idea is: randomize the plaintext message to the max! We have two simple approaches for doing that: substitution and transposition. Let's start with substitution.

Since the English language has 26 characters, it follows that a simple English language substitution has 26! possible variations [you should make sure you understand why this is true]. Rather than simply rotate the alphabet one time (for instance, by shifting three letters to the right – A to D, B to E, ... Q to T, etc.), let's change the value of the shift for every new character in the message.

To do that, we'll use a Magic Encoder/Decoder Ring. In order to thoroughly understand what's going on with this simple substitution code, I would encourage you to actually cut the rings out per the instructions. (As a construction hint, it turns out the the rings fit nicely in the bottom of a Starbucks "Tall" (i.e., small) upside-down coffee cup. Use a thumbtack to hold the rings in place.)

Try walking through the example presented in the link above to encrypt the plaintext message:

INFORMATIONSECURITYISIMPORTANT
using the key "CRYPTO" to get the ciphertext
KEDDKACKGDGGSGGTBHAZQXFDQIRPGH



Consider the following points:

  • This algorithm does a pretty good job of "shuffling the deck" by changing the rotation for every character of the message;
  • Clearly, longer keys are better than shorter keys, since the rotation repeats once per key length;
  • Using a real word for the key is dangerous – a cipher-cracking tool could easily be built to cycle through dictionary words;
  • Although this algorithm is pretty good, it's not good enough. Consider, for example, the frequency of characters both in the key and in the plaintext message, which a cryptanalyst could use to advantage.

All in all, though, it's not a bad start. The encrypted message would have been a challenge in pre-computer days.

Let's now shuffle the deck again, this time using a transposition. We'll (arbitrarily) arrange the after-substitution ciphertext from the encoder ring as a six-column, 5-row array:

              KEDDKA
              CKGDGG
              SGGTBH
              AZQXFD
              QIRPGH

Reading the text out by column yields:

                  KCSAQ EKGZI DGGQR DDTXP KGBFG AGHDH

Now we're getting to the point where we can give a cryptanalyst a run for his or her money. The analyst will need to know the key used for the substitution as well as the key used for the transposition (i.e., the fact that we used a 6x5 array).

As you will learn from this session's reading in Pfleeger, the techniques of substitution and transposition are used in the Data Encryption Standard (DES) algorithm, developed in the 1970s at the U.S. National Bureau of Standards (NBS, known at the National Institute of Standards and Technology (NIST) since 1988). As shown in Figure 2-8 of Pfleeger, a substitution is followed by a transposition (called a permutation in Pfleeger), and the whole process is repeated 16 times.

DES uses a 56-bit key, so there are 256 = 7.2 x 1016 possible keys. A cryptanalyst, using a brute-force approach to crack a DES cryptogram, might try looping through all possible keys. In the 1970s, 7.2 x 1016 possible keys was a good deterrent, given the computer power available in those days. With the computing power available today, DES presents less of a challenge. It would not be unreasonable for a modern computer to try 109 keys per second. At that rate, cracking a DES cryptogram would take 7.2 x 107 seconds, which is about 833 days, or 2.3 years. A cluster of computers, or a supercomputer, could reduce this time by several orders of magnitude. (As noted in Pfleeger, a DES-cracking machine built in 1998 for $100k took about four days to find the key.)

Recognizing the vulnerability of DES, one might expect that DES might be made uncrackable by running DES-encrypted ciphertext through the DES algorithm a second time, to square the complexity. As it turns out, this strategy only doubles the complexity, making the key length effectively 57 bits rather than 56. As described in Pfleeger, a triple-DES algorithm has been developed that provides an effective 112-bit key length, which is roughly 5.2 x 1033 possible keys, affording plenty of protection for known attacks, and really increasing the entropy. Variations of triple-DES are defined that involve the use of one, two, or three independent keys.

Pfleeger also describes an Advanced Encryption Standard (AES), the Rijndael algorithm, that is capable of 256-bit (or more) keys. It has many of the attributes of the "perfect" cipher in that it is an open design, yet maximizes the entropy of plaintext to closely approximate pure noise.

A couple of other points about the ciphers that we've been talking about up to now. First, the ciphers that we've been looking at are symmetric ciphers; that is, the same key that is used to encipher a message is used to decipher the ciphertext. In the next session we'll look at asymmetric ciphers, which use different keys to encipher and decipher. The second point is that the examples that we've looked at use uniliteral substitutions, that is, each plaintext character gets transformed to a single ciphertext character. There is no reason why a single plaintext character could not be transformed into two ciphertext characters (this would be a biliteral substitution), three ciphertext characters (this would be a triliteral substitution), or even more. In general, transforming a single ciphertext character into more than one ciphertext character is a multiliteral substitution. Such multiliteral substitutions are just one more strategy for adding randomness to ciphers.

One-Time Pads

We've looked at the theory of using a complex algorithm to transform plaintext into a noise-like cipher. Even the most clever algorithm is, in the end, deterministic, i.e., it consists of a precise way of shuffling the deck which is not really random (it's often called pseudo-random). Although huge keys (say 256 bits) produce a cipher that is for all intents noise-like, in principle, such a cipher still can be cracked. There is a solution to this problem, albeit an unattractive solution. We have seen with the substitution code "Magic Decoder Ring" that the finite length of the key ("CRYPTO" in the example above) is a problem. This can be resolved by having a key of unlimited length, using a different substitution for each plaintext character. If the key pad is truly random, then the encrypted message will be truly noiselike, and therefore uncrackable. It is easy to generate such a keypad; for example, the characters in the key could be generated by spinning a "wheel of fortune." Since such a key is truly random, it can't be replicated by the recipient of the message, so the key must be distributed. This is the drawback of one-time pads: they must be distributed. Although possible in some circumstances, distribution of pads can be difficult in some situations, and one must always be concerned with the possibility that the pad has been intercepted in transit. It also should be mentioned that key distribution, even for a finite key such as "CRYPTO" can still be problematic, and is one of the issues addressed by public key encryption, which we will examine in much greater detail in the next session.

The Reason for Learning Cryptanalysis

There is a good reason for learning cipher-cracking techniques: a thorough understanding of how ciphers are created and broken aids a cryptanalyst in developing more secure ciphers. Lacking experience in breaking ciphers leaves an analyst without firsthand knowledge of cipher vulnerabilities. For an overview of how cryptanalysis is learned, please read the tutorial-style article by Schneier.

The Need for Cipher Machines

By now it should be apparent that transforming plaintext to ciphertext is a complex process. We have seen how a modern cipher, such as DES, requires multiple iterations of substitutions and transpositions. Clearly, algorithms such as this are simply too complex for manual implementation, on either the encryption or decryption end. A device or machine of some sort is required to implement crypto algorithms.

There are a number of classic crypto machines, including:

  • the PURPLE machine;
  • Enigma;
  • Swiss NEMA; and
  • Hagelin machines.

For detailed descriptions and software simulations of these machines and others, please refer to Crypto Machines. You will have an opportunity to experiment with one of these machines and report your findings in a Conference discussion question for this session.


Review Questions

  1. Who was Claude Shannon, and how did he contribute to cryptography?
  2. How does entropy relate to the degree of uncertainty?
  3. Why is it desirable for ciphertext to appear "noiselike?"
  4. Why are character substitution and transposition used in many crypto algorithms?
  5. What are the basic steps in the DES algorithm?
  6. Why has it been necessary to move beyond DES?
  7. What makes a one-time pad the "perfect" cipher algorithm? What is its major deficiency?
  8. How would someone learn to break ciphers?

(These questions are intended to be a self-test of your comprehension of this session's material; answers to these questions do not need to be turned in.)


Summary

In this session we have looked at the basics of encryption. Starting with the history of ciphers, including a look at how ciphers are categorized, we then looked at two specific categories, substitution and transposition ciphers, and saw their vulnerabilities to cryptanalysis.

Preview of Next Session

In Session 9, we will continue studying cryptography, turning our attention from symmetric ciphers to
asymmetric (public key) ciphers.