short history of ciphers (cont’d)


Post on 07-Dec-2021


Data Security

Short History of Ciphers (cont’d)

• Dark Ages
  -> in Europe really dark, while Arab scholars invented cryptanalysis

• Al-Kindi (ninth century)
  -> Arab polymath and author of 290 books
  -> A Manuscript on Deciphering Cryptographic Messages

• Roger Bacon (13th century)
  -> first European book about cryptography
  -> Epistle on the Secret Works of Art and the Nullity of Magic

• Renaissance in the West (14th - 16th century)
  -> Europeans are back!
  -> cryptography is becoming popular again (routine diplomatic tool)
  -> and with it cryptanalysis
  -> suddenly, monoalphabetic ciphers aren’t all that secure anymore!

Language Characteristics

• every letter of a language has a certain characteristic
  -> letter frequency
  -> contact with other letters
  -> position within words

• in English E is by far the most common letter, then T, A, O

• other letters are fairly rare, such as X, J, Q, Z

• then look at digrams (TH, HE, AN) and trigrams (THE, AND)

• May also be useful to consider sequences of two or three consecutive letters called digrams and trigrams, respectively.

• e.g. common digrams (in decreasing order): TH, HE, IN, ER, AN, RE, ED, ON, ES, ST, EN, AT, TO, NT, HA, ND, OU, EA, NG, AS, OR, …

• e.g. common trigrams (in decreasing order): THE, ING, AND, HER, ERE, ENT, THA, NTH, WAS, …

• Have tables of frequencies for letters, digrams, trigrams, contact data

Language Characteristics (cont’d)

letter  probability    letter  probability
A       .082           N       .067
B       .015           O       .075
C       .028           P       .019
D       .043           Q       .001
E       .127           R       .060
F       .022           S       .063
G       .020           T       .091
H       .061           U       .028
I       .070           V       .010
J       .002           W       .023
K       .008           X       .001
L       .040           Y       .020
M       .024           Z       .001

English Letter Frequencies

Use in Cryptanalysis

Key Concept
  -> monoalphabetic substitution ciphers do not change relative letter frequencies
  -> calculate letter frequencies for ciphertext
  -> compare counts/plots against known values

For Caesar cipher
  -> look for common peaks/troughs
  -> peaks at: A-E-I triple, RST triple
  -> troughs at: JK, XYZ

For general monoalphabetic cipher
  -> must identify each letter
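
The Caesar case can be automated instead of eyeballing peaks and troughs. The following is a minimal sketch, not a method from the slides: it tries all 26 shifts and keeps the one whose letters best match the probability table given earlier. The names `caesar_shift`, `english_score` and `crack_caesar` are hypothetical, and the scoring rule (sum of table probabilities over the letters) is an assumption that only works reliably on reasonably long English text.

```python
from collections import Counter
import string

# Letter probabilities A..Z, copied from the table in these slides.
ENGLISH_FREQ = dict(zip(string.ascii_lowercase, [
    .082, .015, .028, .043, .127, .022, .020, .061, .070, .002, .008, .040, .024,
    .067, .075, .019, .001, .060, .063, .091, .028, .010, .023, .001, .020, .001,
]))

def caesar_shift(text, shift):
    """Shift each letter by `shift` positions (negative undoes a shift); output is lowercase."""
    out = []
    for ch in text.lower():
        if ch in ENGLISH_FREQ:
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        else:
            out.append(ch)  # leave spaces and punctuation untouched
    return ''.join(out)

def english_score(text):
    """Higher when the letters of `text` are the ones English uses most."""
    counts = Counter(ch for ch in text.lower() if ch in ENGLISH_FREQ)
    return sum(n * ENGLISH_FREQ[ch] for ch, n in counts.items())

def crack_caesar(ciphertext):
    """Return (shift, plaintext) for the shift that looks most like English."""
    best = max(range(26), key=lambda s: english_score(caesar_shift(ciphertext, -s)))
    return best, caesar_shift(ciphertext, -best)
```

For a general monoalphabetic cipher there are far too many keys to try this way, which is why each letter has to be identified individually.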

Example Cryptanalysis

• given ciphertext:
UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZVUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSXEPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ

• count relative letter frequencies (see text)

• guess P & Z are e and t

• guess ZW is th and hence ZWP is the

• proceeding with trial and error finally get:

it was disclosed yesterday that several informal but direct contacts have been made with political representatives of the vietcong in moscow
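
The first steps of this example are easy to reproduce. The sketch below counts the relative letter frequencies of the ciphertext above and applies a partial guess; `relative_frequencies` and `apply_partial_key` are hypothetical helper names, and the ciphertext is only split across three string literals for readability.

```python
from collections import Counter

CIPHERTEXT = ("UZQSOVUOHXMOPVGPOZPEVSGZWSZOPFPESXUDBMETSXAIZ"
              "VUEPHZHMDZSHZOWSFPAPPDTSVPQUZWYMXUZUHSX"
              "EPYEPOPDZSZUFPOMBZWPFUPZHMDJUDTMOHMQ")

def relative_frequencies(text):
    """Return (letter, relative frequency) pairs, most frequent first."""
    counts = Counter(text)
    total = sum(counts.values())
    return [(letter, n / total) for letter, n in counts.most_common()]

def apply_partial_key(text, mapping):
    """Substitute the letters guessed so far; unknown letters are shown as '.'."""
    return ''.join(mapping.get(ch, '.') for ch in text)

# P and Z come out on top, which motivates the guess P -> e, Z -> t.
for letter, freq in relative_frequencies(CIPHERTEXT)[:5]:
    print(letter, round(freq, 4))

# Adding the guess W -> h makes every ZWP read "the".
print(apply_partial_key(CIPHERTEXT, {"P": "e", "Z": "t", "W": "h"}))
```

Each further guess is added to the mapping and the partially deciphered text is inspected again, which is the trial and error the slide refers to.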

Improvement of Substitution Ciphers

Dilute letter frequency
  -> represent plaintext letters by several cipher symbols
  -> cipher symbols all have equal frequency
  -> homophonic substitution ciphers

Use multiple cipher alphabets
  -> will explain this in just a minute
  -> polyalphabetic substitution ciphers

Encrypt multiple plaintext letters
  -> will talk about this in just two minutes
  -> polygram substitution ciphers

Relative Frequency of Occurrence of Letters

Figure 3.6 Relative Frequency of Occurrence of Letters
(plot of normalized relative frequency against frequency-ranked letters, in decreasing frequency; curves: Plaintext, Playfair, Vigenère, Random polyalphabetic)

Short History of Ciphers (cont’d)

• February 8, 1587
  -> Mary Queen of Scots beheaded (so to say, by Queen Elizabeth I)
  -> Why? Her nomenclator (cipher/code) was not secure enough!
  -> that’s what you get when you deal with double agents and counter-intelligence agencies (see Babington, Gifford, Walsingham)

• Blaise de Vigenère
  -> published Traicté des Chiffres (“A Treatise on Secret Writing”) in 1586
  -> Mary Queen of Scots should have read this
  -> greatest cipher for its time, neglected for two centuries
  -> later (in the 1800s) called “le chiffre indéchiffrable”

Blaise de Vigenère

Polyalphabetic Ciphers

• use multiple cipher alphabets

• called polyalphabetic substitution ciphers

• makes cryptanalysis harder with more alphabets to guess and flatter frequency distribution

• use a key to select which alphabet is used for each letter of the message

• use each alphabet in turn

• repeat from start after end of key is reached

Vigenère Cipher

• simplest polyalphabetic substitution cipher

• effectively multiple Caesar ciphers

• key is multiple letters long K = k1 k2 ... kd

• ith letter specifies ith alphabet to be used

• use each alphabet in turn

• repeat from start after d letters in message

• decryption simply works in reverse

Vigenère Tableau

Example

• write the plaintext out

• write the keyword repeated above it

• use each key letter as a caesar cipher key

• encrypt the corresponding plaintext letter

• e.g., using keyword deceptive

key:        deceptivedeceptivedeceptive
plaintext:  wearediscoveredsaveyourself
ciphertext: ZICVTWQNGRZGVTWAVZHCQYGLMGJ
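
The steps above can be sketched in a few lines. This is an illustrative version that assumes the text and keyword contain only the letters a-z (no spaces or punctuation); `vigenere_encrypt` and `vigenere_decrypt` are hypothetical names.

```python
from itertools import cycle

A = ord('a')

def vigenere_encrypt(plaintext, keyword):
    """Add each key letter (a=0 .. z=25) to the matching plaintext letter, mod 26."""
    pairs = zip(plaintext.lower(), cycle(keyword.lower()))  # keyword repeats as needed
    return ''.join(chr((ord(p) - A + ord(k) - A) % 26 + A) for p, k in pairs).upper()

def vigenere_decrypt(ciphertext, keyword):
    """Subtract each key letter from the matching ciphertext letter, mod 26."""
    pairs = zip(ciphertext.lower(), cycle(keyword.lower()))
    return ''.join(chr((ord(c) - ord(k)) % 26 + A) for c, k in pairs)

print(vigenere_encrypt("wearediscoveredsaveyourself", "deceptive"))
# ZICVTWQNGRZGVTWAVZHCQYGLMGJ
```

Each key letter acts as the shift of one Caesar cipher, so decryption is the same loop with subtraction in place of addition.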

Security of Vigenère Ciphers

• have multiple ciphertext letters for each plaintext letter

• hence letter frequencies are obscured
  -> but not totally lost!
  -> do you see where we’re heading towards...?!!

• Number of possible keywords of length m = 26^m
  – Much larger than the 26 keys of a shift (Caesar) cipher.
  – An alphabetic character of a plaintext can be mapped to one of m possible alphabetic characters (assuming that the keyword contains m distinct characters).
  – In general, cryptanalysis is much more difficult for polyalphabetic than for monoalphabetic cryptosystems.

• start with letter frequencies
  – see if they look monoalphabetic or not
  – if not, then need to determine the number of alphabets, since then can attack each alphabet separately

Short History of Ciphers (cont’d)

• Thomas Jefferson’s Wheel Cipher (around 1800)
  -> 36 stacked disks, each showing the alphabet in random order
  -> one row spells plaintext, any other is used as ciphertext
  -> reinvented by French (~1890) and US government (~1914, M-94)

• Charles Babbage
  -> Difference Engine No. 1 and No. 2 in 1820-40s
  -> cracked the Vigenère cipher some time around 1854
  -> never publicized (did British Intelligence keep him from doing so?)

• Friedrich Wilhelm Kasiski
  -> Die Geheimschriften und die Dechiffrierkunst (1863)
  -> method to break Vigenère cipher
  -> became known as the “Kasiski test”

Charles Babbage

Example of Vigenère Cipher

Kasiski Test

• method developed by Babbage / Kasiski (1863)

• repetitions in ciphertext give clues to period

• so find same plaintext an exact period apart, which results in the same ciphertext (of course, could also be random fluke)

• see repeated “VTW” in previous example

• suggests key size of 3 or 9

• then attack each monoalphabetic cipher individually using same techniques as before

• The Zimmermann Telegram in WWI

Kasiski Test (More Formal)

• Idea: any two identical strings will be encrypted to the same ciphertext if they are km positions apart, where m is the keyword length and k is a positive integer.

• Two Steps of Attack:
  • Find the keyword length
  • Conduct statistical attack

• An Example of Attack
  • Look for trigrams that are identical
  • Compute the distances between them, d1, d2, …
  • Let m’ be a divisor of gcd(d1, d2, …)
  • Write the ciphertext in a rectangular array with m’ columns, then statistical attack can be used on each column
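
The first step of the attack is mechanical and can be sketched as follows. The function names are hypothetical, and distances are taken between consecutive occurrences of each repeated trigram, which is one reasonable reading of "d1, d2, …" above.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def repeated_ngram_distances(ciphertext, n=3):
    """Map each repeated n-gram to the distances between its consecutive occurrences."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    return {gram: [b - a for a, b in zip(where, where[1:])]
            for gram, where in positions.items() if len(where) > 1}

def kasiski_gcd(ciphertext, n=3):
    """gcd of all distances; the keyword length should be a divisor of it."""
    distances = [d for ds in repeated_ngram_distances(ciphertext, n).values() for d in ds]
    return reduce(gcd, distances) if distances else None

def columns(ciphertext, m):
    """Column i holds every m-th letter, i.e. the letters enciphered with key letter i."""
    return [ciphertext[i::m] for i in range(m)]

print(repeated_ngram_distances("ZICVTWQNGRZGVTWAVZHCQYGLMGJ"))  # {'VTW': [9]}
```

On the earlier example the only repeated trigram is VTW at distance 9, so the candidate key lengths are the divisors 3 and 9. A chance repetition can spoil the gcd, so in practice the most common divisors are tried rather than trusting a single number.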

Low-Frequency Analysis

• works for English language plaintext

• after identifying the key length
  -> divide ciphertext into individual segments

• each segment is listed in a column and each column is shifted 25 times by one (have 25 shifted columns per segment)

• one of the columns (per segment) contains plaintext letters
  -> however: cannot identify words because each segment contains only part of the original plaintext

• now determine the five letters with lowest frequency
  -> in plaintext these are “j”, “k”, “q”, “x”, “z” with a total frequency of well under 2% (about 1.3% by the letter probability table)

• whichever column contains these letters with the appropriate total frequency should be the plaintext
  -> get one shift value per segment and from that the keyword
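
A minimal sketch of this procedure, assuming a lowercase a-z ciphertext and a key length that is already known (for example from the Kasiski test). It simplifies "appropriate total frequency" to "fewest rare letters", which is an assumption; the names `best_shift` and `recover_key` are hypothetical.

```python
RARE = set("jkqxz")  # the five lowest-frequency English letters

def shift_back(segment, shift):
    """Undo a Caesar shift of `shift` positions on a lowercase segment."""
    return ''.join(chr((ord(c) - ord('a') - shift) % 26 + ord('a')) for c in segment)

def best_shift(segment):
    """Try all 26 shifts and keep the one that leaves the fewest rare letters."""
    return min(range(26), key=lambda s: sum(c in RARE for c in shift_back(segment, s)))

def recover_key(ciphertext, key_length):
    """One shift value per segment; together the shifts spell the keyword."""
    segments = [ciphertext[i::key_length] for i in range(key_length)]
    return ''.join(chr(best_shift(seg) + ord('a')) for seg in segments)
```

With short segments several shifts can tie or the wrong one can win, so the result is best treated as a candidate keyword to be checked by decrypting.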

Decipher this Vigenère Cipher

kbxzoeqecalrvbwlvvczthrzpnxumkgvjtfvtwudsinlhzmdrtniitcchlxygztpwrpqcmmekbthqfwvivjjrirljftbwlqyqetciipwilzvtgduifhbwlqzuqcoeskbtkxygztmsigbwlvvochafvcnxumkgvjtfvtwuprycjxaiuywgshjcvnmmekbtuyddmgkmmkltkfpkvuprzvgxzejpmpyxfpwiomeiihtebgacvsufahvxygiklvrimevtlniipseqnpspkjmeseegbhprkjmjummgzhlgrpjtzezfbdiiqgzdmvfobwpwzvndspfyaioekvptwsgwtpamfpwualvypdsilpqklvjgqhhpjqhtysrplioekcvnwifrttfslointivvngvqkkutaskkuthvvomglppvptwvffcrawfhislvrpotkmdcoxuekkwcalvtmxzekjmdycnjqrowkcbtzxycbxmimgzpucfpmspwtqdtywnjiialvwvxciiumxzjftickayaqipwygztpxnktaprjvicappfqhhtggighrudmgltccktkfpuwblxykvvlzvpudyiskhpyvvcvsprvzxapgrdttalvtmxzeeqbwlvnjqrowkcbtzxycbiomjjihhpigisflrrxtuiucvnalzpoioekjiewieuppwtvpapuckjqcnxycbxulrrxtumeikpbwvuadtikjqcnicumivlrrxtugrwatzwfomiomeimazikqppwtvpicfxykvvalrvqcoegrmcprxeijzijkbhlpwvwwhtggvpnezpkpbwvuqizichbdoegrmchkrkvxahfgacaecnvtjijuigpppjiewieepgvrfnwvpgrntnalfwow

Autokey Cipher

• ideally want a key as long as the message

• Vigenère proposed the autokey cipher

• use keyword once, then follow with plaintext

• knowing keyword can recover the first few letters

• use these in turn on the rest of the message

• but still have frequency characteristics to attack

• Both the key and plaintext share the same frequency distribution of letters -> apply a statistical technique

• e.g., given key deceptive

key:        deceptivewearediscoveredsav
plaintext:  wearediscoveredsaveyourself
ciphertext: ZICVTWQNGKZEIIGASXSTSLVVWLA
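
The example above can be reproduced with a short sketch, assuming letters a-z only; `autokey_encrypt` and `autokey_decrypt` are hypothetical names. Decryption has to rebuild the key stream as it goes, because the later key letters are the plaintext itself.

```python
A = ord('a')

def autokey_encrypt(plaintext, keyword):
    """Key stream = keyword followed by the plaintext, cut to the message length."""
    stream = (keyword + plaintext)[:len(plaintext)]
    return ''.join(chr((ord(p) - A + ord(k) - A) % 26 + A)
                   for p, k in zip(plaintext, stream)).upper()

def autokey_decrypt(ciphertext, keyword):
    """Recover the first letters with the keyword, then reuse them as further key."""
    stream = list(keyword)
    plain = []
    for i, c in enumerate(ciphertext.lower()):
        p = chr((ord(c) - ord(stream[i])) % 26 + A)
        plain.append(p)
        stream.append(p)  # each recovered letter extends the key stream
    return ''.join(plain)

print(autokey_encrypt("wearediscoveredsaveyourself", "deceptive"))
# ZICVTWQNGKZEIIGASXSTSLVVWLA
```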

Vernam Cipher

• proposed by Gilbert Vernam (1918)

• consider binary data (bits)

• use XOR (exclusive-or) operation with very long key (key is repeated if necessary)

$c_i = p_i \oplus k_i$ and $p_i = c_i \oplus k_i$

$p_i$ is the ith binary digit of plaintext
$k_i$ is the ith binary digit of key
$c_i$ is the ith binary digit of ciphertext
$\oplus$ is the XOR operation
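
Because XOR is its own inverse, one function covers both directions. The sketch below works on bytes rather than single bits, which is an implementation convenience and not part of Vernam's proposal; `vernam` is a hypothetical name.

```python
from itertools import cycle

def vernam(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key if it is shorter."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

ciphertext = vernam(b"attack at dawn", b"secret")
assert vernam(ciphertext, b"secret") == b"attack at dawn"  # same call decrypts
```

A repeating key makes this a binary Vigenère cipher, so the period-finding attacks described earlier still apply.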

One-Time Pad

• if a truly random key as long as the message is used, the cipher will be secure

• called a One-Time pad

• is unbreakable since ciphertext bears no statistical relationship to the plaintext

• for any plaintext & any ciphertext there exists a key, which maps one to the other

• can only use the key once though

• have problem of secure distribution of key

Unbreakable cipher

• For the same ciphertext, two keys can generate two plausible plaintexts

• Which plaintext is correct?

• Unbreakable
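
This ambiguity is easy to demonstrate. In the sketch below the two messages are made-up examples and `xor_bytes` is a hypothetical helper; for any candidate plaintext of the right length, a key that "explains" the ciphertext can be computed directly.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

real_message = b"attack at dawn"
real_key = secrets.token_bytes(len(real_message))  # truly random pad, used once
ciphertext = xor_bytes(real_message, real_key)

# An attacker can pick any plausible plaintext and derive a matching key.
decoy_message = b"retreat at six"
decoy_key = xor_bytes(ciphertext, decoy_message)

assert xor_bytes(ciphertext, real_key) == real_message
assert xor_bytes(ciphertext, decoy_key) == decoy_message
```

Since every key is equally likely, the ciphertext alone gives no reason to prefer one of these plaintexts over the other.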

Weakness of One-time Pad

• Malleable: Provides secrecy but not authentication.

• Keys must NOT be reused:
  – a reused key cannot withstand a Known Plaintext Attack
  – depending on known information about plaintexts, Eve can make use of

    $C_1 \oplus C_2 = (M_1 \oplus K) \oplus (M_2 \oplus K) = M_1 \oplus M_2$

    to figure out both messages

• In practice
  – Generate a large number of random bits,
  – Exchange the key material securely between the users before sending a one-time enciphered message,
  – Keep both copies of the key material for each message securely until they are used, and
  – Securely dispose of the key material after use, thereby ensuring the key material is never reused.

• It requires perfectly random numbers as the key

• Generating random bits
  – radioactive decay
  – noisy diode
  – flipping coins
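
The key-reuse weakness can be checked directly. In this sketch the two messages are invented examples and `xor_bytes` is a hypothetical helper; the point is that the key drops out when two ciphertexts made with the same pad are combined.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

m1 = b"meet at the dock"
m2 = b"send money today"
key = secrets.token_bytes(len(m1))  # the same pad is (wrongly) used twice

c1 = xor_bytes(m1, key)
c2 = xor_bytes(m2, key)

leak = xor_bytes(c1, c2)            # Eve computes this without knowing the key
assert leak == xor_bytes(m1, m2)    # the key has cancelled out
assert xor_bytes(leak, m1) == m2    # knowing one message reveals the other
```

Even without a fully known message, guessed fragments of one plaintext can be tested against the combined stream to see whether they produce readable text in the other.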

Random numbers needed

• If the key material is generated by a deterministic program then it is not actually random
  – should never be used in a one-time pad cipher.
  – If so used, the method becomes a stream cipher; these usually employ a short key that is used to generate a long pseudorandom stream, which is then combined with the message using some such mechanism as those used in one-time pads. Stream ciphers can be secure in practice, but they cannot be absolutely secure in the same provable sense as the one-time pad

Stream ciphers

• Stream ciphers
  – The most famous: Vernam cipher
  – Invented by Vernam (AT&T, in 1917)
  – Process the message bit by bit (as a stream)
  – different from the one-time pad, although some treat the two as the same
  – Simply add (XOR) bits of message to random key bits
  – For decryption, generate the key stream and XOR with the ciphertext

• Examples
  – A well-known stream cipher is RC4;
  – others include: A5/1, A5/2, Chameleon, FISH, Helix, ISAAC, Panama, Pike, SEAL, SOBER, SOBER-128 and WAKE.

• Usage
  – Stream ciphers are used in applications where plaintext comes in quantities of unknowable length - for example, a secure wireless connection

Simplest Stream Cipher

[Figure: Plaintext and Key are combined into Ciphertext; Ciphertext and the same Key are combined back into Plaintext]

Pros and Cons

• Drawbacks
  – Need as many key bits as message bits, difficult in practice
  – (e.g., distribute on a mag tape or CD-ROM)

• Strength
  – Unconditionally secure, provided the key is truly random

Key Generation

• Why not generate the keystream from a smaller (base) key?
  – Use some pseudo-random function to do this
  – Although this looks very attractive, it proves to be very difficult in practice to find a good pseudo-random function that is cryptographically strong

• This is still an area of much research
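
To make the idea concrete, here is a toy sketch that stretches a short key into a long keystream by hashing the key together with a counter. The construction, and the names `keystream` and `stream_xor`, are assumptions chosen for illustration only: this is not one of the ciphers named in these slides and it has not been vetted for real use.

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Expand `key` into `length` pseudo-random bytes (illustrative only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        # Each block is SHA-256(key || counter); the counter makes blocks differ.
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def stream_xor(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the generated keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

message = b"plaintext of unknowable length"
assert stream_xor(stream_xor(message, b"short key"), b"short key") == message
```

The keystream is fully determined by the short key, so reusing a key repeats the keystream and brings back the key-reuse problem of the one-time pad; practical designs mix in a per-message value to avoid this.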

Short History of Ciphers (cont’d)

• Ciphers are really becoming something “cool” (19th century)
  -> used to convey secret (love) messages in newspapers
  -> Edgar Allan Poe, Jules Verne, Sir Arthur Conan Doyle
  -> Beale Papers containing directions to treasure buried in VA

• Charles Wheatstone (1854)
  -> invents Playfair Cipher
  -> named after his friend Baron Playfair
  -> polygram substitution cipher

• Lester S. Hill (1929)
  -> general polygram cipher, not only two letters but m
  -> matrix cipher

Polygram Substitution Cipher

• not even the large number of keys in a monoalphabetic cipher provides security

• one approach to improving security was to encrypt multiple letters (polygraphic cipher)

• the Playfair Cipher is an example of a polygram cipher

• invented by Charles Wheatstone in 1854, but named after his friend Baron Playfair

Playfair Key Matrix

• a 5 x 5 matrix of letters based on a keyword

• I and J are considered the same letter

• fill in letters of keyword (without duplicates)

• fill rest of matrix with other letters

• e.g., using the keyword MONARCHY

M  O  N  A    R
C  H  Y  B    D
E  F  G  I/J  K
L  P  Q  S    T
U  V  W  X    Z

• a second matrix, built the same way (its first letters spell the keyword SIMPLE)

s  i/j  m  p  l
e  a    b  c  d
f  g    h  k  n
o  q    r  t  u
v  w    x  y  z

Encryption and Decryption

Encrypt two plaintext letters at a time:

1. if a pair is a repeated letter, insert a filler like 'X', e.g., "balloon" is split as "ba lx lo on"

2. if both letters fall in the same row, replace each with the letter to its right (wrapping back to start from end), e.g., "ar" encrypts as "RM"

3. if both letters fall in the same column, replace each with the letter below it (again wrapping to top from bottom), e.g., "mu" encrypts to "CM"

4. otherwise, each letter is replaced by the one that lies in its row and is located in the column of the other letter, e.g., "hs" --> "BP", "ea" --> "IM" or "JM" (as desired)
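
The four rules can be sketched as follows, using the MONARCHY matrix. The function names are hypothetical, and two choices are assumptions beyond the rules above: the matrix always shows I for the shared I/J cell, and a single leftover letter at the end is padded with the filler.

```python
def build_matrix(keyword):
    """5 x 5 matrix: keyword letters first (no duplicates), then the rest; J is merged into I."""
    seen = []
    for ch in keyword.lower() + "abcdefghiklmnopqrstuvwxyz":
        ch = 'i' if ch == 'j' else ch
        if 'a' <= ch <= 'z' and ch not in seen:
            seen.append(ch)
    return [seen[r * 5:(r + 1) * 5] for r in range(5)]

def split_digrams(plaintext, filler='x'):
    """Rule 1: split into pairs, inserting the filler between repeated letters."""
    letters = ['i' if c == 'j' else c for c in plaintext.lower() if 'a' <= c <= 'z']
    pairs, i = [], 0
    while i < len(letters):
        a = letters[i]
        b = letters[i + 1] if i + 1 < len(letters) else filler
        if a == b:
            pairs.append(a + filler)
            i += 1
        else:
            pairs.append(a + b)
            i += 2
    return pairs

def encrypt_pair(matrix, pair):
    """Rules 2 to 4 for a single digram."""
    pos = {matrix[r][c]: (r, c) for r in range(5) for c in range(5)}
    (r1, c1), (r2, c2) = pos[pair[0]], pos[pair[1]]
    if r1 == r2:        # same row: take the letter to the right, wrapping
        out = matrix[r1][(c1 + 1) % 5] + matrix[r2][(c2 + 1) % 5]
    elif c1 == c2:      # same column: take the letter below, wrapping
        out = matrix[(r1 + 1) % 5][c1] + matrix[(r2 + 1) % 5][c2]
    else:               # rectangle: keep own row, take the other letter's column
        out = matrix[r1][c2] + matrix[r2][c1]
    return out.upper()

def playfair_encrypt(plaintext, keyword):
    matrix = build_matrix(keyword)
    return ''.join(encrypt_pair(matrix, p) for p in split_digrams(plaintext))

print(playfair_encrypt("hs", "monarchy"))  # BP
```

Decryption uses the same matrix with the row and column rules reversed (left instead of right, up instead of down); the rectangle rule is its own inverse.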

Security of Playfair Cipher

• security much improved over monoalphabetic

• since we have 26 x 26 = 676 digrams

• would need a 676 entry frequency table to analyze (vs. 26 for monoalphabetic cipher)

• therefore, need correspondingly more ciphertext

• was widely used for many years (e.g., US & British military in WWI)

• it can be broken, given a few hundred letters

• since it still has much of the plaintext structure

• Difficult using frequency analysis
  – But it still reveals the frequency information

Relative Frequency of Occurrence of Letters

Figure 3.6 Relative Frequency of Occurrence of Letters
(plot of normalized relative frequency against frequency-ranked letters, in decreasing frequency; curves: Plaintext, Playfair, Vigenère, Random polyalphabetic)