Recitation 10 · cs4cyber.wdfiles.com/local--files/recitations/rec10.pdf
Substitution cipher
• A substitution cipher is a cipher in which every plaintext character is replaced by a ciphertext character according to a fixed mapping
• It differs from the Caesar cipher in that the cipher alphabet is not simply the alphabet shifted; it is completely shuffled
• The mapping must be one-to-one, so that decryption is unambiguous
Create cipher
import random

alphabet = "abcdefghijklmnopqrstuvwxyz"

def create_cipher(alphabet):
    shuffled = random.sample(alphabet, len(alphabet))  # create list from alphabet in random order
    shuffled = "".join(shuffled)  # convert list to str, NO NEED TO UNDERSTAND THIS
    encrypt_dict = {}  # an empty dictionary
    for i in range(len(alphabet)):
        encrypt_dict[alphabet[i]] = shuffled[i]  # i-th letter maps to i-th shuffled letter
    return encrypt_dict
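A minimal sketch of how one might check that the generated cipher really is one-to-one. The seed value is an arbitrary assumption, used only so that repeated runs produce the same cipher; it is not part of the recitation.

```python
import random

alphabet = "abcdefghijklmnopqrstuvwxyz"

def create_cipher(alphabet):
    """Map each letter of alphabet to a letter of a shuffled copy of it."""
    shuffled = random.sample(alphabet, len(alphabet))
    encrypt_dict = {}
    for i in range(len(alphabet)):
        encrypt_dict[alphabet[i]] = shuffled[i]
    return encrypt_dict

random.seed(10)  # arbitrary seed (assumption), only for reproducible runs
cipher = create_cipher(alphabet)

# Every letter is a key, and the values are a permutation of the alphabet,
# so no two plaintext letters share a ciphertext letter (one-to-one).
print(sorted(cipher.keys()) == list(alphabet))    # prints: True
print(sorted(cipher.values()) == list(alphabet))  # prints: True
```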
Demo online…..
encrypt
• Once we have the cipher, we can easily encrypt
• For every character in the text:
  • If it is in the alphabet:
    • Replace it with the value from the cipher
  • Else:
    • Leave it untouched.
def encrypt(text, enc_dict):
    """ encrypts text using enc_dict as substitution cipher
    """
    cipher = ""
    for char in text:
        if char not in enc_dict:
            cipher = cipher + char  # characters not in enc_dict are untouched
        else:
            cipher = cipher + enc_dict[char]
    return cipher
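As a quick illustration, here is encrypt applied with a made-up three-letter cipher (toy_cipher is an assumption for this example, not a cipher from the recitation). Characters that are not keys of the dictionary, such as punctuation, spaces and uppercase letters, pass through unchanged.

```python
def encrypt(text, enc_dict):
    """Encrypt text using enc_dict as a substitution cipher."""
    cipher = ""
    for char in text:
        if char not in enc_dict:
            cipher = cipher + char  # not a key: leave untouched
        else:
            cipher = cipher + enc_dict[char]
    return cipher

toy_cipher = {"a": "q", "b": "w", "c": "e"}  # made-up cipher for illustration
print(encrypt("abc, cab!", toy_cipher))  # prints: qwe, eqw!
print(encrypt("ABC", toy_cipher))        # prints: ABC (uppercase is not in the cipher)
```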
decrypt
• Say we know the cipher; how do we decrypt?
• Remember the Caesar cipher:
  • To decrypt, we simply re-encrypted in the other direction
• Same here:
  • Reverse the cipher dictionary: keys become values and vice versa.
def reverse_dict(d1):
    """ reverse keys <--> values in dictionary d1 """
    d2 = {}  # an empty dictionary
    for key in d1:
        value = d1[key]
        d2[value] = key  # insert value:key into d2
    return d2
• Now to decrypt we just encrypt with the reversed dictionary
def decrypt(cipher, enc_dict):
    """ decrypts cipher that was encrypted using enc_dict """
    dec_dict = reverse_dict(enc_dict)  # swap keys and values
    return encrypt(cipher, dec_dict)
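A short round-trip sketch: encrypting and then decrypting with the same dictionary should give back the original text. The three-letter toy_cipher is a made-up assumption for illustration.

```python
def encrypt(text, enc_dict):
    """Encrypt text using enc_dict as a substitution cipher."""
    cipher = ""
    for char in text:
        if char not in enc_dict:
            cipher = cipher + char
        else:
            cipher = cipher + enc_dict[char]
    return cipher

def reverse_dict(d1):
    """Swap keys and values of d1 (assumes the mapping is one-to-one)."""
    d2 = {}
    for key in d1:
        d2[d1[key]] = key
    return d2

def decrypt(cipher, enc_dict):
    """Decrypt by encrypting with the reversed dictionary."""
    return encrypt(cipher, reverse_dict(enc_dict))

toy_cipher = {"a": "b", "b": "c", "c": "a"}  # made-up cipher for illustration
secret = encrypt("a cab", toy_cipher)
print(secret)                       # prints: b abc
print(decrypt(secret, toy_cipher))  # prints: a cab
```

The reversal only works because the mapping is one-to-one; if two keys shared a value, one of them would be overwritten in the reversed dictionary.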
• What can we do if we don’t know the cipher?
• Can we guess it?
Guessing the cipher
• There are 26! options
  • More than 10^26….
• But ‘English is English’
• Frequencies of letters / groups of letters / words
  • https://en.wikipedia.org/wiki/Frequency_analysis
• "is", "the", "are", "of", "with", "to", "a", "was"
Most common letters: e, t, o, a
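To get a feel for the size of the key space, the number of possible cipher alphabets can be computed directly. By comparison, the Caesar cipher has only 25 non-trivial shifts. The variable name key_space is just a label chosen for this sketch.

```python
import math

key_space = math.factorial(26)  # number of ways to shuffle 26 letters
print(key_space)                # prints: 403291461126605635584000000
print(key_space > 10 ** 26)     # prints: True
print(f"{key_space:.2e}")       # prints: 4.03e+26
```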
Breaking the cipher
• Modus operandi:
1. Count appearances of chars in the text
2. Guess the most frequent letters based on prior knowledge
3. Repeatedly guess letters and words based on prior knowledge
  • One-letter words are usually ‘i’ or ‘a’
  • Common words in English: ‘is’, ‘the’, ‘are’, ‘of’, ‘with’, ‘to’, ‘was’
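One possible way to use the hints about short words is to count the words of a given length in the ciphertext. The helper word_counts_by_length and the toy ciphertext are assumptions made for this sketch (the ciphertext was produced from "i saw a cat and a dog and i ran" with a made-up cipher); they are not part of the recitation code.

```python
from collections import Counter

def word_counts_by_length(text, length):
    """Count the words of exactly `length` characters in text (hypothetical helper)."""
    return Counter(word for word in text.split() if len(word) == length)

ciphertext = "x qzv z kzb zfp z pwu zfp x hzf"  # made-up ciphertext

# One-letter words: good candidates for 'i' and 'a'.
print(word_counts_by_length(ciphertext, 1))  # prints: Counter({'x': 2, 'z': 2})

# The most frequent three-letter word: a candidate for 'the', 'are', 'was', ...
print(word_counts_by_length(ciphertext, 3).most_common(1))  # prints: [('zfp', 2)]
```

In a text this short the most frequent three-letter word happens to be "and", which shows that such counts give candidates to try, not certain answers.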
1. Count appearances of chars in the text
def char_count(text):
    ''' counts number of appearances of characters in text.
    Returns a dictionary char:count with an entry for every letter in alphabet. '''
    cnts = {}
    for char in alphabet:  # alphabet is the global string defined earlier
        cnts[char] = 0
    for char in text:
        if char in alphabet:
            cnts[char] = cnts[char] + 1
    return cnts

def sort_by_count(cnts_dict):
    ''' sorts a given dictionary whose elements are char:count
    sorting is done by counts (highest first), returns a list of (char, count) pairs '''
    return sorted(cnts_dict.items(), key = lambda x:-x[1])  # no need to understand this
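A small usage sketch of the two functions on a short string. The input "hello world" is an arbitrary choice for this example.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

def char_count(text):
    """Return a dictionary letter:count for every letter of the alphabet."""
    cnts = {}
    for char in alphabet:
        cnts[char] = 0
    for char in text:
        if char in alphabet:
            cnts[char] = cnts[char] + 1
    return cnts

def sort_by_count(cnts_dict):
    """Return a list of (char, count) pairs, highest count first."""
    return sorted(cnts_dict.items(), key=lambda x: -x[1])

counts = char_count("hello world")
print(counts["l"])                # prints: 3
print(sort_by_count(counts)[:3])  # prints: [('l', 3), ('o', 2), ('d', 1)]
```

Letters with equal counts keep their alphabetical order, because sorted is stable and the dictionary was filled in alphabet order.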
• We will count letter frequencies in our ciphertext.
• We will count letter frequencies in a representative text.
• We will compare the frequencies to guess letters.
• What texts can represent the letter frequencies in English?
2. Guess the most frequent letters based on prior knowledge
• Is “The quick brown fox jumps over the lazy dog” enough?
• Let’s go with something longer….
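A quick check of why the pangram is not enough. This sketch uses collections.Counter from the standard library instead of the recitation's char_count, only to keep the example short.

```python
from collections import Counter

pangram = "the quick brown fox jumps over the lazy dog"
counts = Counter(ch for ch in pangram if ch.isalpha())

print(len(counts))               # prints: 26 (every letter appears)
print(counts.most_common(3))     # prints: [('o', 4), ('e', 3), ('t', 2)]
print(counts["a"], counts["z"])  # prints: 1 1
```

With only 35 letters in total, 'a' looks exactly as rare as 'z' and 'o' beats 'e', so the counts say little about real English frequencies.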
• Tom Sawyer:
  • http://www.gutenberg.org/files/74/74-0.txt
• Harry Potter and the Philosopher's Stone:
  • https://archive.org/stream/Book5TheOrderOfThePhoenix/Book%201%20-%20The%20Philosopher%27s%20Stone_djvu.txt
• The Hobbit:
  • https://archive.org/stream/TheHobbitByJ.R.RTolkien/The%20Hobbit%20by%20J.R.R%20Tolkien_djvu.txt
• The .py file includes a ‘download’ function. We will not cover it.
Exercise
• Once we have letter frequencies from a large text, can we simply guess the entire cipher from them?
• The most frequent letter in the ciphertext would be the most frequent letter in the text, and so on….
def create_dec_dic(what_dic, train_dic):
    '''creates dictionary to decipher encrypted text'''
    what = sort_by_count(what_dic)    # ciphertext letters, most frequent first
    train = sort_by_count(train_dic)  # reference-text letters, most frequent first
    dec_dic = {}
    for i in range(len(train)):
        dec_dic[what[i][0]] = train[i][0]  # match letters of equal rank
    return dec_dic
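A toy run of the rank-matching idea. Both count dictionaries are made-up numbers over a three-letter alphabet, chosen only to show the mechanics.

```python
def sort_by_count(cnts_dict):
    """Return a list of (char, count) pairs, highest count first."""
    return sorted(cnts_dict.items(), key=lambda x: -x[1])

def create_dec_dic(what_dic, train_dic):
    """Match ciphertext letters to reference letters of the same frequency rank."""
    what = sort_by_count(what_dic)
    train = sort_by_count(train_dic)
    dec_dic = {}
    for i in range(len(train)):
        dec_dic[what[i][0]] = train[i][0]
    return dec_dic

cipher_counts = {"x": 5, "y": 9, "z": 1}    # made-up counts from a ciphertext
english_counts = {"a": 7, "b": 2, "e": 12}  # made-up counts from a reference text
print(create_dec_dic(cipher_counts, english_counts))
# prints: {'y': 'e', 'x': 'a', 'z': 'b'}
```

The result depends entirely on the ranking: if two letters have close counts, a small change in the ciphertext can swap their ranks and therefore their guessed plaintext letters.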
Partial decryption
• That won’t work….
• Let’s write a function called try_decrypt
  • Arguments:
    • text – a string
    • partial_dict – a dictionary of decrypted letters
  • Returns the decrypted text, according to the partial dictionary. Letters that were not decrypted will be replaced by “-”
• We will use try_decrypt to try and guess letters and words
def try_decrypt(text, partial_dict):
    ''' decrypts using partial_dict .
    Characters not in partial_dict produce "-"
    '''
    result = ""
    for ch in text:
        if ch in partial_dict:
            result += partial_dict[ch]
        elif ch != " ":
            result += "-"  # unknown char
        else:
            result += " "  # space
    return result
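A sketch of how try_decrypt might be used while guessing: start with one guessed letter, look at the partial result, then add more guesses. The ciphertext and the guesses are made up for illustration.

```python
def try_decrypt(text, partial_dict):
    """Decrypt with partial_dict; unknown non-space characters become '-'."""
    result = ""
    for ch in text:
        if ch in partial_dict:
            result += partial_dict[ch]
        elif ch != " ":
            result += "-"
        else:
            result += " "
    return result

ciphertext = "zfp x hzf"  # made-up ciphertext
guesses = {"z": "a"}      # first guess
print(try_decrypt(ciphertext, guesses))  # prints: a-- - -a-

guesses["x"] = "i"        # the one-letter word is probably 'i'
guesses["f"] = "n"
print(try_decrypt(ciphertext, guesses))  # prints: an- i -an
```

Note that punctuation is not a key of the dictionary either, so it also shows up as "-".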
3. Repeatedly guess letters and words based on prior knowledge
LET’S BREAK SOME CIPHER!!