General Introduction to Programming and Computer Science
Recitation No. 10
דין שמואל
Source: cs4cyber.wdfiles.com/local--files/recitations/rec10.pdf



Substitution cipher

• A substitution cipher is a cipher in which every plaintext character is replaced by a ciphertext character according to a fixed table
• It differs from the Caesar cipher in that the cipher alphabet is not simply the alphabet shifted; it is completely shuffled
• The mapping must be one-to-one, so that every ciphertext character decrypts to exactly one plaintext character
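The one-to-one requirement can be checked mechanically. The sketch below is illustrative only: `caesar_mapping` and `is_one_to_one` are hypothetical helper names, not part of the recitation code. It writes a Caesar cipher as a dictionary (a special case of a substitution cipher) and tests whether a mapping can be reversed.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

def caesar_mapping(shift):
    """Caesar cipher as a dictionary (hypothetical helper, for illustration)."""
    mapping = {}
    for i in range(len(alphabet)):
        mapping[alphabet[i]] = alphabet[(i + shift) % len(alphabet)]
    return mapping

def is_one_to_one(mapping):
    """True if no two keys share a value, so the mapping can be reversed."""
    return len(set(mapping.values())) == len(mapping)

caesar = caesar_mapping(3)
print(caesar["a"], caesar["z"])   # d c
print(is_one_to_one(caesar))      # True

broken = {"a": "x", "b": "x"}     # two plaintext letters share one ciphertext letter
print(is_one_to_one(broken))      # False
```

With `broken`, a ciphertext "x" could stand for either "a" or "b", which is why such a mapping cannot serve as a cipher.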


Create cipher

import random

alphabet = "abcdefghijklmnopqrstuvwxyz"

def create_cipher(alphabet):
    """ creates a random substitution cipher: a dictionary letter -> shuffled letter """
    shuffled = random.sample(alphabet, len(alphabet))  # create list from alphabet in random order
    shuffled = "".join(shuffled)  # convert list to str, NO NEED TO UNDERSTAND THIS
    encrypt_dict = {}  # an empty dictionary
    for i in range(len(alphabet)):
        encrypt_dict[alphabet[i]] = shuffled[i]
    return encrypt_dict
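A quick way to see what create_cipher produces is to run it and inspect the result. The block below restates the function in compact form so that it runs on its own; it assumes the loop is meant to run over `alphabet`. The check at the end is an added illustration, not part of the slides.

```python
import random

alphabet = "abcdefghijklmnopqrstuvwxyz"

def create_cipher(alphabet):
    """Random substitution cipher as a dictionary (compact restatement)."""
    shuffled = random.sample(alphabet, len(alphabet))  # letters in random order
    encrypt_dict = {}
    for i in range(len(alphabet)):
        encrypt_dict[alphabet[i]] = shuffled[i]
    return encrypt_dict

enc_dict = create_cipher(alphabet)
print(enc_dict["a"], enc_dict["b"])   # two random letters, different on each run

# Every letter appears exactly once among the values,
# so the dictionary is a permutation of the alphabet (one-to-one).
print(sorted(enc_dict.values()) == list(alphabet))   # True
```

Note that a random shuffle may leave some letters mapped to themselves; the cipher is still valid.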


Demo online…..


encrypt


• Once we have the cipher, we can easily encrypt

• For every character in the text:
  • If it is in the alphabet:
    • Replace it with the value from the cipher
  • Else:
    • Leave it untouched.


def encrypt(text, enc_dict):
    """ encrypts text using enc_dict as substitution cipher """
    cipher = ""
    for char in text:
        if char not in enc_dict:
            cipher = cipher + char  # characters not in enc_dict are untouched
        else:
            cipher = cipher + enc_dict[char]
    return cipher
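To trace encrypt by hand, it helps to use a tiny fixed dictionary instead of a random one. In the sketch below, `toy_dict` is a made-up three-letter cipher (an assumption for illustration; a real cipher covers all 26 letters).

```python
def encrypt(text, enc_dict):
    """Encrypts text using enc_dict as a substitution cipher."""
    cipher = ""
    for char in text:
        if char not in enc_dict:
            cipher = cipher + char  # characters not in enc_dict are untouched
        else:
            cipher = cipher + enc_dict[char]
    return cipher

# Hypothetical toy cipher: only a, b and c are replaced
toy_dict = {"a": "q", "b": "w", "c": "e"}

print(encrypt("abc cab!", toy_dict))   # qwe eqw!
print(encrypt("A cab", toy_dict))      # A eqw
```

The second call shows that an uppercase "A" is left untouched, because only lowercase letters are keys of the dictionary.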


decrypt


• Say we know the cipher. How do we decrypt?
• Remember the Caesar cipher:
  • To decrypt, we simply re-encrypted in the other direction
• Same here:
  • Reverse the cipher dictionary: keys become values and vice versa.

def reverse_dict(d1):
    """ reverse keys <--> values in dictionary d1 """
    d2 = {}  # an empty dictionary
    for key in d1:
        value = d1[key]
        d2[value] = key  # insert value:key into d2
    return d2


• Now to decrypt we just encrypt with the reversed dictionary

def decrypt(cipher, enc_dict):
    """ decrypts cipher that was encrypted using enc_dict """
    dec_dict = reverse_dict(enc_dict)  # swap keys and values
    return encrypt(cipher, dec_dict)
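A round trip is a simple sanity check for the whole chain: encrypting and then decrypting with the same dictionary should give back the original text. The block below restates the functions in compact form so it runs on its own; it assumes decrypt is meant to call reverse_dict, and the message is an arbitrary example.

```python
import random

alphabet = "abcdefghijklmnopqrstuvwxyz"

def create_cipher(alphabet):
    shuffled = random.sample(alphabet, len(alphabet))  # letters in random order
    encrypt_dict = {}
    for i in range(len(alphabet)):
        encrypt_dict[alphabet[i]] = shuffled[i]
    return encrypt_dict

def encrypt(text, enc_dict):
    cipher = ""
    for char in text:
        if char not in enc_dict:
            cipher = cipher + char       # untouched
        else:
            cipher = cipher + enc_dict[char]
    return cipher

def reverse_dict(d1):
    d2 = {}
    for key in d1:
        d2[d1[key]] = key                # insert value:key into d2
    return d2

def decrypt(cipher, enc_dict):
    return encrypt(cipher, reverse_dict(enc_dict))

enc_dict = create_cipher(alphabet)
message = "attack at dawn!"
secret = encrypt(message, enc_dict)
print(secret)                      # shuffled letters; spaces and "!" unchanged
print(decrypt(secret, enc_dict))   # attack at dawn!
```

The round trip works only because the mapping is one-to-one; otherwise reverse_dict would lose entries.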


• What can we do if we don’t know the cipher?

• Can we guess it?


Guessing the cipher

• There are 26! options
  • More than 10^26
• But ‘English is English’
• Frequencies of letters, groups of letters, and words
  • https://en.wikipedia.org/wiki/Frequency_analysis
• "is", "the", "are", "of", "with", "to", "a", "was"


Most common letters: e, t, o, a
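To get a feel for why guessing all 26! ciphers one by one is hopeless, the number can be computed directly. The key-testing speed below is an arbitrary assumption chosen for illustration, not a measured figure.

```python
import math

n_keys = math.factorial(26)      # number of ways to shuffle 26 letters
print(n_keys)                    # 403291461126605635584000000
print(n_keys > 10**26)           # True

# Assumption for illustration: a machine that tests one billion keys per second
seconds_per_year = 60 * 60 * 24 * 365
years = n_keys / (10**9 * seconds_per_year)
print(years > 10**10)            # True: more than ten billion years
```

This is why the attack relies on knowledge about English rather than on trying every key.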


Breaking the cipher

• Modus Operandi:

1. Count appearances of chars in the text
2. Guess the most frequent letters based on prior knowledge
3. Repeatedly guess letters and words based on prior knowledge
  • One-letter words are usually ‘i’ or ‘a’
  • Common words in English: ‘is’, ‘the’, ‘are’, ‘of’, ‘with’, ‘to’, ‘was’
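The one-letter-word hint in step 3 can be turned into a small search. The sketch below is an illustration only: `one_letter_words` is a hypothetical helper, not part of the recitation code, and the ciphertext is made up.

```python
def one_letter_words(cipher_text):
    """Returns the set of one-letter words in cipher_text (hypothetical helper)."""
    found = set()
    for word in cipher_text.split():
        if len(word) == 1 and word.isalpha():
            found.add(word)
    return found

# Made-up ciphertext, for illustration only
candidates = one_letter_words("q zbd q xqr b kdw")
print(sorted(candidates))   # ['b', 'q']
```

Each letter found this way is a strong candidate for ‘i’ or ‘a’, which narrows the guessing considerably.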


1. Count appearances of chars in the text

def char_count(text):
    ''' counts number of appearances of characters in text.
    Returns a dictionary char:count (not sorted). '''
    cnts = {}
    for char in alphabet:
        cnts[char] = 0
    for char in text:
        if char in alphabet:
            cnts[char] = cnts[char] + 1
    return cnts

def sort_by_count(cnts_dict):
    ''' sorts a given dictionary whose elements are char:count
    sorting is done by counts (highest first), returns a list of (char, count) '''
    # No need to understand this line
    return sorted(cnts_dict.items(), key = lambda x: -x[1])
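A short run on a small string shows what the two functions return. The block restates them so it runs on its own; the sample text is an arbitrary choice.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

def char_count(text):
    """Counts appearances of each alphabet letter in text."""
    cnts = {}
    for char in alphabet:
        cnts[char] = 0
    for char in text:
        if char in alphabet:
            cnts[char] = cnts[char] + 1
    return cnts

def sort_by_count(cnts_dict):
    """List of (char, count) pairs, highest count first."""
    return sorted(cnts_dict.items(), key=lambda x: -x[1])

counts = char_count("hello world")
print(counts["l"], counts["o"], counts["z"])   # 3 2 0
print(sort_by_count(counts)[:2])               # [('l', 3), ('o', 2)]
```

Letters that never appear still get an entry with count 0, so the sorted list always has 26 items.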


• We will count letter frequencies in our cipher.

• We will count letter frequencies in a representative text.

• We will compare the frequencies to guess letters.

• What texts can represent the letter frequencies in English?


2. Guess the most frequent letters based on prior knowledge


• Is “The quick brown fox jumps over the lazy dog” enough?


• Let’s go with something longer….
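Counting the letters of the pangram shows why a single sentence is a poor reference. The block below is an added illustration; it restates char_count so it runs on its own and lowercases the sentence first, since only lowercase letters are counted.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

def char_count(text):
    """Counts appearances of each alphabet letter in text."""
    cnts = {}
    for char in alphabet:
        cnts[char] = 0
    for char in text:
        if char in alphabet:
            cnts[char] = cnts[char] + 1
    return cnts

pangram = "The quick brown fox jumps over the lazy dog".lower()
counts = char_count(pangram)
ranked = sorted(counts.items(), key=lambda x: -x[1])

print(ranked[:3])              # [('o', 4), ('e', 3), ('h', 2)]
print(min(counts.values()))    # 1: every letter appears at least once
```

In this sentence ‘o’ beats ‘e’, and rare letters such as ‘q’ and ‘z’ are as common as ‘a’, so the counts do not resemble typical English.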


• Tom Sawyer:
  • http://www.gutenberg.org/files/74/74-0.txt
• Harry Potter and the Philosopher's Stone:
  • https://archive.org/stream/Book5TheOrderOfThePhoenix/Book%201%20-%20The%20Philosopher%27s%20Stone_djvu.txt
• The Hobbit:
  • https://archive.org/stream/TheHobbitByJ.R.RTolkien/The%20Hobbit%20by%20J.R.R%20Tolkien_djvu.txt
• The .py file contains a ‘download’ function. We will not cover it.


Exercise

• Once we have letter frequencies from a large text, can we simply guess the entire cipher from them?
• Idea: the most frequent letter in the ciphertext is taken to be the most frequent letter in the large text, the second most frequent to be the second, and so on.


def create_dec_dic(what_dic, train_dic):
    '''creates dictionary to decipher encrypted text'''
    what = sort_by_count(what_dic)
    train = sort_by_count(train_dic)
    dec_dic = {}
    for i in range(len(train)):
        dec_dic[what[i][0]] = train[i][0]
    return dec_dic
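A toy run makes the rank-matching idea concrete. Both count dictionaries below are made up for illustration (real ones come from char_count and have 26 entries each), and the functions are restated so the block runs on its own.

```python
def sort_by_count(cnts_dict):
    """List of (char, count) pairs, highest count first."""
    return sorted(cnts_dict.items(), key=lambda x: -x[1])

def create_dec_dic(what_dic, train_dic):
    """Matches letters of the two dictionaries by frequency rank."""
    what = sort_by_count(what_dic)
    train = sort_by_count(train_dic)
    dec_dic = {}
    for i in range(len(train)):
        dec_dic[what[i][0]] = train[i][0]
    return dec_dic

# Hypothetical counts, for illustration only
cipher_counts = {"x": 5, "q": 3, "z": 1}         # counted in the ciphertext
reference_counts = {"e": 50, "t": 30, "a": 10}   # counted in a large English text

guess = create_dec_dic(cipher_counts, reference_counts)
print(guess)   # {'x': 'e', 'q': 't', 'z': 'a'}
```

The guess depends entirely on the ranks: if two ciphertext letters have close counts, a small change in the text swaps their ranks and with them the guessed letters.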


Partial decryption

• That won’t work in general: many letters have similar frequencies, so matching by rank alone gets some of them wrong.
• Let’s write a function called try_decrypt
  • Arguments:
    • text: a string
    • partial_dict: a dictionary of decrypted letters
  • Returns the decrypted text, according to the partial dictionary. Letters that were not decrypted are replaced by “-”
• We will use try_decrypt to try to guess letters and words


def try_decrypt(text, partial_dict):
    ''' decrypts using partial_dict.
    Characters not in partial_dict produce "-" '''
    result = ""
    for ch in text:
        if ch in partial_dict:
            result += partial_dict[ch]
        elif ch != " ":
            result += "-"  # unknown char
        else:
            result += " "  # space
    return result
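A small example shows how the output of try_decrypt looks while the dictionary is still incomplete. The ciphertext and the two guessed letters below are made up for illustration.

```python
def try_decrypt(text, partial_dict):
    """Decrypts using partial_dict; unknown characters produce "-"."""
    result = ""
    for ch in text:
        if ch in partial_dict:
            result += partial_dict[ch]
        elif ch != " ":
            result += "-"   # unknown char
        else:
            result += " "   # space
    return result

# Hypothetical partial guess: x stands for t, z stands for e
partial = {"x": "t", "z": "e"}
print(try_decrypt("xqz xq", partial))   # t-e t-
```

A pattern such as "t-e" suggests the word "the", so the next guess would be to add "q": "h" to the dictionary and try again. Note that punctuation is also shown as "-", because it is not a key of the dictionary.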


3. Repeatedly guess letters and words based on prior knowledge

LET’S BREAK SOME CIPHER!!
