Classification of Ciphers Using Machine Learning Techniques
DESCRIPTION
This project was submitted in partial fulfillment of the final-year dissertation in 2015 by Om Prakash at Jaypee Institute of Information Technology, Noida. The project is developed in the Accord.NET framework.
Literature Survey
Classification of Ciphers
Submitted By: Om Prakash
Enrollment No: 13303015
Supervisor Name: Dr. Satish Chandra
Introduction
In cryptography, a cipher is an algorithm for performing encryption or decryption. The
earliest ciphers, which simply hide the readability of the plaintext, are known as
classical ciphers. Ciphertexts produced by these ciphers always reveal statistical
information about the plaintext. As the computational power of computers increased,
nearly all such ciphers became more or less readily breakable. Thus, in the mid-1970s,
new ciphers came into existence that rely heavily on mathematical theory and computer
science practice to achieve computational hardness; these are classified as modern
ciphers.
As the computational hardness of ciphers grew, breaking them became a challenging new
area known as cryptanalysis.
Attack Scenarios:
Ciphertext-only attack: This is the most basic type of attack and refers to the
scenario where the adversary just observes a ciphertext and attempts to determine the
plaintext that was encrypted.
Known-plaintext attack: Here, the adversary learns one or more pairs of
plaintexts/ciphertexts encrypted under the same key. The aim of the adversary is then
to determine the plaintext that was encrypted to give some other ciphertext (for which
it does not know the corresponding plaintext).
Chosen-plaintext attack: In this attack, the adversary has the ability to obtain the
encryption of any plaintext(s) of its choice. It then attempts to determine the plaintext
that was encrypted to give some other ciphertext.
Chosen-ciphertext attack: The final type of attack is one where the adversary is
even given the capability to obtain the decryption of any ciphertext(s) of its choice.
The adversary's aim, once again, is then to determine the plaintext that was encrypted
to give some other ciphertext (whose decryption the adversary is unable to obtain
directly).
A ciphertext-only attack is the easiest to carry out in practice; the only thing the adversary
needs is to eavesdrop on the public communication line over which encrypted messages are
sent. In this scenario, the first thing the adversary needs to do is identify the cipher
used to encrypt the message. This is the foundation for research and development in the
area of classification of ciphers.
Contemporary challenging R & D problems in classification of ciphers
Very little work in this area has been published in the public domain so far.
Classifying classical ciphers using frequency analysis is a trivial task, yet classifying modern
ciphers is considerably more difficult, and even the best solution has a success rate below 50%.
It is therefore a very challenging research and development problem to classify modern ciphers
within a group, and more so across all ciphers. My research focus is to classify the most
commonly used modern and classical ciphers within a group.
Paper 1:
Summary: In this paper the classical substitution ciphers Playfair, Vigenère and Hill
are classified using neural-network-based identification. The features of
the cipher methods under consideration are extracted and a back-propagation neural network
is trained. The network is tested on random texts with random keys of various lengths. The
ciphertext size is fixed at 1Kb. The results obtained were encouraging.
Title of paper: Classification of Substitution Ciphers using Neural Networks
Authors: G. Sivagurunathan, V. Rajendran, and Dr. T. Purusothaman
Year of Publication: 2010
Web link: http://paper.ijcsns.org/07_book/201003/20100340.pdf
Publishing Details: Sivagurunathan, G., Rajendran, V., & Purusothaman, T. (2010). Classification of Substitution Ciphers using Neural Networks. IJCSNS, 10(3), 274.
Paper 2:
Title of paper: Classification of Ciphers
Authors: Pooja Maheshwari
Year of Publication: 2001
Web link: http://www.security.iitk.ac.in/pages/projects/cryptanalysis/repository/pooja.ps
Publishing Details: Maheshwari, P. (2001). Classification of ciphers (Doctoral dissertation, Indian Institute of Technology, Kanpur).
Summary: This paper deals with classifying the classical ciphers, namely the Substitution
Cipher, the Permutation Cipher, the combination of Substitution and Permutation Ciphers,
and the Vigenère Cipher, and the modern ciphers DES and IDEA. For the classical
ciphers the main attack is based on frequency distribution, whereas for classifying DES and IDEA
several approaches are used, such as randomness tests, XOR operations, and threshold functions,
and some encouraging results are found.
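As an illustration of what a randomness test on ciphertext can look like, the sketch below implements the standard frequency (monobit) test, which checks whether the numbers of 1 bits and 0 bits in a byte string are roughly balanced. It is only an example of the general idea and is not claimed to be one of the tests used in the thesis; the function name `monobit_p_value` is hypothetical.

```python
import math


def monobit_p_value(data: bytes) -> float:
    """Frequency (monobit) test.

    Returns a p-value in [0, 1]; small values mean the balance between
    1 bits and 0 bits is unlikely for a truly random bit stream.
    """
    n = len(data) * 8  # total number of bits
    if n == 0:
        raise ValueError("need at least one byte of data")
    ones = sum(bin(byte).count("1") for byte in data)
    # Absolute difference between the number of ones and zeros.
    s = abs(2 * ones - n)
    return math.erfc(s / math.sqrt(2 * n))


if __name__ == "__main__":
    balanced = bytes([0x55] * 100)   # 01010101 repeated: perfectly balanced
    all_zero = bytes(100)            # heavily biased
    print(monobit_p_value(balanced))
    print(monobit_p_value(all_zero))
```

Output of a good modern cipher should usually pass such a test, so a single test of this kind can separate weak ciphers from strong ones but is unlikely, on its own, to tell two strong ciphers apart.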
Paper 3:
Summary: In this paper the author classifies Blowfish, RC4 and Camellia using a Support
Vector Machine. A goodness threshold is established, with which trivially good test vectors
are obtained; these are then modified further to get better results.
At the beginning, a set of test vectors is generated by solving the following linear program.
Maximize the objective function $f = c_i$
subject to a set of constraints of the form
$\sum_{i=1}^{320} c_i b_i > T$
and a set of RC4 constraints
$\sum_{i=1}^{320} c_i b_i \le T$
where the possible values of C1 are Blowfish or Camellia.
Title of paper: Classification of Ciphers Using Machine Learning
Authors: Gaurav Saxena
Year of Publication: 2008
Web link: http://www.security.iitk.ac.in/contents/publications/more/ciphers_machine_learning.pdf
Publishing Details: Saxena, G. (2008). Classification of ciphers using machine learning. Master's thesis, Department of Computer Science and Engineering, Indian Institute of Technology, Kanpur.
The values of training and testing errors seem to indicate that it is easier to classify good test
vectors from bad test vectors if lower values of goodness threshold are considered.
Paper 4:
Summary: In this paper the block cipher algorithms DES, IDEA, AES, and RC,
operating in ECB mode, are considered. Eight classification techniques are used to identify
the cipher from the ciphertext: Naïve Bayesian (NB), Support Vector Machine (SVM), neural
network (MPL), Instance-based learning (IBL), Bagging (Ba), AdaBoostM1, Rotation Forest (RoFo),
and Decision Tree. The study aims to find the best classification algorithm for identifying
the encryption method. The performance of each classifier is presented, and the
simulation results show that, in general, the RoFo classifier has the highest classification
accuracy.
Title of paper: Classifying Encryption Algorithms Using Pattern Recognition Techniques
Authors: Suhaila O. Sharif, L.I. Kuncheva and S.P. Mansoor
Year of Publication: 2010
Web link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5689769&isnumber=5688739
Publishing Details: Sharif, S.O.; Kuncheva, L.I.; Mansoor, S.P., "Classifying encryption algorithms using pattern recognition techniques," Information Theory and Information Security (ICITIS), 2010 IEEE International Conference on, pp. 1168-1172, 17-19 Dec. 2010
Paper 5:
Summary: In this paper, the authors propose an approach for identifying the encryption
method of block ciphers using support vector machines. Five block ciphers, namely DES
(CBC), 3DES, Blowfish, AES and RC5, are identified, and the resulting accuracy is reported.
Title of paper: Identification of Block Ciphers using Support Vector Machines
Authors: Dileep A. D. and C. Chandra Sekhar
Year of Publication: 2006
Web link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1716462&isnumber=36115
Publishing Details: Dileep, A.D.; Sekhar, C.C., "Identification of Block Ciphers using Support Vector Machines," Neural Networks, 2006. IJCNN '06. International Joint Conference on, pp. 2696-2701, 2006
Paper 6:
Title of paper: Classifying File Type of Stream Ciphers in Depth Using Neural Networks
Authors: James George Dunham, Ming-Tan Sun and Judy C. R. Tseng
Year of Publication: 2005
Web link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1387088&isnumber=30191
Publishing Details: Dunham, J.G.; Ming-Tan Sun; Tseng, J.C.R., "Classifying file type of stream ciphers in depth using neural networks," Computer Systems and Applications, 2005. The 3rd ACS/IEEE International Conference on, p. 97, 2005
Summary: Ciphers encrypted with the same key are called ciphers in depth. A depth attack is
a form of cryptanalysis that takes advantage of finding ciphers in depth and could break a
cryptosystem without even knowing the encryption algorithm. The first task in a depth attack
is to cluster ciphers according to their common keys; this is called depth detection. Then one
may want to know the file type of the underlying message of each cipher. In this paper depth
detection is accomplished for stream ciphers with a hit rate of 99.5%. Ciphers in depth are
further classified according to the file types of their underlying messages with an accuracy of
over 90%. One important goal of this research is not to use the structure and key words of
any specific file types, as this allows the result to be applied to general file types. Also, the
features extracted from the test samples for classification are simple ones, leaving room for
improving the performance by adopting more complicated features.
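The reason ciphers in depth are exploitable without knowing the algorithm can be shown with a small sketch. For a stream cipher that XORs the plaintext with a keystream, two ciphertexts produced with the same keystream XOR to the XOR of their plaintexts, so the keystream drops out entirely. The code below is a toy illustration of that property only; it is not the neural-network method of the paper, and the helper names and the high-bit heuristic are assumptions made for the example.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def stream_encrypt(plaintext: bytes, keystream: bytes) -> bytes:
    """Toy stream cipher: ciphertext = plaintext XOR keystream."""
    return xor_bytes(plaintext, keystream)


def high_bit_clear_fraction(data: bytes) -> float:
    """Fraction of bytes whose top bit is 0.

    Hypothetical depth-detection heuristic: for two ASCII plaintexts in
    depth this is 1.0, while for unrelated keystreams it is near 0.5.
    """
    if not data:
        raise ValueError("empty input")
    return sum(1 for byte in data if byte < 0x80) / len(data)


p1 = b"attack at dawn"
p2 = b"retreat at ten"
keystream = os.urandom(len(p1))      # the same keystream is reused: "in depth"
c1 = stream_encrypt(p1, keystream)
c2 = stream_encrypt(p2, keystream)

# The keystream cancels out: c1 XOR c2 == p1 XOR p2.
assert xor_bytes(c1, c2) == xor_bytes(p1, p2)
print(high_bit_clear_fraction(xor_bytes(c1, c2)))
```

A practical depth detector would need features that also work for non-text file types, which is the setting the paper addresses.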
Paper 7:
Summary: Genetic algorithms (GAs) are a class of optimization algorithms. In this paper
the authors propose a genetic algorithm to decipher a mono-alphabetic substitution cipher,
using frequency analysis to obtain the objective function.
The following is an outline of proposed algorithm:
Title of paper: Using Genetic Algorithm to Break a Mono-Alphabetic Substitution Cipher
Authors: S. S. Omran, A. S. Al-Khalid and D. M. Al-Saady
Year of Publication: 2010
Web link: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5720065&isnumber=5719958
Publishing Details: Omran, S.S.; Al-Khalid, A.S.; Al-Saady, D.M., "Using Genetic Algorithm to break a mono - alphabetic substitution cipher," Open Systems (ICOS), 2010 IEEE Conference on, pp. 63-67, 5-7 Dec. 2010
Results Obtained:
Classification of Classical Cipher
Motivation: Relative letter-frequency analysis of the entries in the Concise Oxford
Dictionary, as reported by trusted compilers, shows some very interesting results.
Unigram relative frequency of letters in English language
Bigram frequency in the English language
th 1.52 en 0.55 ng 0.18
he 1.28 ed 0.53 of 0.16
in 0.94 to 0.52 al 0.09
er 0.94 it 0.50 de 0.09
an 0.82 ou 0.50 se 0.08
re 0.68 ea 0.47 le 0.08
nd 0.63 hi 0.46 sa 0.06
at 0.59 is 0.46 si 0.05
on 0.57 or 0.43 ar 0.04
nt 0.56 ti 0.34 ve 0.04
ha 0.56 as 0.33 ra 0.04
es 0.56 te 0.27 ld 0.02
st 0.55 et 0.19 ur 0.02
16 most common character-level trigrams in English language
1. the
2. and
3. tha
4. ent
5. ing
6. ion
7. tio
8. for
9. nde
10. has
11. nce
12. edt
13. tis
14. oft
15. sth
16. men
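Frequency tables like the ones above can be computed from any text sample. The following is a minimal sketch, assuming that only the letters a-z are counted and that spaces and punctuation are dropped before forming n-grams (trigrams such as "edt" and "oft" in the list above span word boundaries, which suggests the same convention). The function name `ngram_frequencies` and the choice to report percentages are assumptions for illustration.

```python
from collections import Counter


def ngram_frequencies(text: str, n: int) -> dict:
    """Relative frequency (in percent) of each character n-gram.

    Non-letters are removed first, so n-grams may span word boundaries.
    The result is ordered from most to least frequent.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    grams = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: 100.0 * count / total for gram, count in counts.most_common()}


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times."
    for size in (1, 2, 3):
        top = list(ngram_frequencies(sample, size).items())[:5]
        print(size, top)
```

A short sample like this one will not reproduce the published tables; a large corpus is needed for the frequencies to stabilize.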
Proposed Algorithm:
cost1 = $\sum_{i=0}^{25} |KU(i) - DU(i)|$ // Cost for unigram frequency of letters in alphabetical order
cost2 = $\sum_{i=0}^{25} |KUS(i) - DUS(i)|$ // Cost for sorted unigram frequency of letters
cost3 = $\sum_{i=0}^{25} |KBS(i) - DBS(i)|$ // Cost for sorted bigram frequency of letters
cost4 = $\sum_{i=0}^{25} |KTS(i) - DTS(i)|$ // Cost for sorted trigram frequency of letters
if (cost1 ≤ Tval1)
    return P Cipher
if (cost2 ≤ Tval2)
    if (cost3 ≤ Tval3 && cost4 ≤ Tval4)
        return S Cipher
    else
        return PS Cipher
else
    return Unclassified
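The decision procedure above can be sketched in Python as follows. Several points are assumptions rather than details given in the text: the K profiles are taken to be known English frequencies (here derived from a reference text supplied by the caller), the D profiles are taken to be derived from the ciphertext, the sorted profiles keep the 26 largest values to match the summation range, P, S and PS are read as permutation, substitution and combined ciphers, and the threshold values are placeholders to be tuned.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letters_only(text: str) -> str:
    """Lower-case the text and keep only the letters a-z."""
    return "".join(ch for ch in text.lower() if ch in ALPHABET)


def unigram_profile(text: str) -> list:
    """Relative frequency of each letter, in alphabetical order (26 values)."""
    letters = letters_only(text)
    counts = Counter(letters)
    total = max(len(letters), 1)
    return [counts[ch] / total for ch in ALPHABET]


def sorted_ngram_profile(text: str, n: int, size: int = 26) -> list:
    """The `size` largest n-gram relative frequencies, in descending order."""
    letters = letters_only(text)
    grams = [letters[i:i + n] for i in range(len(letters) - n + 1)]
    total = max(len(grams), 1)
    values = sorted((c / total for c in Counter(grams).values()), reverse=True)
    return (values + [0.0] * size)[:size]  # pad with zeros if too few n-grams


def cost(known: list, derived: list) -> float:
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(k - d) for k, d in zip(known, derived))


def classify(ciphertext: str, reference: str,
             tval1: float, tval2: float, tval3: float, tval4: float) -> str:
    """Return 'P', 'S', 'PS' or 'Unclassified' following the rules above."""
    cost1 = cost(unigram_profile(reference), unigram_profile(ciphertext))
    if cost1 <= tval1:
        return "P"   # letters keep their identity: permutation cipher
    cost2 = cost(sorted_ngram_profile(reference, 1),
                 sorted_ngram_profile(ciphertext, 1))
    if cost2 <= tval2:
        cost3 = cost(sorted_ngram_profile(reference, 2),
                     sorted_ngram_profile(ciphertext, 2))
        cost4 = cost(sorted_ngram_profile(reference, 3),
                     sorted_ngram_profile(ciphertext, 3))
        if cost3 <= tval3 and cost4 <= tval4:
            return "S"   # n-gram shape preserved: substitution cipher
        return "PS"      # only the unigram shape preserved: combination
    return "Unclassified"
```

In practice the reference profile would come from a large English corpus rather than a short text, and the four thresholds would be chosen empirically from ciphertexts of known type.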
Implementation:
Appendix
A. Gantt Chart
Gantt chart (columns: Start Date, Duration) covering the following tasks:
1. Survey of Challenging Problems in Cryptography
2. Defining Problem Statement
3. Literature Survey on Classification of Ciphers
4. Implementation of Selected Ciphers
5. Implementation of Classical Cipher Classifier
6. Choosing the Modern Ciphers to Classify
7. Implementing Modern Ciphers Classifier
8. Analysing Correctness
9. Improving the Accuracy
B. Details of practice with new tool/technology
I am using a Windows Forms application to implement the Classical Cipher Classifier and
MATLAB to implement the Modern Cipher Classifier.
Windows Forms is a GUI application framework that is part of the .NET Framework and supports
several back-end languages such as C++, C#, VB, and F#. The application can run on Windows
or even on Linux under Wine.
MATLAB is a high-level language and interactive environment for numerical computation,
visualization, and programming. By using MATLAB, one can analyze data, develop
algorithms, and create models and applications. The language, tools, and built-in math
functions enable us to explore multiple approaches and reach a solution faster than with
spreadsheets or traditional programming languages, such as C/C++ or Java. One can use
MATLAB for a range of applications, including signal processing and communications,
image and video processing, control systems, test and measurement, computational finance,
and computational biology. MATLAB is widely used today in industry and academia as the
language of technical computing.
I will use MATLAB initially for result evaluation. Once I am able to get good
results, I will port my MATLAB code to another platform such as Java or
OpenCV.
C. References
[1].Sivagurunathan, G., Rajendran, V., & Purusothaman, T. (2010). Classification of
Substitution Ciphers using Neural Networks. IJCSNS, 10(3), 274.
[2].Sharif, S.O.; Kuncheva, L.I.; Mansoor, S.P., "Classifying encryption algorithms using
pattern recognition techniques," Information Theory and Information Security (ICITIS), 2010
IEEE International Conference on , vol., no., pp.1168,1172, 17-19 Dec. 2010
[3]. Dileep, A.D.; Sekhar, C.C., "Identification of Block Ciphers using Support Vector
Machines," Neural Networks, 2006. IJCNN '06. International Joint Conference on,
pp. 2696-2701, 2006
[4]. Dunham, J.G.; Ming-Tan Sun; Tseng, J.C.R., "Classifying file type of stream ciphers in
depth using neural networks," Computer Systems and Applications, 2005. The 3rd
ACS/IEEE International Conference on, p. 97, 2005
[5]. Khadivi, P.; Momtazpour, M., "Application of data mining in cryptanalysis,"
Communications and Information Technology, 2009. ISCIT 2009. 9th International
Symposium on , vol., no., pp.358,363, 28-30 Sept. 2009
[6]. Khadivi, P.; Momtazpour, M., "Cipher-text classification with data mining," Advanced
Networks and Telecommunication Systems (ANTS), 2010 IEEE 4th International
Symposium on , vol., no., pp.64,66, 16-18 Dec. 2010
[7]. Omran, S.S.; Al-Khalid, A.S.; Al-Saady, D.M., "Using Genetic Algorithm to break a
mono - alphabetic substitution cipher," Open Systems (ICOS), 2010 IEEE Conference on ,
vol., no., pp.63,67, 5-7 Dec. 2010
[8]. Omran, S.S.; Al-Khalid, A.S.; Al-Saady, D.M., "A cryptanalytic attack on Vigenère
cipher using genetic algorithm," Open Systems (ICOS), 2011 IEEE Conference on , vol., no.,
pp.59,64, 25-28 Sept. 2011
[9]. De Canniere, C.; Biryukov, Alex; Preneel, B., "An introduction to Block Cipher
Cryptanalysis," Proceedings of the IEEE , vol.94, no.2, pp.346,356, Feb. 2006
[10]. Toemeh, R., & Arumugam, S. (2008). Applying Genetic Algorithms for Searching Key-
Space of Polyalphabetic Substitution Ciphers. Int. Arab J. Inf. Technol., 5(1), 87-91.
[11]. Dureha, A., & Kaur, A. (2013). A Generic Genetic Algorithm to Automate an Attack on
Classical Ciphers. International Journal of Computer Applications, 64(12), 20-25.
[12]. Mishra, S., & Bali, S. (2013). Public key cryptography using genetic algorithm.
International J Recent Technol. Eng.(IJRTE), 2(2), 150-154.
[15]. Maheshwari, P. (2001). Classification of ciphers (Doctoral dissertation, Indian Institute
of Technology, Kanpur).
[16]. Saxena, G. (2008). Classification of ciphers using machine learning. Master's thesis,
Department of Computer Science and Engineering, Indian Institute of Technology. Kanpur.
[17]. Nagireddy, S. (2008). A Pattern Recognition Approach To Block.
[18]. Rao, M. B. (2003). Classification of RSA and IDEA Ciphers (Doctoral dissertation,
Indian Institute of Technology, Kanpur).
[19]. http://en.wikipedia.org/wiki/Letter_frequency
[20]. http://en.wikipedia.org/wiki/Bigram
[21]. http://en.wikipedia.org/wiki/Trigram