presentation bio encryption

Submitted By : Priyanka Nema 08EAIIT036 IT Submitted To: Er. Pradeeep Sharma Er. Chandresh Bakliwal A SEMINAR PRESENTATION ON “ BIOENCRYPTION ”

Upload: priyanka-nema

Post on 22-Oct-2014

TRANSCRIPT

Page 1: Presentation Bio Encryption

Submitted By: Priyanka Nema, 08EAIIT036, IT

Submitted To: Er. Pradeeep Sharma, Er. Chandresh Bakliwal

A SEMINAR PRESENTATION

ON

“ BIOENCRYPTION ”

Page 2: Presentation Bio Encryption

INTRODUCTION

•Bioencryption, as the name suggests, combines “bio”, meaning living, with “encryption”, meaning to put something into coded form.

•Bioencryption is a new method of data storage that converts information into DNA sequences. It allows you to store the contents of an entire computer hard drive on a gram's worth of Escherichia coli bacteria, and perhaps considerably more than that.

•Encryption transforms the original plaintext message into a ciphertext message to protect its security and integrity in network traffic. The result of this process is known as encrypted information.

• Decryption, on the other hand, is the re-transformation of the ciphertext file at the receiver's end into the original plaintext file to extract the transferred message.

•The practice and study of transforming messages to make them secure and immune to attacks is called cryptography.

“Bio” “Encryption”

Page 3: Presentation Bio Encryption

•The idea of storing data inside bacteria has been around for about a decade. Even very simple bacteria have long strands of DNA with tons of bases available for data encryption, and bacteria are by their nature far more resilient to damage than more traditional electronic storage.

Bacteria are nature's hardiest survivors, capable of surviving just about any disaster that would finish off a regular hard drive. Besides, bacteria's natural reproduction would create lots of redundant copies of the data, which would help preserve the integrity of the information and make retrieval easier.

(Escherichia coli bacteria)

Page 4: Presentation Bio Encryption

• An encoding system takes the original data, turns it into a quaternary number, and then encodes it as a DNA sequence. Encryption is achieved through DNA sequence shuffling. The process also involves compressing the data to allow more storage within the same sequence. Finally, a class of DNA techniques is examined that secretly tags the input DNA and hides it within collections of other DNA. This technology provides high security for the stored information.

•Recombinant DNA techniques have been developed for a wide class of operations on DNA and RNA strands.

•There has recently arisen a new area of research known as DNA computing, which makes use of recombinant DNA techniques for doing computation.

• DNA and RNA are appealing media for data storage due to the very large amounts of data that can be stored in compact volume.

•They vastly exceed the storage capacities of conventional electronic, magnetic and optical media. A gram of DNA contains about 10^21 DNA bases, or about 10^8 terabytes.
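
The capacity figure above can be sanity-checked with a little arithmetic, reading the numbers as 10^21 bases and 10^8 terabytes per gram. The sketch below is only a back-of-the-envelope check: it assumes 2 bits per base (four possible bases) and decimal terabytes of 10^12 bytes, neither of which is stated in the slides.

```python
# Back-of-the-envelope check of DNA storage density.
# Assumptions (not from the slides): each base carries 2 bits,
# and 1 terabyte = 10**12 bytes.
BASES_PER_GRAM = 10**21
BITS_PER_BASE = 2
BYTES_PER_TERABYTE = 10**12


def terabytes_per_gram(bases=BASES_PER_GRAM):
    """Convert a number of bases into terabytes under the assumptions above."""
    total_bits = bases * BITS_PER_BASE
    total_bytes = total_bits // 8
    return total_bytes / BYTES_PER_TERABYTE


print(f"{terabytes_per_gram():.1e} TB per gram")
```

Under these assumptions the result is a few times 10^8 terabytes, which matches the order of magnitude quoted above.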

Page 5: Presentation Bio Encryption

DNA AND BASIC NUCLEOBASES

•Nucleobases (or nucleotide bases/nitrogenous bases) are the parts of DNA and RNA that may be involved in pairing.

• The primary nucleobases are cytosine, guanine, adenine (DNA and RNA), thymine (DNA) and uracil (RNA), abbreviated as C, G, A, T, and U.

•The "skeleton" of adenine and guanine is purine, hence the name purine bases. The "skeleton" of cytosine, uracil and thymine is pyrimidine, hence pyrimidine bases.

Page 6: Presentation Bio Encryption

PRINCIPLE

•Site-specific recombination systems are classified into two distinct groups:

- Integration-excision systems

- Inversion systems

•In the shufflon system, Rci-mediated recombination occurs between any of the repeats of the DNA segments, so that segments are rearranged either independently or in groups.

•Rci-dependent deletion of a segment flanked by the natural repeat sequences does not occur; that is, the DNA sequence between the repeats is conserved after recombination and no DNA sequence is lost.

• There are seven different repeat sequences in nature (repeats 1 to 7), which fall into four main groups: a, b, c and d.

•Repeats 1 and 2 belong to group a; repeats 4, 6 and 7 belong to group b; repeat 5 belongs to group c; and repeat 3 belongs to group d.

Page 7: Presentation Bio Encryption

•Experiments show that the inversion frequency is highest when the DNA sequence is flanked by two copies of repeat a; it is much higher than with any other pairing of repeats a, b, c and d flanking the DNA sequence.

There is a 12 bp sequence before every 19 bp (base pair) repeat sequence. For the Rci recombinase, it has been shown that the wild type (WT) causes more inversion than versions carrying modifications or point mutations at some positions of the rci gene.

Page 8: Presentation Bio Encryption
Page 9: Presentation Bio Encryption

ENCRYPTION TECHNIQUES :

STEP 1: TRANSLATION

•A translation table would first need to be constructed by the client; the extended ASCII table with 256 characters is used as standard. DNA lends itself naturally to being read as a quaternary (base-4) numeral system.

• With the DNA base adenine representing the number “0”, thymine representing “1”, cytosine representing “2” and guanine representing “3”, we are essentially encoding the 256 characters with this base-4 numeral system.

Base           Digit
adenine (A)    0
thymine (T)    1
cytosine (C)   2
guanine (G)    3

Page 10: Presentation Bio Encryption

EXAMPLE OF iGEM(translation)

•In a presentation on their breakthrough, the Hong Kong researchers showed how to change the word "iGEM" into DNA-ready code.

• They used the ASCII table to convert each of the individual letters into a numerical value (i=105, G=71, E=69, M=77), which can then be changed from base 10 to base 4 (105=1221, 71=1013, etc.).

• Finally, those numbers can be changed into their DNA base equivalents, with 0, 1, 2, and 3 replaced with A, T, C, and G. Under this mapping, iGEM becomes TCCTTATGTATTTAGT.

•Once the raw data is ready, the researchers say a few algorithms can be used to weed out redundant and repetitive information. That doesn't just save a ton of space - lots of repetition in the DNA sequence can actually be biologically harmful to the wellbeing of the DNA and bacteria, so this step rather neatly solves two problems at once.
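
The translation step described above (ASCII code, then base 4, then bases) can be sketched in a few lines. This is only an illustration computed directly from the ASCII codes and the digit mapping 0=A, 1=T, 2=C, 3=G; the function names are made up for this sketch and are not taken from the researchers' software.

```python
# Sketch of the translation step: character -> ASCII code -> four base-4
# digits -> DNA bases, using the mapping 0=A, 1=T, 2=C, 3=G.
# Function names are illustrative, not from the original project.
DIGIT_TO_BASE = "ATCG"  # index in the string = base-4 digit


def char_to_dna(ch):
    code = ord(ch)
    if code > 255:
        raise ValueError("only the 256-character extended ASCII range is supported")
    digits = []
    for _ in range(4):  # four base-4 digits cover the range 0..255
        digits.append(code % 4)
        code //= 4
    return "".join(DIGIT_TO_BASE[d] for d in reversed(digits))


def text_to_dna(text):
    return "".join(char_to_dna(ch) for ch in text)


def dna_to_text(dna):
    chars = []
    for i in range(0, len(dna), 4):  # read four bases per character
        code = 0
        for base in dna[i:i + 4]:
            code = code * 4 + DIGIT_TO_BASE.index(base)
        chars.append(chr(code))
    return "".join(chars)


print(text_to_dna("iGEM"))               # TCCTTATGTATTTAGT
print(dna_to_text(text_to_dna("iGEM")))  # iGEM
```

Using a fixed width of four bases per character keeps decoding unambiguous, since leading zeros (A) are never dropped.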

Page 11: Presentation Bio Encryption
Page 12: Presentation Bio Encryption

STEP 2: COMPRESSION

•Before the DNA sequences are synthesized, a compression step follows the translation process.

• Deflate, a well-known lossless data compression algorithm that combines Huffman coding and the LZ77 algorithm, is used here. This compression is beneficial in two respects:

- firstly, more information can be included than in an uncompressed message of the same length, and

- secondly, repetitive regions can be reduced significantly.

• This is crucial to the DNA storage system, because repetitive regions in DNA sequences are devastating to both DNA synthesis and sequencing. With compression, such cases are kept to a minimum.
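
As a rough illustration of the compression step, the sketch below runs Deflate through Python's standard zlib module. It is not the researchers' pipeline: the sample message and the choice of a raw Deflate stream (no zlib header) are assumptions made for the example.

```python
import zlib


def deflate(data: bytes) -> bytes:
    # wbits=-15 selects a raw DEFLATE stream (no zlib header or trailer).
    compressor = zlib.compressobj(level=9, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def inflate(blob: bytes) -> bytes:
    # The same negative wbits value is needed to read a raw stream back.
    return zlib.decompress(blob, wbits=-15)


message = b"the quick brown fox " * 20  # deliberately repetitive sample input
packed = deflate(message)
print(len(message), "bytes ->", len(packed), "bytes")
assert inflate(packed) == message  # lossless round trip
```

Repetitive input like this shrinks considerably, which is the property the slides rely on to reduce repeated regions before synthesis.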

Page 13: Presentation Bio Encryption

STEP 3: MASSIVELY PARALLEL STORAGE SYSTEM

•With this parallel storage system fully in place, it can be used systematically for data and information storage.

• In order to store a large piece of information such as a photograph or a dictionary, it is impossible to include it within a single piece of DNA as this is limited by the current DNA synthesis technology.

•One approach is to fragment the information into pieces and insert them into the cells. However simply fragmenting the information followed by insertion to the cell would destroy all the data, as the order of these fragments is unknown.

• To overcome this obstacle, each segment to be inserted into the bacterial cell is composed of three sectors: header, message and checksum.

• The header is the address of that particular message fragment. The message is the message fragment itself, and the checksum is an identification and correction system for minor mutations.
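
The header/message/checksum layout can be sketched as a small data structure. Everything here is a simplifying assumption for illustration: the header is a plain integer index, the fragments are bytes rather than DNA, and CRC-32 stands in for the checksum. Note that CRC-32 only detects corruption, whereas the scheme described above is also meant to correct minor mutations.

```python
import zlib
from typing import List, NamedTuple


class Fragment(NamedTuple):
    header: int     # address: position of this piece in the full message
    message: bytes  # the message fragment itself
    checksum: int   # stand-in checksum (CRC-32), detects but cannot correct


def split_message(data: bytes, size: int) -> List[Fragment]:
    """Cut data into fixed-size pieces and label each with an address."""
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    return [Fragment(n, p, zlib.crc32(p)) for n, p in enumerate(pieces)]


def join_fragments(fragments) -> bytes:
    """Verify every fragment, then restore the original order by header."""
    for f in fragments:
        if zlib.crc32(f.message) != f.checksum:
            raise ValueError(f"fragment {f.header} failed its checksum")
    ordered = sorted(fragments, key=lambda f: f.header)
    return b"".join(f.message for f in ordered)


parts = split_message(b"data stored in living cells", 6)
parts.reverse()  # simulate fragments arriving in an unknown order
print(join_fragments(parts))
```

Because each fragment carries its own address, the order in which fragments are read back no longer matters.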

Page 14: Presentation Bio Encryption

DECRYPTION

Decryption is not simple: it has to pass a three-tier security fence made up of the encoding system, the encryption system and the checksum system, and the message can only be retrieved when enough information is provided. The design of a single data fragment is shown here:

Recombination Sequence:

Page 15: Presentation Bio Encryption

The full message can be restored from data fragments through a series of steps :

STEP 1:

NEXT GENERATION HIGH-THROUGHPUT SEQUENCING (NGS) AND ASSEMBLING

•With the information-encrypted bacteria provided, the plasmid DNA would be extracted and subjected to next-generation high-throughput sequencing (NGS).

•One reason to choose high-throughput sequencing over ordinary sequencing technology is that NGS is a massively parallel sequencing process. Multiple sequencing products (reads) cover any particular message stored within the DNA, and these multiple reads allow majority voting on bases whose read quality is poor.

• Moreover, with the read-assembly algorithms currently available, Velvet and Euler for example, assembling the reads from NGS is no longer a formidable task.
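
Majority voting over multiple reads can be illustrated with a minimal sketch. It assumes the reads are already aligned and of equal length, which real assemblers such as those named above handle for you; the sample reads are invented for the example.

```python
from collections import Counter


def majority_vote(reads):
    """Return the consensus of equal-length, already aligned reads."""
    if len({len(r) for r in reads}) != 1:
        raise ValueError("reads must be non-empty and aligned to the same length")
    consensus = []
    for column in zip(*reads):  # one column per base position
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Two of the four reads contain a single sequencing error each.
reads = ["TCCTTATG", "TCCTTATG", "TCGTTATG", "TCCTAATG"]
print(majority_vote(reads))  # TCCTTATG
```

With enough coverage, an isolated error in one read is outvoted by the other reads at that position. Ties are resolved in favour of the base seen first, which a real pipeline would replace with quality scores.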

Page 16: Presentation Bio Encryption

STEP 2:

IDENTIFICATION OF REPEAT SEQUENCES, MESSAGES AND CHECKSUM

•In the second tier, the encryption system is given (the R64 shufflon system in this case), so the repeats are known.

•The repeats can be located using alignment tools such as BLAST, and the sequences between the repeats are regarded as the message fragments, although their order is unknown.

• The checksum is right behind the last repeat sequence.
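
The identification step can be sketched as a simple scan for known repeats. This is a simplified stand-in: exact string matching replaces an alignment tool such as BLAST, and the repeat sequences below are hypothetical placeholders, since the real 19 bp shufflon repeats are not given in the slides.

```python
import re

# Hypothetical placeholder repeats, for illustration only. The real R64
# shufflon repeats are 19 bp sequences that the slides do not list.
KNOWN_REPEATS = ["GGCCAATTGGCC", "CCGGTTAACCGG"]


def parse_construct(sequence, repeats=KNOWN_REPEATS):
    """Split a sequence into message fragments and a trailing checksum."""
    pattern = re.compile("|".join(re.escape(r) for r in repeats))
    hits = list(pattern.finditer(sequence))
    if len(hits) < 2:
        raise ValueError("need at least two repeats to delimit a fragment")
    # Each fragment is the stretch between two neighbouring repeats.
    fragments = [
        sequence[left.end():right.start()]
        for left, right in zip(hits, hits[1:])
    ]
    checksum = sequence[hits[-1].end():]  # everything after the last repeat
    return fragments, checksum


r1, r2 = KNOWN_REPEATS
construct = r1 + "AAAA" + r2 + "TTTT" + r1 + "CG"
print(parse_construct(construct))  # (['AAAA', 'TTTT'], 'CG')
```

The sketch ignores sequencing errors inside the repeats and fragments that happen to contain a repeat, both of which an alignment-based approach would have to handle.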

Page 17: Presentation Bio Encryption

FINAL STEP:

COMBINATORIAL PROBLEM

•In the third tier, only the client knows the function used to derive the checksum. With the checksum formula, we are just one step away from our goal: recovering the correct message.

•The message fragments are concatenated in different permutations. Each trial is fed into the checksum formula and the result is compared with the checksum stored on the sequence. If they match, the message has been recovered; if not, the next permutation is tried.
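
The trial-and-error search can be sketched with a brute-force loop over permutations. The client's checksum function is secret in the scheme above, so CRC-32 is used here purely as an assumed stand-in, and the fragments are invented sample data.

```python
import zlib
from itertools import permutations


def checksum(message: str) -> int:
    # Assumed stand-in for the client's secret checksum function.
    return zlib.crc32(message.encode("ascii"))


def recover_message(fragments, expected_checksum):
    """Try every ordering of the fragments until the checksum matches."""
    for ordering in permutations(fragments):
        candidate = "".join(ordering)
        if checksum(candidate) == expected_checksum:
            return candidate
    return None  # no ordering matched


original = ["TCCT", "TATG", "TATT", "TAGT"]
target = checksum("".join(original))

shuffled = ["TATT", "TAGT", "TCCT", "TATG"]  # order lost after recombination
print(recover_message(shuffled, target))
```

The number of orderings grows as n factorial, so this search is cheap for a handful of fragments but quickly becomes expensive, which is part of what makes the checksum tier a barrier for anyone without the formula.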

Page 18: Presentation Bio Encryption

RESEARCH ON BIOENCRYPTION

•Researchers at The Chinese University of Hong Kong (CUHK) managed to store the 8,074-character Declaration of Independence in 18 bacterial cells.

• The 90 GB claim comes from the fact that 1 gram of cells consists of 10 million cells, which shows the potential for huge storage capacity in hardly any space or weight.

•Testing is ongoing, but the team has already proven they can convert data and store it as DNA and then get the data back out without any loss of information. They also believe any data can be stored using this method including text, images, music, and video.

•Students at Hong Kong’s Chinese University may be onto a type of memory media that could be a truly secure way to store data — text, images, music, and video.

•It takes up almost no space, can be encrypted, and is so gross that it’s unlikely many people would attempt to steal it.

•That is, if a thief would even consider searching a refrigerator for massive data storage inside E. coli, the bacteria responsible for 90% of urinary tract infections, which can also cause food poisoning and is the reason for many food recalls.

Page 19: Presentation Bio Encryption

•The bacteria can successfully and securely be used for biostorage, the storage of data in living things.

•According to an AFP report, the U.S. national archives take up more than 500 miles of shelves, but one gram of bacteria used for storing data could hold the same amount of information as 450 hard drives with 2,000 gigabytes (2 TB) of storage capacity each.

ADVANTAGES

•Any data can be stored using this method including text, images, music, and video.

•Because bacteria are more resilient to temperature and other climatic changes, we could have storage devices of all different shapes and sizes.

•This parallel storage system can store a huge amount of information, which means we do not have to worry about our growing need for storage capacity.

•The stored data is more resistant to tampering; end users cannot hack or alter it.

Page 20: Presentation Bio Encryption

CONCLUSION

•Recent research has considered DNA for large-scale computation and information storage.

•The key applications of DNA-based encryption are data storage and cryptography.

•Practical applications of cryptographic systems are limited in electronic media by the size of the storage media.

• DNA provides a much more compact storage medium, and an extremely small amount of DNA can be used to store a large amount of data.

•This is a vast technology with many future applications. It is expected to become a widely used storage technology.

•This technology is going to be the storage medium of the future.

•Researchers at The Chinese University of Hong Kong (CUHK) have said that we need to look for a living solution, more specifically bacteria. In fact, they have already achieved it, managing to store 90 GB of data in 1 gram of cells.

Page 21: Presentation Bio Encryption

THANK YOU

Page 22: Presentation Bio Encryption

QUERIES

????