Using a Genetic Algorithm for Approximate String Matching on Genetic Code Carrie Mantsch December 5, 2003

Using a Genetic Algorithm for Approximate String Matching on Genetic Code

Carrie Mantsch

December 5, 2003

Outline

• Problem Statement

• Current Techniques

• GA Motivation

• My Algorithm

• Results

• Extension Possibilities

Problem Statement

The problem is to search and align strands of DNA using a genetic algorithm.

Current Techniques

• Approximate string matching
  – Usually meant for smaller strings
  – Many are set up for k mismatches

• 2 DNA strands of size 90 and 85
  – Allowing for 5 gaps in the second strand gives almost 44 million possible alignments
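
The "almost 44 million" figure can be checked with a short calculation. This sketch assumes the count is the number of ways to choose which 5 of the 90 positions hold a gap once the 85-letter strand is stretched to length 90; the slide does not state the counting rule, and the function name is hypothetical.

```python
import math


def count_alignments(long_len: int, short_len: int) -> int:
    """Ways to place the gaps when the shorter strand is stretched to the
    length of the longer one (assumed counting rule, not from the slides)."""
    gaps = long_len - short_len
    return math.comb(long_len, gaps)


print(count_alignments(90, 85))  # 43949268
```

Under that assumption the result is 43,949,268, which matches the slide's estimate.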

Current Techniques (cont.)

• Needleman-Wunsch
  – Gap penalty -1
  – Match bonus +1
  – Mismatch 0

• Not practical if the sequence starts in the middle
  – Counts the gaps at the beginning and end as penalties.
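
The end-gap problem can be seen in a minimal sketch of the standard Needleman-Wunsch scoring recurrence, using the scores listed above. The function name and the example strings are illustrative assumptions, not taken from the talk.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = 0, gap: int = -1) -> int:
    """Global alignment score of a and b with the slide's scoring values."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps are charged in full: this is the global-alignment behaviour
    # that penalises a short sequence starting in the middle of a long one.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACGT"))          # 4
print(needleman_wunsch_score("GGGGACGTGGGG", "ACGT"))  # -4
```

In the second call the short string occurs exactly inside the long one, yet the eight unavoidable end gaps pull the score down to -4.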

Current Techniques (cont.)

• BLAST (Basic Local Alignment Search Tool) and FASTA
  – Use domain-specific knowledge

• http://www.ncbi.nlm.nih.gov/BLAST

• http://fasta.bioch.virginia.edu

GA Motivation

• Alien DNA

• Junk DNA

• Extendable to similar text searches without domain-specific knowledge

My Algorithm

• The population
  – Bit strings of 0’s and 1’s
  – 0’s are spaces, 1’s mean a letter is placed there
  – The number of 1’s stays constant, equal to the number of letters in the smaller search string
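
One way to read this representation is sketched below: each 1 in the bit string takes the next letter of the smaller string, and each 0 becomes a space. The function name and the "-" gap character are assumptions made for illustration.

```python
def place_letters(mask: str, letters: str, gap_char: str = "-") -> str:
    """Expand a 0/1 mask into a gapped string: 1 = next letter, 0 = space."""
    if mask.count("1") != len(letters):
        raise ValueError("mask must contain exactly one 1 per letter")
    remaining = iter(letters)
    return "".join(next(remaining) if bit == "1" else gap_char for bit in mask)


print(place_letters("0110100", "ACG"))  # -AC-G--
```

The gapped string can then be compared position by position against the longer strand.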

My Algorithm (cont.)

• Breeding
  – Rank-based selection

• Crossover
  – The common place markers are kept the same
  – The rest of the place markers are split evenly between the two children
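
A minimal sketch of how these two operators might look, under several assumptions the slides do not confirm: selection weights are linear in rank (worst individual gets weight 1), the non-shared 1's are shuffled before being split, and a shorter parent is padded with trailing 0's. All function names are hypothetical.

```python
import random


def rank_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness rank."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = [0] * len(population)
    for rank, idx in enumerate(order, start=1):  # worst gets 1, best gets N
        weights[idx] = rank
    return rng.choices(population, weights=weights, k=1)[0]


def crossover(a, b, rng):
    """Keep the 1's both parents share; split the rest evenly between children."""
    length = max(len(a), len(b))
    a, b = a.ljust(length, "0"), b.ljust(length, "0")  # assumed padding rule
    common = [i for i in range(length) if a[i] == "1" and b[i] == "1"]
    rest = [i for i in range(length) if a[i] != b[i]]  # 1 in exactly one parent
    rng.shuffle(rest)
    half = len(rest) // 2

    def build(extra):
        ones = set(common) | set(extra)
        return "".join("1" if i in ones else "0" for i in range(length))

    return build(rest[:half]), build(rest[half:])


rng = random.Random(0)
child1, child2 = crossover("1101100", "1011010", rng)
print(child1.count("1"), child2.count("1"))  # 4 4
```

When both parents carry the same number of 1's, each child receives the shared markers plus half of the remaining ones, so the count of 1's is preserved.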

My Algorithm (cont.)

• Mutation
  – If the number of gaps is less than one tenth of the small string size, add a gap
  – Otherwise delete a gap
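
The mutation rule can be sketched as follows. This assumes that a "gap" is a single 0 bit, that adding a gap inserts a 0 at a random position, and that deleting a gap removes a randomly chosen 0; the slides do not spell out these details, and the function name is hypothetical.

```python
import random


def mutate(mask: str, short_len: int, rng: random.Random) -> str:
    """Add a gap when gaps are scarce, otherwise delete one (assumed reading)."""
    gaps = mask.count("0")
    if gaps < short_len / 10:
        pos = rng.randrange(len(mask) + 1)  # insertion point, ends included
        return mask[:pos] + "0" + mask[pos:]
    zero_positions = [i for i, bit in enumerate(mask) if bit == "0"]
    if not zero_positions:  # nothing to delete
        return mask
    pos = rng.choice(zero_positions)
    return mask[:pos] + mask[pos + 1:]


rng = random.Random(0)
grown = mutate("1" * 20, 20, rng)           # 0 gaps < 2, so a gap is added
shrunk = mutate("1" * 20 + "000", 20, rng)  # 3 gaps >= 2, so a gap is deleted
print(len(grown), len(shrunk))  # 21 22
```

Either branch leaves the number of 1's untouched, which keeps the invariant stated for the population.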

Results

• The target match

Results (cont.)

• Ran for 50 generations

• Runs with different random numbers, for the same number of generations, give best fitness values between about 32 and 67 (optimal fitness: 90)

Extension Possibilities

• Better representation of population

• Be able to alter fitness evaluation to be more specific to different problems

• Ability to add domain-specific knowledge

• Parallel searching

Questions?