Using a Genetic Algorithm for Approximate String Matching on Genetic Code
Carrie Mantsch
December 5, 2003
Outline
• Problem Statement
• Current Techniques
• GA Motivation
• My Algorithm
• Results
• Extension Possibilities
Problem Statement
The problem is to search and align strands of DNA using a genetic algorithm.
Current Techniques
• Approximate string matching
  – Usually meant for smaller strings
  – Many are set up for k mismatches
• Two DNA strands of lengths 90 and 85
  – Allowing for 5 gaps in the second strand gives almost 44 million possible alignments
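The "almost 44 million" figure can be reproduced as a binomial coefficient, assuming it counts the ways to place the 5 gaps among the 90 positions that the gapped 85-letter strand occupies:

```python
import math

# 85 letters plus 5 gaps fill 90 positions; an alignment is fixed by
# choosing which 5 of the 90 positions hold the gaps.
alignments = math.comb(90, 5)
print(f"{alignments:,}")  # 43,949,268
```

This is the same as the number of length-90 bit strings containing exactly 85 ones, so exhaustive search grows quickly with strand length.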
Current Techniques (cont.)
• Needleman-Wunsch
  – Gap penalty: -1
  – Match bonus: +1
  – Mismatch: 0
• Not practical if the match starts in the middle of the longer sequence
  – Counts the gaps at the beginning and end as penalties
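A minimal sketch of Needleman-Wunsch scoring with the slide's values (match +1, mismatch 0, gap -1). The names `nw_score` and `free_end_gaps` are illustrative. The `free_end_gaps` option is not from the slides; it is the common "semi-global" variant that stops charging for leading and trailing gaps, included only to show the end-gap penalty the slide describes:

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1,
             free_end_gaps: bool = False) -> int:
    """Needleman-Wunsch alignment score for strings a and b.

    free_end_gaps=True (an illustrative variant, not from the slides)
    makes leading and trailing gaps cost nothing.
    """
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = 0 if free_end_gaps else i * gap
    for j in range(1, m + 1):
        F[0][j] = 0 if free_end_gaps else j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    if free_end_gaps:
        # Trailing gaps are free: take the best score on the last row or column.
        return max(max(F[n]), max(row[m] for row in F))
    return F[n][m]

print(nw_score("ACGT", "TTACGTTT"))                      # 0
print(nw_score("ACGT", "TTACGTTT", free_end_gaps=True))  # 4
```

"ACGT" occurs exactly in the middle of "TTACGTTT", yet standard scoring gives 0 because the four matches are cancelled by four end gaps.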
Current Techniques (cont.)
• BLAST (Basic Local Alignment Search Tool) and FASTA
  – Use domain-specific knowledge
• http://www.ncbi.nlm.nih.gov/BLAST
• http://fasta.bioch.virginia.edu
GA Motivation
• Alien DNA
• Junk DNA
• Extendable to similar text searches without domain-specific knowledge
My Algorithm
• The population
  – Bit strings of 0’s and 1’s
  – 0’s are spaces (gaps), 1’s mean a letter is placed there
  – The number of 1’s stays fixed at the number of letters in the smaller search string
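A minimal sketch of this representation, assuming the bit string is read left to right against the longer strand, so leading 0's shift where the match starts and interior 0's are gaps. The names `random_individual`, `decode`, and `fitness` are hypothetical, and the match-counting fitness is only a stand-in because the slides do not define the fitness function:

```python
import random

def random_individual(k: int, num_gaps: int, rng: random.Random) -> list[int]:
    """Bit string with exactly k ones (letters) and num_gaps zeros (gaps)."""
    bits = [1] * k + [0] * num_gaps
    rng.shuffle(bits)
    return bits

def decode(bits: list[int], short: str) -> str:
    """Place the letters of `short` at the 1's; each 0 becomes a gap '-'."""
    assert sum(bits) == len(short), "number of 1's must equal len(short)"
    letters = iter(short)
    return "".join(next(letters) if b else "-" for b in bits)

def fitness(bits: list[int], short: str, long: str) -> int:
    """Hypothetical fitness: positions where the gapped short string
    matches the long string."""
    return sum(x == y for x, y in zip(decode(bits, short), long))

rng = random.Random(0)
ind = random_individual(k=4, num_gaps=2, rng=rng)
print(sum(ind), len(ind))                    # 4 6
print(decode([1, 0, 1, 1], "ACG"))           # A-CG
print(fitness([0, 1, 1, 1], "ACG", "TACG"))  # 3
```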
My Algorithm (cont.)
• Breeding
  – Rank-based selection
• Crossover
  – The common place markers are kept the same
  – The rest of the place markers are split evenly between the two children
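A sketch of the breeding step under stated assumptions: `rank_select` uses linear ranking (worst individual weight 1, best weight N), which is one common form of rank-based selection rather than necessarily the one used here, and `crossover` splits the non-shared markers at random. Because both parents hold the same number of 1's, the non-shared markers always split into two equal halves, so each child keeps that count. Padding a shorter parent with trailing gaps is also an assumption. Both function names are hypothetical:

```python
import random

def rank_select(population: list, fitnesses: list[float], rng: random.Random):
    """Pick one parent with probability proportional to its fitness rank."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank  # worst gets 1, best gets len(population)
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(p1: list[int], p2: list[int], rng: random.Random):
    """Keep 1's common to both parents; split the remaining 1's evenly."""
    assert sum(p1) == sum(p2), "parents must place the same number of letters"
    n = max(len(p1), len(p2))
    a = p1 + [0] * (n - len(p1))  # assumption: pad with trailing gaps
    b = p2 + [0] * (n - len(p2))
    common = [i for i in range(n) if a[i] and b[i]]
    rest = [i for i in range(n) if a[i] != b[i]]  # always an even count
    rng.shuffle(rest)
    half = len(rest) // 2
    c1, c2 = [0] * n, [0] * n
    for i in common:
        c1[i] = c2[i] = 1
    for i in rest[:half]:
        c1[i] = 1
    for i in rest[half:]:
        c2[i] = 1
    return c1, c2

rng = random.Random(42)
p1 = [1, 1, 0, 0, 1, 0]
p2 = [1, 0, 1, 0, 0, 1]
c1, c2 = crossover(p1, p2, rng)
print(sum(c1), sum(c2))  # 3 3
parent = rank_select([p1, p2], [2.0, 5.0], rng)
```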
My Algorithm (cont.)
• Mutation
  – If the number of gaps is less than one tenth of the small string size, add a gap
  – Otherwise, delete a gap
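A minimal sketch of this mutation rule. Since the number of 1's never changes, it assumes that adding a gap inserts a 0 (the bit string grows by one) and deleting a gap removes a 0 (it shrinks by one). Choosing the insertion point and the deleted gap at random is also an assumption, and `mutate` is a hypothetical name:

```python
import random

def mutate(bits: list[int], rng: random.Random) -> list[int]:
    """Add a gap if gaps < one tenth of the small string size, else delete one.

    The small string size is the number of 1's, which this never changes.
    """
    k = sum(bits)
    gaps = [i for i, b in enumerate(bits) if b == 0]
    child = list(bits)
    if len(gaps) < k / 10:
        child.insert(rng.randrange(len(child) + 1), 0)  # add a gap anywhere
    elif gaps:
        del child[rng.choice(gaps)]  # delete one existing gap
    return child

rng = random.Random(0)
grown = mutate([1] * 20, rng)            # 0 gaps < 2.0, so a gap is added
shrunk = mutate([1] * 10 + [0] * 5, rng)  # 5 gaps >= 1.0, so a gap is deleted
print(len(grown), len(shrunk))           # 21 14
```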
Results
• The target match
Results (cont.)
• Ran for 50 generations
• Runs with different random seeds, each for the same number of generations, gave best fitness values between about 32 and 67 (optimal fitness: 90)
Extension Possibilities
• Better representation of population
• Be able to alter fitness evaluation to be more specific to different problems
• Ability to add domain specific knowledge
• Parallel searching
Questions?