In Silico Primer Design and Simulation for Targeted High-Throughput Sequencing
I519 – FALL 2010
Adam Thomas, Kanishka Jain, Tulip Nandu
BACKGROUND
- Major Milestones
  - Molecular structure of DNA
  - Human Genome Project
  - High-Throughput Sequencing (HTS)
- HTS expanded common experiments from single genes to entire genomes
  - Low cost
  - Multiple samples in every run (e.g. a 454 sequencer can sequence 400-600 Mb)
BACKGROUND
- A primer is a short strand of nucleotides that serves as the starting point of DNA synthesis.
- Approximately 20-25 nucleotides long.
- Used to select the DNA strand that needs amplification.
- Complementary to the target DNA strand.
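
To make the complementarity point concrete, here is a minimal Python sketch. The function name `reverse_complement` and the example sequence are illustrative assumptions, not part of the project code. It derives the sequence that would pair with a given stretch of template, read 5' to 3'.

```python
# Illustrative helper; the name and the example sequence are assumptions.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    # Complement every base, then reverse so the result reads 5' to 3'.
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    template = "ATGCGTACCTGA"
    print(reverse_complement(template))
```

Applying the function twice returns the original sequence, which is a quick sanity check when handling forward and reverse primers.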
PCR
- Polymerase Chain Reaction
- Technique to amplify a small region of DNA
- 3-step process:
  - Denaturation
  - Annealing
  - Extension
- Process repeated for approximately 30 to 40 cycles.
PCR
- Denaturation
  Heat (approx. 90°C) separates the double strand into two single strands
PCR
- Annealing
  Primers bind to the individual strands (occurs at 45 to 60°C)
PCR
- Extension
  Temperature is raised to 72°C and the Taq DNA polymerase enzyme replicates the DNA strands
PCR
- End of First Cycle
  Process repeated for approximately 30 to 40 cycles.
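
Why 30 to 40 cycles is enough can be seen from a small worked calculation. The sketch below uses an idealized model in which each cycle multiplies the copy number by (1 + efficiency). The function name `pcr_copies` and the `efficiency` parameter are assumptions for illustration. Real reactions plateau as reagents run out, so treat the result as an upper bound.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized amplicon count after a number of PCR cycles.

    efficiency = 1.0 means perfect doubling in every cycle.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (1, 10, 30):
        print(n, pcr_copies(1, n))
```

With perfect doubling, one template molecule gives 2^30 (about one billion) copies after 30 cycles.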
CURRENT PROCESS
- Primer3 is used to design primers for PCR.
- The primers then need to be validated. Validation is performed by simulation, alignment and re-assembly.
- MetaSim is used to simulate PCR, producing the expected amplicons.
- CAP3 is used for re-assembly of the simulated sequences.
- BLASTing the simulated sequences against the original sequence gives a fairly accurate measure of how well the primers will perform.
ISSUES FACED WITH CURRENT PROCESS
- Each tool uses different input and output file formats.
- Users have to convert file formats manually to move data from one tool to the next.
- No existing tool integrates all of these functions and supports high-throughput analysis.
GOAL
Integrate the whole process involved in a high-throughput sequencing experiment and keep track of the parameters that are entered or changed.
OBJECTIVES
- A way to visualize the primers and amplicons in relation to the genome, edit the primers manually, and see how that affects the simulation.
- Optimization of the high-throughput process by minimizing the number of reads needed by the ‘454 process’ while still being able to assemble the sequence.
- Validation of the simulated amplicon reads to check whether the simulation behaves as predicted, and to rectify any problems found.
PROPOSED SOLUTION
VISUALIZATION TOOL
- GBrowse
  - Popular and open source.
  - Well-defined plugin architecture.
  - A plugin to design primers using Primer3 is already available.
PRIMER DESIGN
- The PrimerDesign.pm plugin already exists for GBrowse. It designs primers using Primer3.
- Designed to amplify only one specific region of DNA with as few primers as possible and no overlapping amplicons.
- Tweaked to take two additional input parameters: Amplicon Overlap and Max Amplicon Length.
- Once primers are created using GBrowse, they are output in Feature File Format (FFF).
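
One way the two added parameters could interact is sketched below in Python. This is an assumption about the intent (tiling a long target with overlapping amplicons), not the logic of the tweaked PrimerDesign.pm plugin. The function name `tile_amplicons` and the 1-based inclusive coordinates are illustrative. Actual primer placement inside each window would still be left to Primer3.

```python
from typing import List, Tuple


def tile_amplicons(region_start: int, region_end: int,
                   max_amplicon_length: int,
                   amplicon_overlap: int) -> List[Tuple[int, int]]:
    """Cover [region_start, region_end] with windows of bounded length.

    Consecutive windows share `amplicon_overlap` bases. Coordinates are
    1-based and inclusive (an assumption for this sketch).
    """
    if region_end < region_start:
        raise ValueError("region_end must be >= region_start")
    if max_amplicon_length <= 0:
        raise ValueError("max_amplicon_length must be positive")
    if not 0 <= amplicon_overlap < max_amplicon_length:
        raise ValueError("overlap must be >= 0 and shorter than an amplicon")

    windows = []
    start = region_start
    while True:
        end = min(start + max_amplicon_length - 1, region_end)
        windows.append((start, end))
        if end == region_end:
            break
        # Step back by the overlap so neighbouring amplicons share bases.
        start = end - amplicon_overlap + 1
    return windows


if __name__ == "__main__":
    print(tile_amplicons(1, 1000, max_amplicon_length=400, amplicon_overlap=50))
```

For a 1000-base region with a maximum amplicon length of 400 and an overlap of 50, this yields three windows: 1-400, 351-750 and 701-1000.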
PRIMER VALIDATION - SIMULATION
- Simulation performed using MetaSim.
- MetaSim:
  - Generates sets of synthetic reads or mate-pairs based on adaptable sequencing error models (e.g. for Sanger chemistry, Roche's 454 and Illumina (formerly Solexa)).
  - Can be controlled via a graphical user interface or in command-line mode.
SIMULATION
- Function written in Perl to invoke MetaSim in command-line mode.
- Algorithm:
  - Read the FFF file and extract the primer coordinates.
  - Extract the corresponding subsequence from the original sequence.
  - Run the MetaSim simulation using command-line options.
- Each extracted sequence yields its own FASTA file containing multiple simulated sequences.
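
The extraction step of this algorithm can be sketched as follows. The project's function was written in Perl; this is a Python illustration under stated assumptions: coordinates are taken as already parsed into (name, start, end) tuples with 1-based inclusive positions, and the names `extract_amplicons` and `to_fasta` are hypothetical. FFF parsing and the MetaSim call are omitted because the slides do not give the file layout or the command-line flags.

```python
from typing import Dict, Iterable, Tuple

# (name, start, end) with 1-based inclusive coordinates: an assumption.
Amplicon = Tuple[str, int, int]


def extract_amplicons(reference: str,
                      amplicons: Iterable[Amplicon]) -> Dict[str, str]:
    """Cut each amplicon's subsequence out of the reference sequence."""
    extracted = {}
    for name, start, end in amplicons:
        if not 1 <= start <= end <= len(reference):
            raise ValueError(f"{name}: coordinates {start}-{end} out of range")
        # Convert 1-based inclusive coordinates to a Python slice.
        extracted[name] = reference[start - 1:end]
    return extracted


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with wrapped lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    reference = "ACGTACGTACGGATTACAGATTACA"
    for name, seq in extract_amplicons(reference, [("amp1", 2, 9)]).items():
        # One FASTA file per amplicon would then be passed to the simulator.
        print(to_fasta(name, seq), end="")
```

Writing one FASTA record per amplicon mirrors the one-file-per-sequence behaviour described above.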
ASSEMBLY
- Perl function written to invoke CAP3 using its command-line interface.
- Each file generated by the MetaSim simulation is input into CAP3, which then assembles the reads into contigs.
ASSEMBLY
- CAP3
  - Input: simulated sequences as a FASTA file.
  - CAP3 is a sequence assembly program that assembles a set of short reads into contigs.
  - Takes as input a file of sequence reads in FASTA format.
  - If a header contains a dot ('.'), CAP3 requires that the names of reads sequenced from the same subclone contain the same substring up to the first dot.
  - Can be invoked using a command-line interface.
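
A hedged Python sketch of driving CAP3 from a script is given below (the project used Perl). It assumes the `cap3` executable is on the PATH and relies on CAP3's usual behaviour of writing contigs to a file named after the input with a `.cap.contigs` suffix. The helper names and the `.cap.log` file name are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def cap3_command(reads_fasta: str, executable: str = "cap3") -> List[str]:
    """Build the CAP3 command line for one FASTA file of reads."""
    return [executable, reads_fasta]


def cap3_contigs_path(reads_fasta: str) -> Path:
    """Path where CAP3 is expected to write the assembled contigs."""
    return Path(reads_fasta + ".cap.contigs")


def subclone_prefix(read_name: str) -> str:
    """Part of a read name up to the first dot (the whole name if no dot)."""
    return read_name.split(".", 1)[0]


def run_cap3(reads_fasta: str) -> Path:
    """Run CAP3 on one file; needs the cap3 binary to be installed."""
    # The log file name is a choice made for this sketch only.
    with open(reads_fasta + ".cap.log", "w") as log:
        subprocess.run(cap3_command(reads_fasta), stdout=log, check=True)
    return cap3_contigs_path(reads_fasta)


if __name__ == "__main__":
    print(cap3_command("amp1.fasta"))
    print(cap3_contigs_path("amp1.fasta"))
```

`subclone_prefix` can be used to check simulated read names before assembly, so that reads are not grouped into the same subclone by accident when their headers contain dots.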
BLAST
- Assembled contigs are then BLASTed against the original sequence for validation.
- GBrowse accepts the assembled sequence and BLASTs it against the original sequence.
- This plugin requires 4 steps:
  - Exporting the assembled contigs and the original sequence from GBrowse.
  - Creating a BLAST database.
  - BLASTing the contigs against the sequence.
  - Importing the result back into GBrowse.
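
The two middle steps can be sketched in Python as below. The sketch assumes the NCBI BLAST+ command-line tools (`makeblastdb`, `blastn`). The slides do not say which BLAST release the plugin used, so this is an illustration, not the plugin's code. The helper names and file names are hypothetical, and the GBrowse export and import steps are omitted.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float  # percent identity
    length: int      # alignment length
    sstart: int      # start on the original sequence
    send: int        # end on the original sequence


def blast_commands(reference_fasta: str, contigs_fasta: str,
                   db_name: str, out_tsv: str) -> List[List[str]]:
    """Commands to build a nucleotide database and BLAST contigs against it."""
    return [
        ["makeblastdb", "-in", reference_fasta, "-dbtype", "nucl",
         "-out", db_name],
        ["blastn", "-query", contigs_fasta, "-db", db_name,
         "-outfmt", "6", "-out", out_tsv],
    ]


def parse_tabular(text: str) -> List[Hit]:
    """Parse default 12-column tabular output (-outfmt 6)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[8]), int(f[9])))
    return hits


if __name__ == "__main__":
    for cmd in blast_commands("original.fasta", "contigs.fasta",
                              "original_db", "hits.tsv"):
        print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once the tools are installed. The parsed subject coordinates show which part of the original sequence each contig covers.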
DEMO
QUESTIONS