CSCE555 Bioinformatics
Lecture 3: Gene Finding
Meeting: MW 4:00PM-5:15PM, SWGN2A21
Instructor: Dr. Jianjun Hu
Course page: http://www.scigen.org/csce555
University of South Carolina
Department of Computer Science and Engineering
2008 www.cse.sc.edu.
Roadmap
Transcription and Translation
Structure and Organization of Genes
Gene Finding in genomes of Prokaryotic organisms
Introduction to Sequence Alignment
Summary
04/21/23 2
How to Do Great Bioinformatics?
You need to understand biology.
You need to understand the NEEDS of biologists.
You need to know how to identify the key problems in biology that have become addressable today.
Transcription & Translation
(Figure: transcription and translation in prokaryotic cells vs. eukaryotic cells)
Transcription Process: RNA Polymerase
Translation: How the Ribosome Synthesizes Proteins
Ribosomes manufacture proteins based on mRNA instructions. Each ribosome reads mRNA, recruits tRNA molecules to fetch amino acids, and assembles the amino acids in the proper order.
Genetic Code
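The translation step described above (the ribosome reading mRNA codon by codon) can be sketched in Python as lookups in the genetic code table. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and reading from the first base of the input; the names `CODON_TABLE`, `STOP_CODONS` and `translate` are hypothetical and not from the lecture. The table also confirms that 3 of the 64 codons are stop codons, which is the fact used when calculating p for gene finding.

```python
# Standard genetic code (NCBI translation table 1), written with DNA letters.
# Each codon position runs through T, C, A, G in turn; the 64-letter
# amino-acid string below follows that same order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [b1 + b2 + b3 for b1 in BASES for b2 in BASES for b3 in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # e.g. "ATG" -> "M", "TAA" -> "*"
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate(mrna: str) -> str:
    """Translate from the first base, codon by codon, until the first stop codon."""
    seq = mrna.upper().replace("U", "T")  # map RNA letters onto the DNA-letter table
    protein = []
    for i in range(0, len(seq) - 2, 3):  # an incomplete trailing codon is ignored
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(len(STOP_CODONS), "of", len(CODON_TABLE), "codons are stop codons")  # 3 of 64
print(translate("AUGGCCUAA"))  # MA
```

Here AUG (Met) and GCC (Ala) are translated and the UAA stop codon ends translation. Codons containing other letters (such as N) raise a `KeyError` in this sketch.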
Gene Structure of Prokaryotic Cells
TAATGATAG
Genes in Eukaryotic Cells
Pre-mRNA Splicing Process
Alternative Splicing
Gene info:
1) A DNA sequence coding for the pre-mRNA
2) Additional DNA code, or another regulatory process, that regulates the alternative splicing
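As a toy illustration of alternative splicing, the sketch below joins different subsets of exons from the same pre-mRNA to produce two different mature mRNAs. The sequence, the exon coordinates and the `splice` helper are all hypothetical; real splice-site choice is governed by the regulatory mechanisms listed above, which this sketch does not model.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon intervals (0-based, end-exclusive), joined in
    sequence order; everything between them (the introns) is discarded."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical pre-mRNA: exons in upper case, introns in lower case
# (each intron here begins with GU and ends with AG, as most introns do).
pre_mrna = "AUGGCC" + "guaaag" + "GAAUUC" + "guacag" + "CCCUAA"
exon1, exon2, exon3 = (0, 6), (12, 18), (24, 30)

# Same gene, two mature mRNAs: include all exons, or skip exon 2.
isoform_a = splice(pre_mrna, [exon1, exon2, exon3])
isoform_b = splice(pre_mrna, [exon1, exon3])
print(isoform_a)  # AUGGCCGAAUUCCCCUAA
print(isoform_b)  # AUGGCCCCCUAA
```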
Core Promoter Structure
Roadmap
Transcription and Translation
Structure and Organization of Genes
Gene Finding in genomes of Prokaryotic organisms
Introduction to Sequence Alignment
Summary
How to Find Genes
TAATGATAG
ATG
Gene-Finding Algorithm
Input: DNA sequences; a threshold gene length K
Output: All possible ORF sequences of length at least K
Procedure:
Scan each of the 3 reading frames, and find subsequences that start with ATG and end with one of the stop codons (TAA, TAG, TGA)
Repeat the above for the complementary (reverse-complement) strand as well
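The procedure above can be sketched in Python as a six-frame scan. This is a minimal sketch under a few assumptions not stated on the slide: K is counted in codons including the ATG and the stop codon, an ORF runs from the first ATG after the previous in-frame stop (ATGs nested inside an open ORF are not reported separately), and ORFs that run off the end of the sequence without a stop codon are ignored. All function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """The opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_codons: int) -> list[str]:
    """ORFs (ATG ... stop codon, inclusive) in the 3 reading frames of one strand."""
    found = []
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                orf = seq[start:i + 3]
                if len(orf) // 3 >= min_codons:  # threshold K, counted in codons
                    found.append(orf)
                start = None
    return found


def find_orfs(dna: str, min_codons: int) -> list[str]:
    """All 6 reading frames: 3 on the given strand, 3 on the reverse complement."""
    dna = dna.upper()
    return (orfs_in_strand(dna, min_codons)
            + orfs_in_strand(reverse_complement(dna), min_codons))


print(find_orfs("CCATGAAATTTTAGCC", 1))  # ['ATGAAATTTTAG']
```

In the example the ORF sits in the third forward frame; none of the other five frames contains an ATG followed by an in-frame stop codon.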
Risk of the Simple Gene-Finding Algorithm
The identified ORFs may arise just from randomness. How likely is it for an ORF to be the result of a random sequence?
Significance of an ORF being a gene:
◦ We want the probability that the ORF is the result of a random sequence to be less than a threshold p.
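Calculating p
3 out of 64 codons are stop codons. Assuming all 64 codons are equally likely and independent:
$P(\text{run of } k \text{ non-stop codons}) = (61/64)^k$
$(61/64)^{62} \approx 0.051$
Setting the threshold K = 64 codons (62 non-stop codons + 1 ATG + 1 stop codon) makes it unlikely (p ≈ 0.05) that an identified ORF arose from a random permutation of bases.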
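The arithmetic above can be checked directly. The sketch assumes, as the slide does, that all 64 codons are equally likely and independent; `p_random_run` and `min_run_for_significance` are hypothetical helper names. It also shows that 62 non-stop codons give p ≈ 0.051, just above 0.05, so a strict p < 0.05 cutoff would need 63.

```python
def p_random_run(k: int, stop_fraction: float = 3 / 64) -> float:
    """P(k consecutive codons contain no stop codon), assuming every codon is
    independently a stop codon with probability stop_fraction (3/64 by default)."""
    return (1 - stop_fraction) ** k


def min_run_for_significance(alpha: float = 0.05) -> int:
    """Smallest run length k whose chance probability is below alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = 0
    while p_random_run(k) >= alpha:
        k += 1
    return k


print(round(p_random_run(62), 3))      # 0.051, as on the slide
print(min_run_for_significance(0.05))  # 63
```

The `stop_fraction` parameter is where a different stop-codon frequency (for example, one derived from a genome's base composition) could replace the equal-probability assumption.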
Permutation Test/Randomization Test
A generic method to estimate the significance level (p-value).
Example: how likely is it that a 10-codon ORF is the result of random permutation?
Method:
◦ Randomly generate (or permute given sequences) 10,000 sequences
◦ Draw a histogram of the ORF lengths in these sequences, i.e., the number of codons read before a stop codon (the null distribution)
◦ Calculate the percentage of random ORFs that have lengths >= 10.
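A minimal Python sketch of this randomization test for the 10-codon example, assuming uniformly random bases (the "randomly generate" option) and defining the length of a random ORF as the number of codons read in frame 0 before the first stop codon, as if the sequence began right after an ATG. The function names and the 300-nt sequence length are hypothetical choices; the `lengths` list is the null distribution that the method draws as a histogram.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def run_before_stop(seq: str) -> int:
    """Codons read in frame 0 before the first stop codon
    (all complete codons if the sequence contains no stop)."""
    for n, i in enumerate(range(0, len(seq) - 2, 3)):
        if seq[i:i + 3] in STOP_CODONS:
            return n
    return len(seq) // 3


def null_lengths(n_seqs: int, seq_len: int, rng: random.Random) -> list[int]:
    """Null distribution: random-ORF lengths in uniformly random DNA."""
    return [run_before_stop("".join(rng.choices("ACGT", k=seq_len)))
            for _ in range(n_seqs)]


def randomization_p_value(orf_codons: int, lengths: list[int]) -> float:
    """Fraction of random ORFs at least as long as the observed ORF."""
    return sum(length >= orf_codons for length in lengths) / len(lengths)


rng = random.Random(0)  # fixed seed so the result is reproducible
lengths = null_lengths(10_000, 300, rng)
print(randomization_p_value(10, lengths))  # about 0.62
```

Under these assumptions the estimate should be close to the exact value (61/64)^10 ≈ 0.62, so a 10-codon ORF is far from significant.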
Estimating Cut-off K for the Gene-Finding Algorithm
Exact theoretical calculation: sensitive to the assumptions (equal probability of codons, etc.)
Randomized test: do a permutation test, and find a length k such that fewer than 5% of random ORFs have lengths greater than k.
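The randomized estimate of k can be sketched by permuting the genome's own bases, which keeps its base composition (an AT-rich genome, for instance, contains stop codons more often by chance than the equal-codon model assumes). This is an illustrative sketch: the random 30,000-nt "genome", the 300-nt window and the function names are hypothetical stand-ins, and k counts non-stop codons read after a putative start.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def run_before_stop(seq: str) -> int:
    """Codons read in frame 0 before the first stop codon."""
    for n, i in enumerate(range(0, len(seq) - 2, 3)):
        if seq[i:i + 3] in STOP_CODONS:
            return n
    return len(seq) // 3


def permuted_null(genome: str, n_seqs: int, seq_len: int,
                  rng: random.Random) -> list[int]:
    """Random-ORF lengths in permutations of the genome's own bases."""
    # rng.sample picks seq_len distinct positions in random order, which is
    # distributed like the first seq_len bases of a full random permutation.
    return [run_before_stop("".join(rng.sample(genome, seq_len)))
            for _ in range(n_seqs)]


def cutoff_k(lengths: list[int], alpha: float = 0.05) -> int:
    """Smallest k such that fewer than alpha of the random ORFs are longer than k."""
    for k in range(max(lengths) + 1):
        if sum(length > k for length in lengths) / len(lengths) < alpha:
            return k
    return max(lengths)  # only reached if alpha <= 0


rng = random.Random(0)
genome = "".join(rng.choices("ACGT", k=30_000))  # hypothetical stand-in genome
k = cutoff_k(permuted_null(genome, 10_000, 300, rng))
print(k)  # near 62 for a genome with roughly equal base frequencies
```

For a genome with roughly equal base frequencies the estimate should land near the theoretical 62 non-stop codons; for a real genome with skewed composition the two can differ, which is the motivation for the randomized approach.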
Sequence Alignment: the Problem
Given two sequences, measure their similarity.
ATAACTTTAATTAAATCCTTTTACTAAA
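The slide leaves the similarity measure open. One standard way to score two DNA sequences is a global alignment computed by dynamic programming (the Needleman-Wunsch algorithm), sketched below. The scores (match +1, mismatch -1, gap -1) are hypothetical defaults, and only the optimal score is returned, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch: best score over all global alignments of a and b."""
    # Row 0 of the DP table: aligning an empty prefix of a with b[:j] takes j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # curr[j] = best score for aligning a[:i] with b[:j]
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diagonal,           # align a[i-1] with b[j-1]
                          prev[j] + gap,      # a[i-1] against a gap
                          curr[j - 1] + gap)  # b[j-1] against a gap
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 2
```

In the example, the best alignment (ACGT against A-GT) has three matches and one gap, giving 3 - 1 = 2.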
Web Tool to Align Two Sequences
http://www.ebi.ac.uk/emboss/align
Applications of Sequence Alignment
Prediction of the functions of genes/proteins/promoters by homology
Database search
◦ Find sequences that are similar to our query sequence (e.g., a new gene)
Gene finding by genome comparison
Sequence divergence/phylogeny
Sequence assembly
Summary
Transcription, Translation
Gene structures of Prokaryotic and Eukaryotic cells
Finding genes (ORFs) for prokaryotic cells
Sequence alignment applications