Application of Computer Science in Bioinformatics

Upload: isley-sokudo-ryoku-vtania

Post on 07-Apr-2018


  • 8/4/2019 Application of Compute Science in tics

    1/41

    CIS427: Introduction to Bioinformatics

    The application of computer science in Bioinformatics

    Dr V. C. Osamor

    Department of Computer & Information Science (Bioinformatics unit)

    Covenant University

    [email protected]

    Dr V.C. Osamor CIS427


    Objectives

    To expose students to possible areas of computer techniques applicable in Bioinformatics.

    To expose students to how prediction is important in solving biological problems.

    To expose students to how machine learning is applicable in Bioinformatics.

    To expose students to how Expert Systems are applicable in Bioinformatics.


    The Prediction Problem

    Make up any sentence in the English language.

    We know that letters do not occur in English at random (e.g. e is more common than z).

    Predicting symbols is fundamental to a wide range of important applications (e.g. encryption, compression).
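
    A minimal Python sketch of this idea, assuming that a plain frequency count is an acceptable stand-in for a real predictive model. The function names are illustrative and do not come from the course material.

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return the relative frequency of each letter a-z that occurs in text."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {letter: n / total for letter, n in counts.items()}


def most_likely_letter(text: str) -> str:
    """Predict the next letter as the one seen most often so far.

    Raises ValueError if text contains no letters.
    """
    freqs = letter_frequencies(text)
    return max(freqs, key=freqs.get)


sample = "We know that letters do not occur in English at random"
print(most_likely_letter(sample))
```

    A compressor can exploit the same skew by giving frequent symbols shorter codes; the sketch only shows the counting step.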


    Prediction in bioinformatics

    This will largely require development of your own computational tools or the use of existing tools in areas such as:

    Predicting the location of genes in DNA

    Predicting gene roles in an organism

    Predicting errors in a genetic transcription

    Predicting the function of proteins

    Predicting diseases from molecular samples

    Anything that involves making a judgment; a yes/no decision about whether some sample datum does or does not have some property.


    Representation

    DATA - DATA

    0101011101100101011001010111010000101101

    To the computer, everything is binary!


    0101011101100101011001010111010000101101

    0101101100100111111011010011010000101101

    A A C G T C A T T C G A T G A T T C G A

    Just as we can teach a computer to predict things about a sequence of letters in English prose, we can also teach it to predict things about other sequences, like a genetic sequence.
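
    Since DNA has only four bases, each base fits in two bits, which is why the 20-base sequence on the slide can sit under a 40-bit string. The sketch below shows one way to do this in Python. The particular bit assignment is an assumption chosen for illustration and is not necessarily the one used on the slide.

```python
# Assumed mapping, for illustration only: any fixed 2-bit code per base would work.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_dna(seq: str) -> str:
    """Encode a DNA string as a string of '0'/'1' characters, 2 bits per base."""
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def decode_dna(bits: str) -> str:
    """Invert encode_dna; the bit string must have even length."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string length must be a multiple of 2")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


slide_sequence = "AACGTCATTCGATGATTCGA"
encoded = encode_dna(slide_sequence)
print(len(encoded))  # 40 bits for 20 bases
```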


    A genetic prediction problem

    ttgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgcta

    atggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcata

    cgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagc


    A genetic prediction problem

    ttgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagctgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacag

    cgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcat

    acgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgcaatcggcgctacgcttcaaaatttattatattcccggcgcggctacgttcatcccagcagcagcgattttaaaattaacgcatcagactctcgtcgcgttcgtcgcctttattcacgctaatggacgacatcttttactacgacggcgcctacgcatcgcagcatacgacgcccagcatagtattttagaggcgaggacatcatcatatcgcagctacagcgcatcagacgcatacgacgacgactacgacgacactaacgacgatgttgcgcacccacaccagttatatagagacgaactcgcatcagtgttgcgcacccacaccagttatatagagacgaactc


    A genetic prediction problem

    A gene encodes a protein

    It is a blueprint that provides biochemical instructions on how to construct a sequence of amino acids so as to make a working protein that will perform some function in the organism.


    A genetic prediction problem

    [Figure: gene diagram showing the untranslated region and the encoding region, with transcription factor and RNA labels]


    A genetic prediction problem

    untranslated region

    ttgcaatcggcgctacgcttcaaaatttattatattcccggc


    A genetic prediction problem

    ttgcaatcggcgctacgcttcaaaatttattatattcccggc

    What transcription factors bind to this gene?

    Where is the transcription factor binding site?


    A genetic prediction problem

    ttgcaatcggcgctacgcttcaaaatttattatattcccggc

    Clues: A binding site is often a short general pattern

    E.g. CCGATNATCGG
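
    A short pattern like this maps naturally onto a regular expression, with N standing for "any base". The Python sketch below is one hedged way to scan a sequence for such a pattern; the helper name find_motif and the second test sequence are made up for illustration.

```python
import re


def find_motif(pattern: str, sequence: str) -> list[int]:
    """Return 0-based start positions where pattern occurs in sequence.

    The pattern may use A, C, G, T and N, where N matches any single base.
    """
    regex = "".join("[ACGT]" if ch == "N" else ch for ch in pattern.upper())
    # A lookahead is used so that overlapping occurrences are reported too.
    return [m.start() for m in re.finditer(f"(?={regex})", sequence.upper())]


slide_sequence = "ttgcaatcggcgctacgcttcaaaatttattatattcccggc"
print(find_motif("CCGATNATCGG", slide_sequence))       # no occurrence in this fragment
print(find_motif("CCGATNATCGG", "ttCCGATCATCGGaa"))    # made-up sequence with one site
```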


    A genetic prediction problem

    ttgcaatcggcgctacgcttcaaaatttattatattcccggc

    Clues: The patterns are often reverse complements

    E.g. CCGATNATCGG
         GGCTANTAGCC
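
    The reverse complement is obtained by complementing each base (A with T, C with G) and then reading the result backwards. A small Python sketch, with illustrative function names, shows that the example pattern CCGATNATCGG reads the same on both strands:

```python
# Translation table for base complements; N (any base) stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_reverse_palindrome(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


pattern = "CCGATNATCGG"
print(pattern.translate(COMPLEMENT))   # the complementary strand, GGCTANTAGCC
print(is_reverse_palindrome(pattern))  # True
```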


    A genetic prediction problem

    ttgcaatcggcgctacgcttcaaaatttattatattcccggc

    Clues: Where there is one binding site, often there is another nearby.


    A genetic prediction problem: ALGORITHMS and DATA STRUCTURES

    All of these properties are the kinds of things that computer science has developed algorithms and data structures to identify quickly and efficiently, so this is exactly the kind of problem computer scientists should be able to solve.


    Proteomics

    Three consecutive nucleotides in the coding region form a codon, i.e. encode an amino acid.

    A string of amino acids makes a protein.

    3 nucleotides, 4 possibilities each: A, C, T, G

    4^3 = 64 possible codons

    But there are only 20 amino acids!
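
    The count of 64 can be checked by simply enumerating every triplet. A minimal Python sketch (the function name is illustrative):

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def all_codons() -> list[str]:
    """Enumerate every possible triplet of the four nucleotides."""
    return ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]


codons = all_codons()
print(len(codons))  # 64, i.e. 4 ** 3
```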


    Proteomics

    Glycine: GGA, GGC, GGG, GGT

    Tyrosine: TAT, TAC

    Methionine: ATG

    There is quite a bit of redundancy in codons.
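
    The redundancy shows up as a many-to-one lookup table from codons to amino acids. The Python sketch below uses only the three amino acids listed on this slide, so it is deliberately a partial table; codons outside it are reported as "unknown" rather than guessed.

```python
# Only the codons listed on the slide; a full table would cover all 64 codons.
AMINO_ACID_CODONS = {
    "Glycine": ["GGA", "GGC", "GGG", "GGT"],
    "Tyrosine": ["TAT", "TAC"],
    "Methionine": ["ATG"],
}

# Invert the table: several codons map to the same amino acid.
CODON_TO_AMINO_ACID = {
    codon: amino_acid
    for amino_acid, codons in AMINO_ACID_CODONS.items()
    for codon in codons
}


def translate(dna: str) -> list[str]:
    """Translate DNA codon by codon; trailing bases that do not fill a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [CODON_TO_AMINO_ACID.get(dna[i:i + 3], "unknown") for i in range(0, usable, 3)]


# Two different DNA strings that encode the same amino acid sequence.
print(translate("ATGGGATAT"))
print(translate("ATGGGCTAC"))
```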


    Amine group

    Carboxyl group

    R group

    Amino Acid


    Amino Acid

    glycine

    tyrosine


    Artificial Intelligence

    Computers do things only human brains can otherwise do

    expert system

    expert


    Artificial Intelligence & Machine Learning

    Computers do things only human brains can otherwise do

    learning system

    expert system


    Machine learning

    What is machine learning?

    creating computer programs that get better with experience

    learn how to make expert judgments

    discover previously hidden, potentially useful information (data mining)

    How does it work?

    user provides learning system with examples of concept to be learned

    induction algorithm infers a characteristic model of the examples

    model is used to predict whether or not future novel instances are also examples

    and it does this very consistently, and very, very quickly!
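
    The three steps (examples, induction, prediction) can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the toy concept ("GC-rich sequence"), the single feature (GC content), the nearest-mean rule used as the induction algorithm, and the training sequences themselves.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def train(examples: list[tuple[str, bool]]) -> tuple[float, float]:
    """Induce a model: the mean GC content of the positive and of the negative examples."""
    pos = [gc_content(s) for s, label in examples if label]
    neg = [gc_content(s) for s, label in examples if not label]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def predict(model: tuple[float, float], seq: str) -> bool:
    """Classify a new sequence by whichever class mean its GC content is closer to."""
    pos_mean, neg_mean = model
    x = gc_content(seq)
    return abs(x - pos_mean) < abs(x - neg_mean)


# Step 1: the user provides labelled examples of the (hypothetical) concept.
training_examples = [
    ("GCGCGGCC", True),
    ("GGCCGCGA", True),
    ("ATATTTAA", False),
    ("TTAAATGA", False),
]
# Step 2: the induction step infers a model from the examples.
model = train(training_examples)
# Step 3: the model judges novel instances.
print(predict(model, "GCGCATGC"), predict(model, "ATTTAGAT"))
```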


    Statistical and Mathematical techniques

    Statistical and mathematical techniques range from exact, heuristic, fixed-parameter and approximation algorithms for problems based on parsimony models to Markov Chain Monte Carlo algorithms for Bayesian analysis of problems based on probabilistic models.
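
    To make the Markov Chain Monte Carlo idea concrete, here is a hedged Python sketch of a random-walk Metropolis sampler. The problem it solves is an assumption chosen for simplicity, not one from the slides: estimating the probability p that a base is G or C, given that 30 of 100 observed bases were, with a uniform prior on p. Step size, chain length and seed are arbitrary illustrative values.

```python
import math
import random


def log_posterior(p: float, successes: int, trials: int) -> float:
    """Log of the unnormalised posterior: binomial likelihood times a uniform prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)


def metropolis(successes: int, trials: int, steps: int = 20000,
               step_size: float = 0.1, seed: int = 0) -> list[float]:
    """Random-walk Metropolis sampler for p; returns one sample per step."""
    rng = random.Random(seed)
    current = 0.5
    current_lp = log_posterior(current, successes, trials)
    samples = []
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_lp = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


samples = metropolis(successes=30, trials=100)
kept = samples[2000:]  # discard an initial burn-in portion of the chain
print(sum(kept) / len(kept))  # should land near the exact posterior mean 31/102
```

    For this toy problem the posterior is known in closed form (a Beta distribution), which makes it a convenient check; MCMC earns its keep on models where no such formula exists.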


    OTHER BIOINFORMATICS APPLICATIONS


    Bioinformatics Applications

    At the beginning of the "genomic revolution", bioinformatics was applied to the creation and maintenance of databases storing biological information such as nucleotide and amino acid sequences.

    Development of this type of database involved not only design issues but also the development of complex interfaces through which researchers could both access existing data and submit new or revised data.


    Biotechnology

    Biologists know proteins, computer scientists know machine learning

    Together, they can find out a lot of hidden information about genes and proteins

    Biotechnology is a multi-billion dollar industry

    Biotechnology is one of the best funded areas of scientific research


    Modelling a Biological system

    There are two fundamental ways of modelling a biological system (e.g. a living cell), both coming under bioinformatic approaches.

    Static

    Sequences - Proteins, Nucleic acids and Peptides

    Structures - Proteins, Nucleic acids, Ligands (including metabolites and drugs) and Peptides

    Interaction data among the above entities, including microarray data and networks of proteins and metabolites


    Modelling a Biological System

    Dynamic

    Systems Biology comes under this category, including reaction fluxes and variable concentrations of metabolites

    Multi-Agent Based modelling approaches capturing cellular events such as signalling, transcription and reaction dynamics
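
    As a hedged illustration of what "reaction fluxes and variable concentrations" means in a dynamic model, the Python sketch below simulates a single made-up reaction A -> B with first-order kinetics, using simple Euler time stepping. The reaction, rate constant and step size are assumptions for illustration only.

```python
def simulate_conversion(a0: float, b0: float, rate: float,
                        dt: float, steps: int) -> list[tuple[float, float, float]]:
    """Euler integration of dA/dt = -rate * A and dB/dt = +rate * A.

    Returns a list of (time, concentration of A, concentration of B).
    """
    a, b = a0, b0
    trajectory = [(0.0, a, b)]
    for step in range(1, steps + 1):
        flux = rate * a          # reaction flux at the current concentrations
        a -= flux * dt
        b += flux * dt
        trajectory.append((step * dt, a, b))
    return trajectory


trajectory = simulate_conversion(a0=1.0, b0=0.0, rate=0.5, dt=0.01, steps=1000)
final_time, final_a, final_b = trajectory[-1]
print(final_time, final_a, final_b)
```

    Total mass A + B stays constant because whatever leaves A enters B at each step.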


    Major Research Efforts

    Major research efforts in the field include sequence alignment, gene finding, genome assembly, drug design, drug discovery, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, genome-wide association studies and the modeling of evolution.


    Major research areas

    1 Sequence analysis

    2 Genome annotation

    3 Computational evolutionary biology

    4 Analysis of gene expression

    5 Analysis of regulation

    6 Analysis of protein expression

    7 Analysis of mutations in cancer

    8 Comparative genomics

    9 Modeling biological systems

    10 High-throughput image analysis

    11 Structural Bioinformatic Approaches - Prediction of protein structure, Molecular Interaction, Docking algorithms


    Sequence Analysis

    Sequence information is analyzed to determine genes that encode polypeptides (proteins), RNA genes, regulatory sequences, structural motifs, and repetitive sequences.

    A comparison of genes within a species or between different species can show similarities between species (the use of molecular systematics to construct phylogenetic trees).

    With the growing amount of data, it became impractical to analyze DNA sequences manually. Today, computer programs such as BLAST are used.
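
    To give a feel for what comparing two sequences by program involves, here is a hedged Python sketch of a global alignment score computed by dynamic programming (the Needleman-Wunsch recurrence). This is not how BLAST works internally; BLAST is a faster heuristic search tool. The scoring values are arbitrary illustrative defaults.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b under a simple linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7: every position matches
print(global_alignment_score("ACGT", "AGT"))         # 2: three matches and one gap
```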


    Genome Annotation/Gene finding

    Annotation is the process of marking the genes and other biological features in a DNA sequence. The first genome annotation software system was designed in 1995 by Dr. Owen White, who was part of the team at The Institute for Genomic Research that sequenced and analyzed the genome of a free-living organism, the bacterium Haemophilus influenzae.

    He built a software system to find the genes (places in the DNA sequence that encode a protein), the transfer RNA, and other features, and to make initial assignments of function to those genes.
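
    A very naive form of gene finding is to scan for open reading frames: a start codon (ATG) followed, in the same reading frame, by a stop codon (TAA, TAG or TGA). The Python sketch below does only that, on the forward strand, and is in no way a reconstruction of the annotation system described above; the function name and the min_codons parameter are illustrative.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of open reading frames on the forward strand.

    start is the 0-based index of the ATG; end is exclusive and includes the stop codon.
    min_codons is the minimum number of codons before the stop, counting the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


sequence = "CCATGGGATATTAGCC"  # made-up example: ATG GGA TAT TAG in the third frame
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])
```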


    Computational Evolutionary Biology

    Evolutionary biology is the study of the origin and descent of species, as well as their change over time. Informatics has assisted evolutionary biologists in several key ways; it has enabled researchers to:

    Trace the evolution

    Compare Genomes

    Computational prediction model of population over time


    Comparative Genomics

    The core of comparative genome analysis is the establishment of the correspondence between genes (orthology analysis) or other genomic features in different organisms.


    High-throughput Imaging

    Image analysis systems augment an observer's ability to make measurements from a complex set of images, by improving accuracy, objectivity, or speed. Biomedical imaging is becoming more important for both diagnostics and research.
