Introduction to Bioinformatics IMBB 2017 RAB, Kigali - Rwanda May 02 – 13, 2017 Joyce Nzioki


  • Introduction to Bioinformatics

    IMBB 2017 RAB, Kigali - Rwanda May 02 – 13, 2017

    Joyce Nzioki

  • Plan for the Week

    Introduction to Bioinformatics

    Quality Control

    De novo assembly

    BLAST and Biological databases

    DNA Barcoding

    Nucleotide sequence Analysis

    MSA and Phylogenetics

    Sequence depositing

    Resolving conflicts

    Raw Sanger sequence data

    Introduction to CLC Bio

  • What is Bioinformatics?

    • Bioinformatics is an interdisciplinary science that develops and improves methods of storing, retrieving, organizing and analyzing biological data.

    • These computational techniques are used to solve biological problems and to discover the wealth of biological information hidden in biological data.

  • Bioinformatics

    The design, construction and use of software tools to generate, store, annotate and analyse data and information relating to Molecular Biology.

    Here we consider the use of bioinformatics tools rather than their design and construction.

    Here we consider the access, storage and analysis of data and information items rather than the generation and annotation.

  • Bioinformatics

    [Diagram: Hypothesis, Experiment, DATA, Analysis, RESULT]

    Sequence, Structure, Function, Evolution, Pathway, Interaction, Mutation, Expression

  • Major types of Bioinformatics Data

    Genomes

    DNA & RNA sequence

    DNA & RNA structure

    Protein sequence

    Protein families, motifs and domains

    Protein structure

    Protein interactions

    Chemical entities

    Pathways

    Systems

    Gene expression

    Literature and ontologies

  • Bioinformatics Research areas

    Include but are not limited to:

    • Organization, classification, dissemination and analysis of biological and biomedical data

    • Biological sequence analysis and phylogenetics

    • Genome organization and evolution

    • Regulation of gene expression and epigenetics

    • Biological pathways and networks in healthy and disease states

    • Protein structure prediction from sequence

    • Modelling and prediction of the biophysical properties of biomolecules for binding prediction and drug design

    • Design of biomolecular structure and function

    With applications to Biology, Medicine, Agriculture and Industry

  • Where did bioinformatics come from?

    Bioinformatics arose as molecular biology began to be transformed by the emergence of molecular sequence and structural data.

    • Recap: The key dogmas of molecular biology

    • DNA sequence determines protein sequence

    • Protein sequence determines protein structure

    • Protein structure determines protein function

    • Regulatory mechanisms (e.g. gene expression) determine the amount of a particular function in space and time

    Bioinformatics is now essential for the archiving, organization and analysis of data related to these processes.

  • Bioinformatics involves the application of computer algorithms, computer models and computer databases with the broad goal of understanding the action of genes, transcripts, proteins and large collections of these entities.

    The integration of information learned about these three biological processes gives insight into the biology of organisms.

  • What does it look like on a computer?

  • A cDNA sequence (reading frame)

    >gi|14456711|ref|NM_000558.3| Homo sapiens hemoglobin, alpha 1 (HBA1), mRNA

    ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTCAAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCCTGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGGCCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTGTCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACTGCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAAGTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACC

    GTTAAGCTGGAGCCTCGGTGGCCATGCTTCTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAAAGTCTGAGTGGGCGGC

    A protein sequence

    >gi|4504347|ref|NP_000549.1| alpha 1 globin [Homo sapiens]

    MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
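
    The two records above are in FASTA format: a header line starting with ">" followed by one or more sequence lines. As a minimal sketch of how such a file can be read, and of how a cDNA relates to its protein, the Python below parses FASTA text and translates from the first ATG to the first stop codon using the standard genetic code. The function names (parse_fasta, translate_from_first_atg) and the shortened example record are illustrative assumptions, not part of the course material; in practice an established library would normally be used instead.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def translate_from_first_atg(cdna):
    """Translate from the first ATG up to (not including) the first stop codon."""
    start = cdna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(cdna) - 2, 3):
        aa = CODON_TABLE.get(cdna[i:i + 3], "X")  # "X" marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical, shortened record used only to illustrate the format.
example = """>example_fragment shortened, illustrative only
ccaccATGGTGCTGTCTCCTGCC
GACAAGACCTAAgg
"""

records = parse_fasta(example)
header, sequence = records[0]
peptide = translate_from_first_atg(sequence)
print(header)
print(peptide)
```

    The sketch takes the first ATG as the start codon, which is a simplification; real open reading frame prediction has to consider all reading frames and both strands.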

  • How do we actually do Bioinformatics?

    Prepackaged tools and databases

    • Many are online and open source

    • Some are commercial

    Tool development

    • Mostly in a UNIX environment

    • Knowledge of programming required (Python, Perl, R, C, Java)

    • May require specialized or high performance computing resources

  • History of Bioinformatics

  • Sequencing

    DNA sequencing is the process of determining the order of nucleotides within a DNA molecule.

  • History of DNA sequencing

    • 1976: Maxam–Gilbert sequencing

    • 1977: Sanger sequencing (dideoxy chain termination)

    • 1986: Fluorescently labelled ddNTPs

    • 1987: Applied Biosystems (ABI 370)

    • 1988: Capillary gel electrophoresis

    • 1999: Applied Biosystems ABI 3700 DNA Analyzer

    • 2005 onwards: Next generation sequencing

  • Next Generation Sequencing

    Sequencing Technologies

    Illumina HiSeq, Illumina NextSeq, Illumina MiSeq, Illumina MiniSeq, Illumina NovaSeq

    Ion Proton, Ion PGM, Ion S5

    PacBio RS II, PacBio Sequel

    ONT MinION, ONT PromethION, ONT SmidgION

    Source: Bert Overduin, Intro to NGS, CTLGH Introduction to Bioinformatics, 13-17 Feb 2017, Nairobi

  • Applications of Bioinformatics

    • Microbial genome applications

    • Molecular medicine

    • Personalized medicine

    • Preventive medicine

    • Gene therapy

    • Drug development

    • Antibiotic resistance

    • Evolutionary studies

    • Biotechnology

    • Climate change studies

    • Crop improvement

    • Forensic analysis

    • Insect resistance

    • Improved nutritional quality

    • Development of drought resistant varieties

    • Veterinary science

    • Bioengineering

    • Agricultural biotechnology

  • Limitations of Bioinformatics

    Bioinformatics is a science of inference, hence:

    • The quality of bioinformatics predictions depends on the quality of the data and the sophistication of the algorithms.

    • Sequence data may contain errors, which subsequently lead to errors in downstream analysis.

    • Many exhaustive algorithms cannot be used due to computational limitations.

    • There is a trade-off between specificity and sensitivity.
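
    The trade-off between specificity and sensitivity can be made concrete with a small worked example. The Python sketch below assumes a predictor that gives each item a score, and calls anything at or above a chosen threshold "positive". The scores, labels and thresholds are hypothetical values invented for illustration; they do not come from any real tool or dataset.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Call score >= threshold 'positive' and compare with the true labels.

    Returns (sensitivity, specificity):
      sensitivity = true positives / all real positives
      specificity = true negatives / all real negatives
    """
    tp = fp = tn = fn = 0
    for score, is_true in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and is_true:
            tp += 1
        elif predicted_positive and not is_true:
            fp += 1
        elif not predicted_positive and is_true:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical prediction scores and whether each item is truly positive.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20]
labels = [True, True, False, True, False, True, False, False]

results = {}
for threshold in (0.25, 0.50, 0.75):
    results[threshold] = sensitivity_specificity(scores, labels, threshold)
    sens, spec = results[threshold]
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

    With these made-up numbers, raising the threshold from 0.25 to 0.75 moves sensitivity from 1.00 down to 0.50 while specificity rises from 0.25 to 0.75: gaining one costs some of the other.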

  • Why bioinformatics then?

    • In most cases wet-lab biology is needed to validate bioinformatic predictions

    • Bioinformatics can:
      – Reduce data to a small set of testable predictions
      – Assign a degree of confidence to each prediction

    • The biologist will often have to choose the appropriate degree of confidence, depending on:
      – Cost of validating predictions
      – Benefit expected from the right predictions

    • Data mining: the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying homologs in better characterized organisms.

    • Bioinformatics as in silico biology:
      – Allows for exploration of domains that cannot be addressed manually, e.g. the study of past evolutionary events/patterns.
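
    One simple way to think about choosing a degree of confidence from the cost of validation and the expected benefit is an expected-value rule: validate a prediction when confidence times benefit exceeds the cost of the experiment. This rule is an assumption made here for illustration, not a method stated in the slides, and the candidate names, confidence values, cost and benefit below are all hypothetical.

```python
def worth_validating(confidence, benefit, cost):
    """Return True when the expected benefit of a wet-lab test exceeds its cost."""
    return confidence * benefit > cost


# Hypothetical predictions: (name, confidence assigned by the analysis).
predictions = [
    ("candidate_A", 0.9),
    ("candidate_B", 0.4),
    ("candidate_C", 0.1),
]

COST_PER_VALIDATION = 200.0   # assumed cost of one wet-lab experiment
BENEFIT_IF_CORRECT = 1000.0   # assumed value of a confirmed prediction

# Confidence at which expected benefit equals cost.
break_even_confidence = COST_PER_VALIDATION / BENEFIT_IF_CORRECT

selected = [
    name
    for name, confidence in predictions
    if worth_validating(confidence, BENEFIT_IF_CORRECT, COST_PER_VALIDATION)
]

print(f"break-even confidence: {break_even_confidence:.2f}")
print("selected for validation:", selected)
```

    Under these assumed numbers the break-even confidence is 0.20, so cheaper experiments or larger expected benefits justify testing lower-confidence predictions.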

  • The End

    Acknowledging Bert Overduin, University of Edinburgh, and EBI online courses for some slides

  • Thank you

    [email protected]
