into to bioinfo

Upload: anshul

Post on 29-May-2018


TRANSCRIPT

  • 8/8/2019 Into to Bioinfo

    1/53

    How Bioinformatics Can Change Your Life

    Basic Concepts of Bioinformatics

  • 8/8/2019 Into to Bioinfo

    2/53

    Introduction

  • 8/8/2019 Into to Bioinfo

    3/53

    2000

    A major event happened that was to change the course of human history

    It was a joint British and American effort
    It was a race: who would finish first?
    The "race test": not whether they have taken drugs, but whether they can produce them!

    The human genome was sequenced

  • 8/8/2019 Into to Bioinfo

    4/53

    Bioinformatics is:

    driven by the generation of data, moderated by hardware and analysis methods

    Computing power
    Data generation platforms
    Analysis methods

  • 8/8/2019 Into to Bioinfo

    5/53

    What is Bioinformatics?

    The merging of computer science and molecular biology
        The algorithms and techniques of computer science are being used to solve the problems faced by molecular biologists

    Information technology applied to the management and analysis of biological data
        Storage and analysis are two of the important functions; bioinformaticians build tools for each

  • 8/8/2019 Into to Bioinfo

    6/53

    [Diagram labels: Biology, Chemistry, Statistics, Computer Science, Bioinformatics]

  • 8/8/2019 Into to Bioinfo

    7/53

    What is Bioinformatics...

    This is the age of information technology
    However, storing information is nothing new
    Information equal in volume to the Encyclopaedia Britannica is stored in each of our cells
    Bioinformatics tries to determine which information is biologically important

  • 8/8/2019 Into to Bioinfo

    8/53

    Basics of Molecular Biology

  • 8/8/2019 Into to Bioinfo

    9/53

    DNA & Genes

    DNA is where the genetic information is stored
        Traits such as blond hair and blue eyes are inherited through it

    Gene: the basic unit of heredity
        There are genes for characteristics, e.g. a gene for blond hair

    Genes contain the information as a sequence of nucleotides

    Genes are abstract concepts, like longitude and latitude, in the sense that you cannot see them separately

    Genes are made up of nucleotides

  • 8/8/2019 Into to Bioinfo

    10/53

    Nucleotide (nt)

    Each nt is made up of:
        Sugar
        Phosphate group
        Base

    The base is the only difference between one nt and another
        There are 4 different bases: G(uanine), A(denine), T(hymine), C(ytosine)

    The information is in the order of the nucleotides; the order is the information
    Genes can be many thousands of nt long
    The complete set of genetic instructions is called the genome

  • 8/8/2019 Into to Bioinfo

    11/53

    Proteins

    Proteins are very important biological molecules
    Amino acids make up the proteins
    There are 20 different amino acids
    The function of a protein depends on the order of its amino acids

  • 8/8/2019 Into to Bioinfo

    12/53

    Proteins

    The information required to assemble amino acids (aa) into proteins is stored in DNA
        DNA sequence determines amino acid sequence
        Amino acid sequence determines protein structure
        Protein structure determines protein function

    A substance called RNA carries the information stored in the DNA, which in turn is used to make proteins
        Storage: DNA
        Information transfer: RNA
        RNA is the message boy!

  • 8/8/2019 Into to Bioinfo

    13/53

    Central dogma

    DNA --transcription--> RNA --translation--> Protein

    Transcription is carried out by RNA polymerase; translation is carried out by ribosomes
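The first arrow of the central dogma can be sketched in a few lines. This is a simplification, not how the cell does it: it assumes the input is the coding strand of the DNA, so the RNA is obtained simply by swapping T for U (RNA polymerase actually reads the complementary template strand). The function name `transcribe` and the example sequence are made up for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (assumption: coding strand given).

    The mRNA has the same base order as the coding strand, with U in place of T.
    """
    return coding_strand.upper().replace("T", "U")


# Hypothetical example sequence, not taken from the slides.
print(transcribe("ATGGCCTAA"))
```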

  • 8/8/2019 Into to Bioinfo

    14/53


  • 8/8/2019 Into to Bioinfo

    15/53

    Proteins...

    There are 20 amino acids to encode, so one nt cannot correspond to one aa (only 4 possibilities), and neither can pairs of nt (only 16 possibilities)

    So the protein information is carried in triplet codes: codons (64 possibilities)

    The codons that do not correspond to an amino acid are stop codons: UAA, UAG, UGA (RNA has U instead of T)

    AUG is used as the start codon as well as to code for methionine
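The counting argument and the start/stop rules above can be checked with a short sketch. The codon table here is deliberately partial (only the handful of entries the example needs), and the function names and the example mRNA are assumptions made for illustration.

```python
def code_words(length: int) -> int:
    """Number of distinct words of the given length over the 4-letter alphabet."""
    return 4 ** length


# 1 nt or 2 nt per amino acid is not enough for 20 amino acids; 3 nt is.
for n in (1, 2, 3):
    print(n, code_words(n), code_words(n) >= 20)

# Partial codon table: just enough entries for the example below.
# "*" marks the three stop codons named on the slide.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCC": "A",  # alanine
    "UUU": "F",  # phenylalanine
    "GGA": "G",  # glycine
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = not in the partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUUUUAAGG"))
```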

  • 8/8/2019 Into to Bioinfo

    16/53

    Protein Structure

    Shows wide variety, as opposed to DNA, whose structure is uniform
    X-ray crystallography or Nuclear Magnetic Resonance (NMR) is used to determine the structure
    Structure is related to function, or rather structure determines function
    Although proteins are created as a linear chain of aa, they fold into a 3-D structure
    If you stretch them and release them, they go back to this structure: this is the native structure of a protein
    Only in its native structure does a protein function well
    Even after translation is over, a protein goes through some changes to its structure

  • 8/8/2019 Into to Bioinfo

    17/53

    Bioinformatics Techniques

  • 8/8/2019 Into to Bioinfo

    18/53

    Prediction and Pattern Recognition

    The two main areas of bioinformatics are:

    Pattern recognition
        A particular sequence or structure has been seen before, and a particular characteristic can be associated with it

    Prediction
        From a sequence (what we know) we can predict the structure and function (what we don't know)

  • 8/8/2019 Into to Bioinfo

    19/53

    Dot plots

    A simple way of evaluating the similarity between two sequences
    In a graph, one sequence is placed along one axis and the other sequence along the other axis
    Wherever the two sequences match, the graph is marked
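A dot plot as described above can be sketched as a small text grid. This is a minimal, unfiltered version (every single-character match is marked, with no windowing); the function name `dot_plot` is an assumption, and the two sequences are the ones used on the alignment slide.

```python
def dot_plot(seq_a: str, seq_b: str) -> list[str]:
    """Return one text row per character of seq_a.

    Column j of row i is '*' where seq_a[i] == seq_b[j], otherwise '.'.
    """
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


seq_a, seq_b = "TTACTATA", "TAGATA"
print("  " + seq_b)
for base, row in zip(seq_a, dot_plot(seq_a, seq_b)):
    print(base + " " + row)
```

Runs of marks along a diagonal correspond to stretches where the two sequences agree.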

  • 8/8/2019 Into to Bioinfo

    20/53


  • 8/8/2019 Into to Bioinfo

    21/53

    Alignments

    An alignment matches up the characters of two or more sequences to show their similarity
    E.g. TTACTATA and TAGATA
    There are many ways to align these two sequences

    [Three candidate alignments of TAGATA against TTACTATA, numbered 1 to 3; the spacing that distinguished them is not recoverable]

    So which one do we choose, and on what basis?
    The solution is to assign a match score and a mismatch score
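Scoring makes the choice between candidate alignments mechanical. The sketch below assumes a match score of +1 and a mismatch score of -1, plus a gap penalty of -2 that the slide does not mention; all three values, the function name, and the three candidate alignments are illustrative assumptions, not the ones shown on the original slide.

```python
def score_alignment(top: str, bottom: str, match=1, mismatch=-1, gap=-2) -> int:
    """Score two equal-length gapped strings column by column ('-' is a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


# Hypothetical candidate placements of TAGATA against TTACTATA.
candidates = ["-TAGATA-", "--TAGATA", "-TAG-ATA"]
for bottom in candidates:
    print(bottom, score_alignment("TTACTATA", bottom))
```

The candidate with the highest score is the one to prefer under the chosen scoring scheme; a different scheme can change the ranking.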

  • 8/8/2019 Into to Bioinfo

    22/53

    Dynamic Programming

    As the query sequences get longer and the difference in length between the two sequences grows, more gaps have to be inserted in various places

    We cannot perform an exhaustive search
        Combinatorial explosion occurs: there are too many combinations to search

    Dynamic programming avoids this by building the best alignment from the best alignments of shorter prefixes, so each sub-problem is solved only once and the best path is found without enumerating every combination
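As a concrete instance of dynamic programming for alignment, here is a minimal sketch of the global alignment score recurrence (commonly known as Needleman-Wunsch). The slides do not name a specific algorithm, so this choice is an assumption, as are the scoring values (+1 match, -1 mismatch, -2 gap) and the function name. It fills the score table row by row, so the work grows with the product of the two lengths rather than with the number of possible alignments.

```python
def global_alignment_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Best global alignment score of a and b under a linear gap penalty."""
    # prev[j] = best score aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with the empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("TTACTATA", "TAGATA"))
```

Only the score is computed here; recovering the alignment itself would require keeping the full table and tracing back through it.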

  • 8/8/2019 Into to Bioinfo

    23/53

    Databases

    Sequence information is stored in databases so that it can be manipulated easily
    The databases (next slide) are located at different places
    They exchange information on a daily basis so that they are up to date and in sync
    Primary db: sequence data

  • 8/8/2019 Into to Bioinfo

    24/53

    Nucleic acid (DNA/RNA) sequence databases

    One main database arising from a partnership between GenBank at the NCBI (National Center for Biotechnology Information, USA), the EMBL data library at the EBI (European Bioinformatics Institute, UK) and the DNA Data Bank of Japan (DDBJ) at the NIG (National Institute of Genetics, Japan).

    Daily exchanges between the 3 partners keep the databases synchronised.

    DNA and RNA sequences: curated, archived, distributed.

    Sequences come from genome projects, scientific articles and patent applications. Most scientific journals require the DNA and RNA sequences related to each publication to be publicly available.

    Sequences are deposited early and go through a review cycle: unannotated, preliminary, unreviewed, standard.

    Format: human and computer readable.

  • 8/8/2019 Into to Bioinfo

    25/53


  • 8/8/2019 Into to Bioinfo

    26/53

    Major Primary DB

    Nucleic acid      Protein
    EMBL (Europe)     PIR (Protein Information Resource)
    GenBank (USA)     MIPS, NCBI
    DDBJ (Japan)      SWISS-PROT (University of Geneva, now with EBI)
    NCBI              TrEMBL (a supplement to SWISS-PROT)
                      NRL-3D

  • 8/8/2019 Into to Bioinfo

    27/53

    Composite DB

    As there are many databases, which one should we search? Some are good in some aspects and weak in others.
    A composite db is the answer: it draws on several databases for its base data
    Searching a composite db is indexed and streamlined so that the same stored sequence is not searched twice in different databases

  • 8/8/2019 Into to Bioinfo

    28/53

    Composite DB

    OWL has these as its primary databases:
        SWISS-PROT (top priority)
        PIR
        GenBank
        NRL-3D

  • 8/8/2019 Into to Bioinfo

    29/53

    Secondary db

    Store secondary structure information or the results of searches of the primary db

    Secondary DB    Primary source
    PROSITE         SWISS-PROT
    PRINTS          OWL

  • 8/8/2019 Into to Bioinfo

    30/53

    Structural databases

    The main database of protein structures is the PDB (Protein Data Bank).

    The PDB started in 1971 at Brookhaven National Laboratory (NY, USA) and is now a distributed organisation (Research Collaboratory for Structural Bioinformatics, www.rcsb.org) of US partners (Rutgers, NJ; San Diego Supercomputer Center, CA; NIST, MD).

    The PDB includes protein structures (and a few DNA and other structures) determined by X-ray crystallography and Nuclear Magnetic Resonance.

  • 8/8/2019 Into to Bioinfo

    31/53

    Database Searches

    We have sequenced and identified many genes, so we know what they do
    The sequences are stored in databases
    So if we find a new gene in the human genome, we compare it with the already known genes stored in the databases
    Since the databases hold so many sequences, we cannot do a full sequence alignment against each and every one
    So heuristics must be used
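One common kind of search heuristic is to index short words (k-mers) of the database sequences once, and then fully align the query only against sequences that share words with it. The sketch below illustrates that idea under stated assumptions: the word length k=3, the function names, and the toy database are all made up, and this is not a description of any particular search tool.

```python
from collections import defaultdict


def build_index(database: dict[str, str], k: int = 3) -> dict[str, set[str]]:
    """Map every k-mer to the names of the database sequences containing it."""
    index = defaultdict(set)
    for name, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def candidates(query: str, index: dict[str, set[str]], k: int = 3) -> list[str]:
    """Names of sequences sharing at least one k-mer with the query, best first."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] += 1
    return sorted(hits, key=lambda name: (-hits[name], name))


# Toy database with hypothetical entries.
db = {"seq1": "TTACTATA", "seq2": "GGGCCC"}
print(candidates("TAGATA", build_index(db)))
```

Only the shortlisted sequences would then go on to the more expensive alignment step; sequences with no shared word are skipped, which is where the speed-up (and the risk of missing distant matches) comes from.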

  • 8/8/2019 Into to Bioinfo

    32/53

    Areas in Bioinformatics

  • 8/8/2019 Into to Bioinfo

    33/53

    Genomics

    In a multicellular organism, each cell type expresses its genes in a different way, although every cell has the same genetic content
    i.e. all the information for a liver cell to be a liver cell is also present in a nose cell, so gene expression is the only thing that differentiates them

  • 8/8/2019 Into to Bioinfo

    34/53

    Genomics - Finding Genes

    A gene in sequence data is a needle in a haystack
    However, whereas a needle looks different from the hay, genes do not look different from the rest of the sequence data
    Within a whole array of nt, we try to find a set of nt and mark its borders as a gene
    This is one of the challenges of bioinformatics
    Neural networks and dynamic programming are being employed
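To make the border-marking problem concrete, here is a much simpler technique than the neural networks or dynamic programming mentioned above: a scan for open reading frames (ORFs), i.e. stretches that start with ATG and run in the same reading frame to a stop codon. This is a swapped-in illustration, not the methods the slide refers to. It looks only at the forward strand, and the function name, the minimum-length parameter and the example sequence are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) slices of ATG...stop stretches on the forward strand.

    min_codons is the minimum number of codons before the stop codon
    (the ATG itself counts as one).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end index includes the stop codon
                start = None
    return sorted(orfs)


dna = "CCATGGCCTTTTAAGG"  # hypothetical sequence
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Real gene finders have to cope with both strands, introns and sequencing errors, which is why statistical and learned models are used in practice.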

  • 8/8/2019 Into to Bioinfo

    35/53

    Organism        Genome size (Mb)    Gene number    Web site
    Yeast           13.5                6,241          http://genome-www.stanford.edu/Saccharomyces
    Fruit fly       180                 13,601         http://flybase.bio.indiana.edu
    Homo sapiens    3,000               45,000         http://www.ncbi.nlm.nih.gov/genome/guide

    (1 Mb = 1,000,000 bp)

  • 8/8/2019 Into to Bioinfo

    36/53

    Proteomics

    The proteome is the sum total of an organism's proteins
    More difficult than genomics:
        4 bases vs. 20 amino acids
        Simple chemical makeup vs. complex
        DNA can be duplicated; proteins can't
    We are entering the post-genome era
        Meaning much has been done with the genes, not that it is all over

  • 8/8/2019 Into to Bioinfo

    37/53

    Proteomics...

    An RNA and the protein it codes for are usually very different
    After translation, proteins do change
        So the aa sequence does not tell us anything about the post-translational changes
    Proteins are not active until they are combined into a larger complex or moved to a relevant location inside or outside the cell
        So the aa sequence only hints at these things
    Also, proteins must be handled more carefully in the lab, as they tend to change when in contact with an inappropriate material

  • 8/8/2019 Into to Bioinfo

    38/53

    Protein Structure Prediction

    One of the biggest challenges of bioinformatics, and especially of biochemistry
    There is as yet no algorithm that consistently predicts the structure of proteins

  • 8/8/2019 Into to Bioinfo

    39/53

    Structure Prediction Methods

    Comparative modeling
        The target protein's structure is modeled by comparison with related proteins
        Proteins with similar sequences are searched for known structures

  • 8/8/2019 Into to Bioinfo

    40/53

    Phylogenetics

    The taxonomical system reflects evolutionary relationships
    Phylogenetic trees depict evolutionary relationships as a picture/graph
        Rooted trees: there is a single common ancestor
        Unrooted trees: show only the relationships
    Phylogenetic tree reconstruction algorithms are also an area of research
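Many tree reconstruction methods start from pairwise distances between aligned sequences. The sketch below computes the simplest such distance, the fraction of differing positions (often called the p-distance), and picks the closest pair, which is the first step a distance-based clustering method would take. It is only that first step, not a full tree builder; the species labels and sequences are invented for illustration and the function names are assumptions.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """Return the two names whose sequences have the smallest p-distance."""
    return min(
        combinations(sorted(seqs), 2),
        key=lambda pair: p_distance(seqs[pair[0]], seqs[pair[1]]),
    )


# Made-up aligned sequences with hypothetical labels.
seqs = {"human": "ACGTACGT", "chimp": "ACGTACGA", "mouse": "ACTTTCGA"}
print(closest_pair(seqs))
```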

  • 8/8/2019 Into to Bioinfo

    41/53

    Applications

  • 8/8/2019 Into to Bioinfo

    42/53

    Medical Implications

    Pharmacogenomics
        Not all drugs work on all patients; some good drugs cause death in some patients
        So by doing a gene analysis before treatment, the offending drugs can be avoided
        Also, drugs that cause death in most patients can be used on the minority whose genes suit them (volunteers wanted!)
        Customized treatment

    Gene Therapy
        Replace or supply the defective or missing gene
        E.g. insulin, and Factor VIII for haemophilia

    BioWeapons (??)

  • 8/8/2019 Into to Bioinfo

    43/53

    Diagnosis of Disease

    Identification of the genes which cause a disease will help detect the disease at an early stage, e.g. Huntington's disease
        Symptoms: uncontrollable dance-like movements, mental disturbance, personality changes and intellectual impairment
        Death in 10-15 years
        The gene responsible for the disease has been identified

  • 8/8/2019 Into to Bioinfo

    44/53

    Drug Design

    Can take up to 15 years and $700 million
    One of the goals of bioinformatics is to reduce the time and cost involved
    The process:
        Discovery (computational methods can improve this)
        Testing

  • 8/8/2019 Into to Bioinfo

    45/53

    Discovery

    Target identification
        Identifying the molecule on which the germ relies for its survival
        Then we develop another molecule, i.e. a drug, which will bind to the target
        So the germ will not be able to interact with the target
        Proteins are the most common targets

  • 8/8/2019 Into to Bioinfo

    46/53

    Discovery

    For example, HIV produces HIV protease, a protein which in turn cleaves other proteins
    HIV protease has an active site where it binds to other molecules
    So an HIV drug will go and bind to that active site

  • 8/8/2019 Into to Bioinfo

    47/53

    Discovery

    Lead compounds are the molecules that bind to the target protein's active site
    Traditionally, finding them has been a trial-and-error process
    Now this is being moved into the realm of computers

  • 8/8/2019 Into to Bioinfo

    48/53

    Related Computer Technology

  • 8/8/2019 Into to Bioinfo

    49/53

    PERL

    Perl is commonly used for bioinformatics calculations because of its ability to manipulate character strings
    The default CGI language
    It started out as a scripting language but has become a fully fledged language
    It has everything now, even web service support
    http://bio.perl.org

  • 8/8/2019 Into to Bioinfo

    50/53

    The place of XML & Web Services

    Various markup languages are being created (Gene Markup Language etc.) to represent sequence/gene data
    Web services: program-to-program interaction, making the web application-centric as opposed to human-centric
        So this has to be platform and language independent
        Protocols like SOAP help in this regard
    In bioinformatics, various databases, platforms, languages etc. are in use
        So web services help achieve platform independence and program interaction
    Since sequence databases come in various formats and on various platforms, SOAP also helps in this regard
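The appeal of a markup representation is that any program on any platform can write and read the same record. The sketch below round-trips a sequence record through XML using Python's standard library. The element and attribute names (`sequence`, `id`, `organism`, `residues`) are invented for illustration and do not follow any real gene markup language or SOAP schema.

```python
import xml.etree.ElementTree as ET


def sequence_to_xml(seq_id: str, organism: str, residues: str) -> str:
    """Serialise a sequence record to an XML string (hypothetical schema)."""
    root = ET.Element("sequence", attrib={"id": seq_id})
    ET.SubElement(root, "organism").text = organism
    ET.SubElement(root, "residues").text = residues
    return ET.tostring(root, encoding="unicode")


def xml_to_sequence(document: str) -> tuple[str, str, str]:
    """Parse the XML produced above back into (id, organism, residues)."""
    root = ET.fromstring(document)
    return root.get("id"), root.findtext("organism"), root.findtext("residues")


# Hypothetical record.
document = sequence_to_xml("demo1", "Homo sapiens", "TTACTATA")
print(document)
print(xml_to_sequence(document))
```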

  • 8/8/2019 Into to Bioinfo

    51/53

    Databases and Mining

    Many of the sequence databases are publicly available
    As databases are involved, various data mining techniques are used to pull the data out
    As there is a lot of literature (articles etc.) in this area, data mining is also applied to the literature

  • 8/8/2019 Into to Bioinfo

    52/53

    European Molecular Biology Network (EMBnet)

    A central system for sharing, training and centralizing up-to-date biological information
    Some of the EMBnet sites are:
        SEQNET  http://www.seqnet.dl.ac.uk
        UCL     http://www.biochem.ucl.ac.uk/bsm/dbbrowser/embnet/
        EBI (European Bioinformatics Institute)  www.ebi.ac.uk

  • 8/8/2019 Into to Bioinfo

    53/53

    References

    Dan E. Krane and Michael L. Raymer, Basic Concepts of Bioinformatics
    Arthur M. Lesk, Intro to Bioinformatics
    T.K. Attwood & D.J. Parry-Smith, Intro to Bioinformatics
    Dr Patrick Dixon, The Genetic Revolution
    Prof David Gilbert's site, http://www.brc.dcs.gla.ac.uk/~drg/