Challenges for computer science as a part of Systems Biology
Benno Schwikowski, Institute for Systems Biology
Seattle, WA
Benno Schwikowski, Math and Computer Science
Challenges
Towards integrative models
(Figure axes: species, conditions/time, genes)
• DNA: sequence, genomic locus, domain content, intron/exon structure, regulatory motifs, chemical modifications, SNPs, splice variants, accessibility, variation
• mRNA: abundance, regulatory information, initiation/termination signals
• Protein: abundance, state, localization, 3D structure, functional characterization, half-life, active sites, biochemical function, cellular role
• Protein interaction: interaction partner, direct/indirect, affinity, effect
Challenge: Integrative models
• …Across genes and proteins: many genes involved (e.g., multifactorial diseases)
• …Across model systems: lack of experimental platforms in the target system
• …Across levels of biological organization (e.g., gene regulatory processes involving phosphorylation)
• …Across experiments: robustness against errors in mass spectrometry and mRNA measurements
• …Across timescales
Levels of biological organization: DNA, RNA, proteins, modules, organelles, cells, organs, individuals, populations, ecologies
Challenge: Capturing evolutionary constraints
“Nothing in biology makes sense except in the light of evolution.” (Theodosius Dobzhansky)
Challenge: Which tools and experiments to use
Challenge: Choosing experiments
• Machine learning: determine the most likely classification/parameterization on the basis of a randomly sampled dataset
• Active learning: allow an algorithm to query selected data points, using the results of previous queries
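A toy illustration of the difference between the two settings, assuming the simplest possible learning problem: recovering an unknown threshold on the interval [0, 1]. The function `oracle` and its threshold 0.3721 are hypothetical stand-ins for an experiment, as are `passive_estimate` and `active_estimate`. The passive learner labels randomly sampled points; the active learner chooses each query from the previous answers, here by bisection.

```python
import random

def oracle(x, threshold=0.3721):
    """Hypothetical experiment: label 1 if x lies at or above a hidden threshold."""
    return 1 if x >= threshold else 0

def passive_estimate(n_samples, seed=0):
    """Machine learning setting: label randomly sampled points, then fit."""
    rng = random.Random(seed)
    lo, hi = 0.0, 1.0  # invariant: lo < threshold <= hi
    for _ in range(n_samples):
        x = rng.random()
        if oracle(x):
            hi = min(hi, x)
        else:
            lo = max(lo, x)
    return (lo + hi) / 2

def active_estimate(n_queries):
    """Active learning setting: each query depends on earlier answers (bisection)."""
    lo, hi = 0.0, 1.0  # same invariant: lo < threshold <= hi
    for _ in range(n_queries):
        mid = (lo + hi) / 2
        if oracle(mid):
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

print(passive_estimate(20), active_estimate(20))
```

After n queries the active learner's uncertainty interval has width 2^-n, while with n random samples the interval around the threshold shrinks only on the order of 1/n. Real experiment selection is far less clean than this one-dimensional case; the sketch only shows why choosing queries adaptively can save experiments.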
Challenge: Relations between system variables can be quite complex
Yuh, Bolouri, Davidson, Science, 1998
Challenge: Developing models that allow extremely efficient algorithms
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
CLUSTALW (1.74) multiple sequence alignment

Cotton    ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
Pea       GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
Tobacco   TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
Ice-plant TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip    ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
Wheat     TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed  TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch     TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton    CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea       C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
Tobacco   AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip    CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
Wheat     GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
Duckweed  ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch     TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton    ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea       GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco   GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip    CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat     CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed  TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch     CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton    T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea       TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco   CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch     TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip    TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat     GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed  CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG
Challenge: Developing models that allow extremely efficient algorithms
Parsimony score: 1
AGTCGTACGTGAC...
AGTAGACGTGCCG...
ACGTGAGATACGT...
GAACGGAGTACGT...
TCGTGACGGTGAT...
(Figure: phylogenetic tree over these sequences; its nodes are labeled with the motifs ACGG and ACGT.)
J. Comp Biol. 2002
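To make the score concrete, here is a minimal sketch that scores one fixed labeling of a tree, assuming (as an illustration) that the parsimony score is the total number of substitutions, i.e. the sum of Hamming distances over all edges. The tree topology, the node names (`root`, `x`, `y`, `z`, `s1` to `s5`) and the helper functions are hypothetical; the slide's actual tree is not recoverable. Each leaf is labeled with a motif that occurs in its sequence, and a single ACGG/ACGT edge yields a score of 1.

```python
def hamming(s, t):
    """Number of positions at which two equal-length motifs differ."""
    assert len(s) == len(t)
    return sum(a != b for a, b in zip(s, t))

def parsimony_score(edges, labels):
    """Total substitutions: sum of Hamming distances over (parent, child) edges."""
    return sum(hamming(labels[p], labels[c]) for p, c in edges)

# Hypothetical topology, not the tree from the slide.
edges = [("root", "y"), ("root", "z"), ("y", "x"), ("y", "s3"),
         ("x", "s1"), ("x", "s2"), ("z", "s4"), ("z", "s5")]
labels = {"root": "ACGT", "y": "ACGT", "x": "ACGT", "z": "ACGG",
          "s1": "ACGT", "s2": "ACGT", "s3": "ACGT", "s4": "ACGG", "s5": "ACGG"}

# The leaf sequences shown on the slide (prefixes only).
seqs = {"s1": "AGTCGTACGTGAC", "s2": "AGTAGACGTGCCG", "s3": "ACGTGAGATACGT",
        "s4": "GAACGGAGTACGT", "s5": "TCGTGACGGTGAT"}
assert all(labels[leaf] in seq for leaf, seq in seqs.items())

print(parsimony_score(edges, labels))  # 1
```

This only evaluates a given labeling; finding the best labeling (and the best motif choice in every leaf) is the optimization problem the exact algorithm addresses.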
An Exact Algorithm (generalizing Sankoff and Rousseau 1975)
$W_u[s]$ = best parsimony score for the subtree rooted at node u, if u is labeled with string s.
(Figure: a tree whose leaves are the sequences AGTCGTACGTG, ACGGGACGTGC, ACGTGAGATAC, GAACGGAGTAC and TCGTGACGGTG. Each node u stores a table $W_u$ with $4^k$ entries, one per string of length k; the figure shows the entries for ACGG and ACGT. At a leaf, $W_u[s] = 0$ if s occurs in the leaf's sequence and $+\infty$ otherwise.)
$W_u[s] = \sum_{v \text{ child of } u} \min_{t} \left( W_v[t] + d(s, t) \right)$
J. Comp Biol. 2002
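A minimal sketch of the bottom-up table computation, assuming d(s, t) is the Hamming distance, that leaves hold W = 0 for the k-mers occurring in their sequence and +∞ otherwise, and that internal nodes use the sum-over-children recurrence W_u[s] = Σ_v min_t (W_v[t] + d(s, t)). The tree over the five slide sequences is hypothetical, and `substring_parsimony` is an illustrative name, not the published implementation. The naive double loop over (s, t) is exactly the 4^(2k) work per child.

```python
from itertools import product

INF = float("inf")

def hamming(s, t):
    """d(s, t): number of mismatching positions between two k-mers."""
    return sum(a != b for a, b in zip(s, t))

def substring_parsimony(children, seqs, root, k):
    """Return (best score, a best root k-mer) by filling W_u[s] bottom-up.

    children: dict mapping each internal node to its list of children
    seqs:     dict mapping each leaf to its DNA sequence
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]  # 4^k table keys

    def table(u):
        if u in seqs:
            # Leaf: 0 if s occurs in the leaf's sequence, +infinity otherwise.
            seq = seqs[u]
            present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
            return {s: 0 if s in present else INF for s in kmers}
        # Internal node: sum over children v of min_t (W_v[t] + d(s, t)).
        child_tables = [table(v) for v in children[u]]
        return {s: sum(min(W_v[t] + hamming(s, t) for t in kmers)
                       for W_v in child_tables)
                for s in kmers}

    W_root = table(root)
    best = min(kmers, key=W_root.__getitem__)  # first k-mer with minimal score
    return W_root[best], best

# Hypothetical tree over the five sequences from the slide.
children = {"root": ["a", "b"], "a": ["s1", "c"], "c": ["s2", "s3"], "b": ["s4", "s5"]}
seqs = {"s1": "AGTCGTACGTG", "s2": "ACGGGACGTGC", "s3": "ACGTGAGATAC",
        "s4": "GAACGGAGTAC", "s5": "TCGTGACGGTG"}
print(substring_parsimony(children, seqs, "root", 4))  # (1, 'ACGG')
```

Here no 4-mer occurs in all five sequences, so the optimum is 1; ACGG and ACGT tie at the root, and `min` returns the lexicographically first. Recovering the leaf motifs would need a standard traceback, omitted for brevity.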
What are good challenges to tackle?
• Biological/medical questions asked
• Experimental technologies to acquire a lot of relevant data
• Available datasets with a formalized notion of “data quality”
Memory complexity: $O(k \cdot 4^{2k})$ per node
Time complexity: total time $O(n k (4^{2k} + l))$, where n is the number of species, l the average sequence length, and k the motif length
J. Comp Biol. 2002
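A quick way to see how the time bound scales is to evaluate n·k·(4^(2k) + l) directly, ignoring the constant hidden in the O(·). The inputs below (8 species, 1,000-bp sequences) and the function name `time_bound` are hypothetical; the point is only how fast the 4^(2k) term, which counts all pairs (s, t) of k-mers, overtakes the sequence-length term.

```python
def time_bound(n, l, k):
    """Evaluate n * k * (4**(2k) + l), the time bound without its constant factor."""
    return n * k * (4 ** (2 * k) + l)

# Hypothetical inputs: 8 species, average sequence length 1000.
for k in (4, 6, 8):
    print(k, time_bound(8, 1000, k))
```

Going from k = 4 to k = 8 multiplies the bound by roughly 10^5, which is why the choice of model has to leave room for very efficient algorithms.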
Technology-based challenges: Universal DNA Tag Systems
Existing applications in high-throughput technologies:
• Universal DNA arrays
• Padlock probes
• LYNX mRNA technology
Formalization
Define: weight(A/T) = 1, weight(C/G) = 2
weight(AACTTG) = 1+1+2+1+1+2 = 8
melting temperature(AACTTG) = 2·weight(AACTTG)
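The weight and melting-temperature definitions translate directly into code. Doubling the weight amounts to counting 2 per A/T and 4 per C/G, which matches the familiar Wallace rule of thumb for short oligonucleotides. The names `WEIGHT`, `weight` and `melting_temperature` are illustrative.

```python
# Weights from the slide: A/T weigh 1, C/G weigh 2.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}

def weight(tag):
    """Sum of per-base weights of a DNA string."""
    return sum(WEIGHT[base] for base in tag)

def melting_temperature(tag):
    """Melting temperature approximated as 2 * weight, per the slide's definition."""
    return 2 * weight(tag)

print(weight("AACTTG"))               # 8
print(melting_temperature("AACTTG"))  # 16
```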
l-u code problem: Given two integers l < u, find the largest set of tags such that
• each tag has weight u, and
• each string of weight l occurs at most once across all tags.
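A small checker for the two constraints, as a sketch: it verifies a candidate tag set rather than constructing a largest one. It assumes that a string of weight l "occurs" in a tag when it is a contiguous substring of that tag, and it counts every occurrence, including repeats within a single tag. The names `weight_l_substrings` and `is_lu_code` are hypothetical.

```python
from collections import Counter

WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}

def weight(s):
    return sum(WEIGHT[b] for b in s)

def weight_l_substrings(tag, l):
    """Every occurrence of a substring of `tag` whose weight is exactly l."""
    out = []
    for i in range(len(tag)):
        w = 0
        for j in range(i, len(tag)):
            w += WEIGHT[tag[j]]
            if w == l:
                out.append(tag[i:j + 1])
            if w >= l:  # extending further can only increase the weight
                break
    return out

def is_lu_code(tags, l, u):
    """True if all tags weigh u and no weight-l string occurs twice overall."""
    if any(weight(t) != u for t in tags):
        return False
    counts = Counter(s for t in tags for s in weight_l_substrings(t, l))
    return all(c <= 1 for c in counts.values())

print(is_lu_code(["ACG", "TGC"], 3, 5))          # True
print(is_lu_code(["ACG", "TGC", "ACAT"], 3, 5))  # False: "AC" occurs twice
```

Because bases weigh 1 or 2, a tag may skip over weight l at some start positions; such positions simply contribute no weight-l substring.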
J. Comp Biol. 2000 & 2003
Challenge: Visualization
Andrea Weston et al. @ ISB & Cytoscape
Challenge: Visualization
Cytoscape, pre-release 2.0
A computer scientist’s perspective
“Biology is so digital, and incredibly complicated […] I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level.”
Donald Knuth, 7 Dec 1993