[Diagram labels: Mating; Mutation; Migration; DNA; A Cell/Organism; Other Cells/Organisms; Other Environmental Factors; Cell-cell signals; Change genome; Change environment; Produce; Remediate; Sequester; Support; Cure; Ecosystem/Tissue; Measure/manipulate here]
Goals and Outcomes
Scientific Goals
• Rapid Determination of Genomic Potential
• Environmental change, growth, and evolution
• Remediate, sequester, produce, cure
Technologies
• Improved Annotation
• High-throughput Physiology/Genetics
• From macromolecular structure
• Nonlinear Optimization
• Multivariate Sensitivity Analysis
• Statistical Graph Modeling Algorithms
• Semi-automated model generation
• Scalable methods for hybrid/multiscale sim.
• Bifurcation analysis for large/multiphysics sys.
• Integrated software systems
• Experimental Design Support
• Automated model reduction (abstraction)
• Exascale Resources
• Cloud Computing
• Transparent resource use
Grand Challenges
• Modeling Phenotype from Genotype Across the Tree of Life.
  – Rapid reconstruction of cellular networks across multiple time and space scales.
    • Predictive models of 10,000 organisms
    • Mechanistic understanding of single cell dynamics
  – Predict environmental “inputs” and outputs: “crack the signaling code”*
  – Predict optimal growth conditions from genotype
  – Optimize natural system activity of individuals and consortia
  – Individualized genomics
• Discover and Engineer Units of Function at all Network Scales
  – Discover the evolutionary principles of function generation and reuse
  – Design new function from composition of such modules
*Thanks, Mark
Example Stories: Chris Henry (and Costas Maranas)
• Assemble metabolic/regulatory network for B. subtilis based on current annotations/data
• Perform “[illegible]” style experiments to further elucidate regulatory network
• Construct a FBA model of B. subtilis that integrates boolean regulation
• Add thermodynamic and kinetic constraints to the FBA model with parameter uncertainty
• Use this model to identify the minimal deletions required to turn off every nonessential pathway
• Enumerate the linearly independent solutions to this problem to produce a list of functionally equivalent pathway modules
• Exploit B. subtilis natural competence to experimentally implement one solution as much as possible w/o viability loss (~[illegible] deletions)
• Use model to design a set of phenotypic tests for our “minimal” strain that will maximize the opportunity to invalidate our model
• Run phenotypic tests on “minimal” strain and compare results with model predictions
• Modify model to maximize agreement between predictions and experimental results
• Repeat this cycle, implementing a new minimal strain each time and assembling a set of pathway modules that may be reactivated at will later
Note: we are NOT minimizing genome but knocking out “branch points” in the pathways
Legend: desktop-scale computational steps in red; massive-scale computations in green; experimental steps in blue; steps requiring new algorithms in black
Scientific objective: Develop a “complete” understanding of the metabolism of B. subtilis, including identification of all transporters, the regulation of all metabolic pathways, and the pathway response to the environment.
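The "minimal deletions" and "enumerate equivalent solutions" steps above can be illustrated with a small sketch. This is not the FBA-based method from the slide: it swaps in a brute-force search over a hypothetical boolean toy network, in which a pathway runs only if all of its genes are present. The pathway names, gene names, and the `PATHWAYS` table are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy network (not real B. subtilis data):
# a pathway is active only if every one of its genes is present.
PATHWAYS = {
    "glycolysis":  {"essential": True,  "genes": {"g1", "g2"}},
    "sporulation": {"essential": False, "genes": {"g3", "g4"}},
    "motility":    {"essential": False, "genes": {"g4", "g5"}},
    "antibiotic":  {"essential": False, "genes": {"g6", "g7"}},
}


def is_valid(deleted):
    """True if the deletions switch off every nonessential pathway
    while leaving every essential pathway intact."""
    for pathway in PATHWAYS.values():
        hit = bool(pathway["genes"] & deleted)
        if pathway["essential"] and hit:
            return False  # viability would be lost
        if not pathway["essential"] and not hit:
            return False  # a nonessential pathway is still on
    return True


def minimal_deletion_sets():
    """Enumerate all smallest deletion sets by increasing set size."""
    genes = sorted(set().union(*(p["genes"] for p in PATHWAYS.values())))
    for size in range(len(genes) + 1):
        solutions = [set(c) for c in combinations(genes, size) if is_valid(set(c))]
        if solutions:
            return solutions
    return []


for solution in minimal_deletion_sets():
    print(sorted(solution))
```

In this toy network there are two equally small solutions, which mirrors the idea of functionally equivalent alternatives. Brute force grows exponentially with gene count, so a genome-scale version would need the optimization-based formulation the slide refers to.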
What switches on spores?
[Figure: axes AR (proteins/mRNA-s) and k3 (mRNA/s); labels: pulse, oscillation, graded, bistable]
Induction of Spo0B
[Figure panels: WT levels of Spo0B; Slight induction; Slightly larger induction]
Impact
• We have an opportunity to truly understand a whole life form and its interaction with its environment.
• We can place it in evolutionary context and begin to understand how it and its parts arose from originally inorganic components.
• We can discover the motifs of function that we can exploit for human purposes.
Overall Conclusions
• Data is still the main bottleneck, but there are badly scaling computational problems in its analysis.
• Behavior is responsive to complex combinations of inputs and combinatorially affected by large numbers of genes; thus exploration of models or testing against data remains difficult.
• For this effort to have a profound impact on biology we need high throughput, easy use, and fast concise presentation of data, model, and comparison.
• While there are fundamental theory and algorithms that need to be developed (and this is critical), transparent access to large computation resources for a large number of users is the key. Most tools will be embarrassingly parallel or close to it, but will still require outrageous numbers of processors for each project. And there will be many users.
• Sadly, experiments seem to scale alongside computation: the larger and more complex the model, the more experiments are needed to test it.
  – The scaling may not have near the same exponents, but it is far from trivial in time and cost
  – The data is “fragile” and needs to be tightly quality controlled, instantly accessible, and well annotated
  – Computation should be an everyday adjutant to experiment, but the computational throughput is currently too slow and complicated to be as generally useful as a knockout or microarray.
Annotation
• Sequencing is scaling such that 10k’s of genomes and even more metagenomic sequence will be sequenced/unit time
• New families are plateauing, but still 30-50% unannotated function. Need structure prediction help. Experiments are key.
  – Investments
    • New algorithms for phylogenetic annotation: N^2 Log(N)
      – Deep theory of evolution for trees.
    • New algorithms for structural annotation (See macromolecular group)
    • New algorithms for “guilt-by-association” (M*N^2)
    • Experiments to broaden functional assignments (NOrganism*nCond*Nreplic)
    • Manual annotation/curation
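To make the scaling expressions on this slide concrete, here is a minimal arithmetic sketch. It only evaluates the stated forms N^2 Log(N), M*N^2, and NOrganism*nCond*Nreplic. The function names and the example values (1,000 versus 10,000 genomes, 10 conditions, 3 replicates) are assumptions chosen for illustration, and constant factors are ignored.

```python
import math


def phylo_cost(n):
    """Relative cost of phylogenetic annotation, taken as N^2 * log(N)."""
    return n * n * math.log(n)


def guilt_by_association_cost(m, n):
    """Relative cost of guilt-by-association, taken as M * N^2."""
    return m * n * n


def experiment_count(n_organisms, n_conditions, n_replicates):
    """Number of experiments, taken as NOrganism * nCond * Nreplic."""
    return n_organisms * n_conditions * n_replicates


# Going from 1,000 to 10,000 genomes (hypothetical numbers):
ratio = phylo_cost(10_000) / phylo_cost(1_000)
print(f"phylogenetic annotation cost grows about {ratio:.0f}-fold")
print("experiments:", experiment_count(10_000, 10, 3))
```

A tenfold increase in genomes costs roughly 133 times more under the N^2 Log(N) form, while the experiment count grows only linearly in each factor but starts from a large base.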
High-throughput Physiology/Genetics
• Problems of scale
  – From single molecule to populations of cells
  – Multiplexed machines allow some whole genome assays, but biochemistry is still a problem, imaging is still a problem (~N^3), and tech like mass spec itself has computational problems in spectra matching (~N^2)
  – Genetics is still case-by-case
  – Quality and reproducibility
  – Centralization/dissemination of data and standards
• Investments
  – Determining methods for quality control with data collection, with technologies highly distributed in individual laboratories (rather than a central resource).
  – Investment in creating higher throughput/high content resources for biochemistry, single cell and single molecule (and in situ) imaging/measurement
  – Investments in new culturing technologies, from microfluidics to bioreactors
  – Massively parallel computation MUST be matched by massively parallel experiment.
  – Increasing computation-in-the-loop design of experiments
    • Mathematics and CS for design of experiments
  – Transparent integration of supercomputing and the HT experiment labs.
    • Embedded systems engineering for placing controls on the instruments
  – Visualization of complex data sets derived from these.
  – Mathematics for reduction of nonlinear multivariate data sets.
  – Image processing/segmentation
  – Capture reagents (to target molecules for measurement)
  – Experiments that seek to discover the communication mechanisms and molecules between cells.
Statistical Graph Modeling Algorithms
• Theory for determining best array of experiments to input into the model
  – Which modalities
  – Which conditions and their combinations
  – Which time scales
  – How many replicates
• Currently, bicluster goes as N^2/N^3, and propagation and uncertainty and branch/bound model reduction go as Exp(c Nedges), where c is a small constant.
• Investments
  – New theory for data integration and statistical modeling
  – Validated test sets for algorithm comparison
  – Mechanisms for data availability
  – Detection of weak signals in high background noise (not subgraph isomorphism): sequences, correlations, quantifications
  – Network motif detection
  – Automated text mining/natural language processing
  – Improve the feedback loop between statistical associations and annotation
    • Better use of molecular interaction data and sequence/TF information
Semi-automated model generation
• Data Driven to Dynamic Models
  – How do we use bioinformatic analysis of new data sets to build a model?
  – How do we force “annotation” of more biochemical features that would aid modelers?
  – Metabolic reconstruction through annotation
    • Validation
    • Feedback through inconsistency and holes
  – Rapid update of models when data updates
    • Link to papers
• A taxonomy of model types that would allow you to traverse what kind of model to generate from data.
• Establishment of standards for different types of model
  – What is a “complete” model?
• How do you integrate models of different parts of a larger system?
  – Tools for making this easier.
  – Controlled vocab, semantics, and agreement with the bioinformatics folks on naming
    • We’d like to, but we have no clue?
    • Maybe accessibility to the statistical models
Scalable methods for hybrid/multiscale sim.
• To understand how the 1D genome is transformed into 3D space.
• Current PDE simulations scale as O(N) but…
• Need for usable library and modeling frameworks that biologists can use easily.
• Do this with high heterogeneity in the physics (stochastic/deterministic) and space.
• Do this under uncertainty in data, parameters and (gasp) mechanism
• Languages for model representations: Formal languages? (Languages for simulation control).
• Investments
  – How do we smoothly transition among levels of physical abstraction, from fully rendered spatial/molecular level simulation, through mesoscale stochastics, all the way to smoothed ODE?
  – How do you compare spatial/temporal data and models in phylogenetic context?
  – How do you put together genetic/genomic data and spatial modeling?
  – How do you build complex heterogeneous models?
  – How do you build hybrid stochastic/deterministic models/statistical
  – New integrators (like Magnus Expansions, spectral) to support accurate
  – Measurement technology of biochemical activity (and forces, etc.) in live cells.
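As a small illustration of two of the abstraction levels named above (mesoscale stochastics versus smoothed ODE), the sketch below simulates the same birth-death process both ways. The model, the rate constants, and the function names are assumptions for illustration only. It uses a plain Gillespie-style stochastic simulation and forward Euler integration, and is not a hybrid method.

```python
import random


def ode_steady_state(k_birth, k_death, x0=0.0, t_end=20.0, dt=0.01):
    """Forward-Euler integration of dx/dt = k_birth - k_death * x."""
    x = x0
    for _ in range(int(round(t_end / dt))):
        x += dt * (k_birth - k_death * x)
    return x


def gillespie_time_average(k_birth, k_death, x0, t_end, seed):
    """Stochastic simulation of the same process; returns the
    time-averaged copy number over [0, t_end]."""
    rng = random.Random(seed)
    t, x, area = 0.0, x0, 0.0
    while True:
        birth = k_birth
        death = k_death * x
        total = birth + death
        dt = rng.expovariate(total)      # waiting time to the next event
        if t + dt >= t_end:
            area += x * (t_end - t)
            break
        area += x * dt
        t += dt
        if rng.random() * total < birth:
            x += 1                       # birth event
        else:
            x -= 1                       # death event (impossible at x == 0)
    return area / t_end


print("ODE steady state:", ode_steady_state(50.0, 1.0))
print("Stochastic time average:", gillespie_time_average(50.0, 1.0, 50, 200.0, seed=1))
```

For this linear system the two descriptions agree on the mean (k_birth / k_death). The open questions on the slide concern systems where they do not agree, and where different parts of one model need different levels of description.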
Automated model reduction
• Formal coarse graining, scale separation, and “balanced truncation”
  – Ronald Coifman?
• Automated nondimensionalization?
• Linear algebraic solutions for the above.
  – Pseudospectral methods?
  – Regularization
• Response surface methods and functional approximation.
Nonlinear Optimization
• Parameter estimation and Model Testing
  – Determination of the error bounds on the model
  – Parameter feasibility regions
    • Changes in regions upon compositing of submodels
      – Theory of module impedance and retroactivity
  – Model Invalidation
    • Inconsistency
  – Data invalidation
  – Algorithms for global optimization by clever parameter motion
    • Quasirandom search
  – How do we get to the tails of the distributions of our “stochastic” models?
• Measures of parameter conservation between organisms as a result and proxy for evolutionary pressure.
• Model Selection/Model Modification: minimal number of moves to explain the maximum amount of data. (Exp(N) but practically N^2 to N^3).
• Multiobjective optimization
• Integer Optimization
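The "quasirandom search" item can be sketched as follows. This is one possible reading, assuming a Halton low-discrepancy sequence is used to spread candidate parameter sets evenly over a bounded region. The function names, the toy objective, and the bounds are assumptions for illustration.

```python
def van_der_corput(index, base):
    """Radical-inverse of a positive integer index in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_point(index, bases):
    """One point of the Halton sequence in the unit hypercube."""
    return tuple(van_der_corput(index, b) for b in bases)


def quasirandom_search(objective, bounds, n_points=2000):
    """Evaluate the objective on Halton points scaled to the bounds
    and return the best parameters found (supports up to 4 parameters)."""
    bases = (2, 3, 5, 7)[:len(bounds)]
    best_params, best_value = None, float("inf")
    for i in range(1, n_points + 1):
        unit = halton_point(i, bases)
        params = tuple(lo + u * (hi - lo) for u, (lo, hi) in zip(unit, bounds))
        value = objective(params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params):
    """Hypothetical misfit surface with its minimum at (0.3, 0.7)."""
    a, b = params
    return (a - 0.3) ** 2 + (b - 0.7) ** 2


best_params, best_value = quasirandom_search(toy_objective, [(0.0, 1.0), (0.0, 1.0)])
print(best_params, best_value)
```

Every evaluation is independent of the others, so a search of this kind is embarrassingly parallel. In practice the best points would typically seed a local optimizer.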
Multivariate Sensitivity Analysis
• Difference methods fail for stochastic systems
  – New algorithm
  – Petzold & Doyle?
• High dimensional sensitivity
  – Sethna and sloppiness.
• Eigendirections.
  – David Rand (Warwick)
  – FAST (Fourier Amplitude Sensitivity Test), Saltelli
  – Denise Kirchner
• Understanding where to do your experiments
• How do we link sensitivity to evolvability?
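A minimal sketch of why naive difference methods struggle on stochastic models: when the two runs in a finite difference use independent random numbers, sampling noise is divided by the small step and swamps the signal. The sketch contrasts this with common random numbers (reusing the seed), a standard variance-reduction technique that is swapped in here for illustration and is not the "new algorithm" the slide refers to. The toy model (mean of exponential waiting times) and all names are assumptions.

```python
import random
import statistics


def mean_waiting_time(rate, seed, n_samples=100):
    """Toy stochastic model: average of exponential waiting times.
    The exact expectation is 1 / rate, so the true sensitivity is -1 / rate**2."""
    rng = random.Random(seed)
    return sum(rng.expovariate(rate) for _ in range(n_samples)) / n_samples


def fd_independent(rate, h, seed):
    """Finite difference with independent noise in the two runs."""
    perturbed = mean_waiting_time(rate + h, seed + 1_000_000)
    return (perturbed - mean_waiting_time(rate, seed)) / h


def fd_common(rate, h, seed):
    """Finite difference with common random numbers (same seed)."""
    return (mean_waiting_time(rate + h, seed) - mean_waiting_time(rate, seed)) / h


independent = [fd_independent(1.0, 0.01, s) for s in range(20)]
common = [fd_common(1.0, 0.01, s) for s in range(20)]

print("independent noise: spread =", statistics.stdev(independent))
print("common random numbers: spread =", statistics.stdev(common),
      "mean =", statistics.mean(common))
```

Reusing the seed works here because the model output varies smoothly with the parameter for fixed random numbers. Models with discrete events whose order can change under perturbation do not generally have that property.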
Bifurcation analysis for large/multiphysics sys.
• Auto, Oscill8, MatLab (CONTENT)
  – Crashes around 25 differential equations
• How does bifurcation change with composed models?
  – High dimensional (co-dim 3 and higher)
• Classification of dynamics of a model
  – Rapid inference of possibility for bifurcation from models.
  – Chemical Reaction Network Theory (Feinberg)
  – Different physical model classes.
• Fully discrete and stochastic methods
  – Do deterministic bifurcations survive realistic noise?
  – Do new bifurcations arise due to noise?
• Map parameters to genotype to determine which genotypes lead to bifurcation.
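For readers unfamiliar with what a bifurcation scan does, here is a minimal one-variable sketch. It assumes a hypothetical positive-feedback model, dx/dt = basal + x^2/(1 + x^2) - degradation * x, which is not taken from the slides, and counts steady states by looking for sign changes of the rate on a grid. Three steady states indicate bistability.

```python
def rate(x, degradation, basal=0.02):
    """Hypothetical positive-feedback model: production minus degradation."""
    return basal + x * x / (1.0 + x * x) - degradation * x


def count_steady_states(degradation, x_max=10.0, steps=10_000):
    """Count sign changes of the rate on a uniform grid over [0, x_max]."""
    count = 0
    previous = rate(0.0, degradation)
    for i in range(1, steps + 1):
        current = rate(x_max * i / steps, degradation)
        if previous * current < 0:
            count += 1
        previous = current
    return count


for degradation in (0.2, 0.4, 0.6):
    n = count_steady_states(degradation)
    label = "bistable" if n == 3 else "monostable"
    print(f"degradation={degradation}: {n} steady state(s), {label}")
```

Grid scans like this are feasible in one dimension only. The slide's point is that continuation tools for systems with tens of equations and several parameters are where current software breaks down.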
Experimental Design Support
• Optimize choice of data to answer question based on model
  – E.g. best data to estimate parameters
  – E.g. best data to discriminate between models
  – E.g. best data to discover missing pieces of model
  – E.g. best data to reduce the uncertainty in model prediction.
  – Tidor
  – Arkin/Flaherty
• The automated “Adam”, the geneticist.
• Issues of resolution
  – Time resolution
  – Relative vs. absolute measurement
  – Measures of constraint?
  – What data do you need to deal with multiscale models?
• Closed loop control of Biology
  – Think Herschel Rabitz
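The "best data to discriminate between models" item can be sketched in a few lines. This is a deliberately simple criterion, assuming two hypothetical rate laws with fixed parameters and a known measurement noise: pick the candidate condition where the models' predictions differ most relative to the noise. The models, parameter values, and function names are assumptions for illustration, and the published methods by the groups named on the slide are more elaborate.

```python
def michaelis_menten(s, vmax=1.0, km=1.0):
    """Candidate model A: saturating rate law."""
    return vmax * s / (km + s)


def linear(s, slope=0.5):
    """Candidate model B: non-saturating rate law."""
    return slope * s


def most_discriminating(conditions, model_a, model_b, noise_sd):
    """Return the condition with the largest noise-scaled prediction gap."""
    return max(conditions, key=lambda s: abs(model_a(s) - model_b(s)) / noise_sd)


candidates = [0.1, 0.5, 1.0, 2.0, 5.0]  # hypothetical substrate levels
best = most_discriminating(candidates, michaelis_menten, linear, noise_sd=0.05)
print("measure at substrate level:", best)
```

With these numbers the two models agree exactly at a substrate level of 1.0, so measuring there would be uninformative, while the highest level separates them most.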
It all depends on what the meaning of “Mission” is.
Yes, we can!