Cracking the (bio)code -- Professional Development Session at SACNAS 2014
TRANSCRIPT
![Page 1: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/1.jpg)
Cracking the (bio)code
Resources for research careers in computational biology & bioinformatics
Felipe Zapata, PhD, Brown University, @zapata_f
Conner Sandefur, PhD, Univ. North Carolina, @oshehoma
Emilia Huerta-Sanchez, PhD, Univ. California, Merced, @emiliahsc
Tracy Heath, PhD, Iowa State Univ., @trayc7
Visit our website: crackingthebiocode.github.io
● Information about the session
● Resources for learning to program: workshops, online courses, tutorials, etc.
● Links to many degree programs in the U.S. for studying computational biology/bioinformatics
● Profiles of computational biologists and bioinformaticians
![Page 2: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/2.jpg)
How small changes can make a big difference
Bioinformatics @UNC-Pembroke: Investigating how changes in gene expression drive system-wide behavior
Computational Biology @UNC-Chapel Hill: Predicting therapies to improve mucus clearance in cystic fibrosis (CF) and chronic obstructive pulmonary disease (COPD)
[Figure: panels at 1 hr and 24 hrs; scale from -4 to 4]
Tools I use:
![Page 3: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/3.jpg)
Dr. Conner I. Sandefur
SPIRE Postdoctoral Scholar at UNC-CH
Visiting Assistant Professor at UNCP
PhD Bioinformatics, University of Michigan, Ann Arbor, Michigan
BA Computer Science, George Washington University, Washington, DC
email: [email protected]
web: http://www.unc.edu/~sandefur
twitter: @oshehoma
![Page 4: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/4.jpg)
What is the evolutionary history of species?
Using transcriptomes and genomes to resolve ancient animal radiations
Phylogeny of snails, slugs, and relatives
What genes are homologous?
Using graph-based approaches to infer homology
Gene clusters inferred to be the “same” gene family across multiple species
AGALMA: https://bitbucket.org/caseywdunn/agalma
BitBucket (Git)
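To make the idea of graph-based homology inference concrete, here is a minimal sketch: sequences are nodes, pairwise similarity hits above a threshold are edges, and each connected component is treated as one candidate gene family. Connected components are a simplification chosen for illustration and are not claimed to be the clustering method Agalma uses. The function name `cluster_by_similarity`, the sequence IDs, and the scores are all hypothetical.

```python
from collections import defaultdict


def cluster_by_similarity(hits, min_score):
    """Group sequence IDs into connected components of a similarity graph.

    hits      : iterable of (seq_a, seq_b, score) tuples (hypothetical format)
    min_score : hits scoring below this value do not create an edge
    Returns a list of clusters, each a sorted list of sequence IDs.
    """
    graph = defaultdict(set)
    for seq_a, seq_b, score in hits:
        # Touch both keys so that sequences without strong hits
        # still appear as single-member clusters.
        graph[seq_a]
        graph[seq_b]
        if score >= min_score and seq_a != seq_b:
            graph[seq_a].add(seq_b)
            graph[seq_b].add(seq_a)

    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first search collects everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    # Toy, made-up similarity hits between genes from three species.
    toy_hits = [
        ("snail_g1", "slug_g1", 250.0),
        ("slug_g1", "squid_g1", 180.0),
        ("snail_g2", "slug_g2", 90.0),
        ("snail_g1", "snail_g2", 12.0),  # weak hit, below the threshold
    ]
    for family in cluster_by_similarity(toy_hits, min_score=50.0):
        print(family)
```

Raising or lowering `min_score` changes how aggressively sequences are merged, which is one reason real pipelines tend to use more refined clustering than plain connected components.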
![Page 5: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/5.jpg)
Dr. Felipe Zapata
Postdoctoral Research Associate
Brown University
COLOMBIA
email: [email protected]
web: http://felipezapata.me
twitter: @zapata_f
PhD Ecology, Evolution & Systematics, University of Missouri-St. Louis, St. Louis, Missouri
BSc Biology, Universidad de Los Andes, Bogotá, Colombia
![Page 6: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/6.jpg)
What does genetics tell us about human history?
![Page 7: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/7.jpg)
Dr. Emilia Huerta-Sanchez
Assistant Professor
UC Merced
email: [email protected]
web: http://www.stat.berkeley.edu/~emiliahs
twitter: @emiliahsc
Postdoc in Integrative Biology and Statistics, UC Berkeley, Berkeley, CA
PhD Applied Mathematics, Cornell University, Ithaca, NY
BA Mathematics & French, Mills College, Oakland, CA
![Page 8: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/8.jpg)
Modeling macro- & molecular evolutionary processes to infer phylogenetic relationships
● How have rates of molecular and morphological evolution changed across the tree of life?
● How do patterns of fossilization, preservation, and recovery change across different taxa?
● Can we detect relationships between geological events and species diversification?
● What are the evolutionary processes acting on different regions of the genome and how have those factors shaped the evolution of different genes?
C++
RevBayes
Probabilistic graphical models
![Page 9: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/9.jpg)
Dr. Tracy A. Heath
Assistant Professor (Jan. 2015)
Iowa State University
email: [email protected]
web: phyloworks.org
twitter: @trayc7
Postdoctoral Fellow, U. Kansas & U.C. Berkeley
PhD Ecology, Evolution & Behavior, University of Texas at Austin
BA Biology, Boston University
![Page 10: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/10.jpg)
What is Computational Biology?
What is Bioinformatics?
http://crackingthebiocode.github.io/
![Page 11: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/11.jpg)
Modeling infectious disease transmission
![Page 12: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/12.jpg)
Compartmental models are one type of mathematical model used to investigate the spread of infectious disease
[Diagram: Susceptible → Infected → Recovered, with arrows labeled "Rate of infection" and "Rate of recovery"]
Change in proportion of Susceptible (S) people over time = − Susceptible (S) × Infected (I) × β

$$\frac{dS}{dt} = -\beta S I$$
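The compartmental model on this slide can be sketched in a few lines of Python. The slide gives only the equation for the Susceptible compartment (change in S = −S × I × β); the equations for Infected and Recovered below follow the standard SIR form and are an assumption, as are the recovery rate name `gamma`, the forward-Euler time stepping, and every numeric value. This is an illustration, not the code behind the slides.

```python
def simulate_sir(beta, gamma, s0=0.99, i0=0.01, r0=0.0, days=100, dt=0.1):
    """Integrate the SIR equations with forward Euler steps.

    beta  : rate of infection
    gamma : rate of recovery (assumed name; not given on the slide)
    Returns a list of (time, S, I, R) tuples, with S, I, R as proportions.
    """
    s, i, r = s0, i0, r0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i * dt  # leaves S, enters I
        new_recoveries = gamma * i * dt     # leaves I, enters R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameter values only.
    trajectory = simulate_sir(beta=0.5, gamma=0.25)
    peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
    print(f"Infected proportion peaks at {peak_infected:.3f} "
          f"around day {peak_time:.1f}")
```

Because every person who leaves one compartment enters the next, S + I + R stays equal to 1 throughout the run, which is a quick sanity check on any implementation.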
![Page 13: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/13.jpg)
Infection dynamics for different diseases can be simulated by selecting appropriate parameters
![Page 14: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/14.jpg)
We can use models to predict how interventions change disease transmission dynamics
Infection dynamics with R0 = 2
Infection dynamics after intervention at day 10, which reduced R0 to 0.8
R0 > 1, infection peaks then disappears
R0 < 1, infection dies out
Simulations run in Python 3.4 (downloaded as part of Anaconda package: http://continuum.io/downloads)
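A rough sketch of how such an intervention scenario might be simulated is shown below. It uses the slide's values (R0 = 2, reduced to 0.8 at day 10) together with the standard SIR relation R0 = β / γ. The recovery rate `gamma`, the initial infected proportion, the step size, and the function name `simulate_with_intervention` are hypothetical choices, so the output will not reproduce the slide's plots exactly.

```python
def simulate_with_intervention(r0_before, r0_after, intervention_day,
                               gamma=0.25, i0=0.001, days=120, dt=0.05):
    """Forward-Euler SIR run in which R0 switches at `intervention_day`.

    Uses beta = R0 * gamma. Pass intervention_day=None for no intervention.
    Returns a list of (time, infected_proportion) pairs.
    """
    s, i = 1.0 - i0, i0
    infected = [(0.0, i)]
    for step in range(1, int(round(days / dt)) + 1):
        t_start = (step - 1) * dt  # time at the beginning of this step
        switched = intervention_day is not None and t_start >= intervention_day
        beta = (r0_after if switched else r0_before) * gamma
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        infected.append((step * dt, i))
    return infected


if __name__ == "__main__":
    baseline = simulate_with_intervention(2.0, 2.0, None)
    controlled = simulate_with_intervention(2.0, 0.8, 10)
    print(f"Peak infected, R0 = 2 throughout: "
          f"{max(i for _, i in baseline):.3f}")
    print(f"Peak infected, R0 reduced to 0.8 at day 10: "
          f"{max(i for _, i in controlled):.3f}")
```

Once R0 drops below 1, each step removes more infected people than it adds, so the infected proportion declines from the intervention day onward, matching the "infection dies out" case above.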
![Page 15: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/15.jpg)
Agalma: automated and reproducible phylogenetic analyses
![Page 16: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/16.jpg)
Phylogenetics
From… a few key genes (e.g., 16S rRNA, mitochondria, chloroplasts) across many species
To… high-throughput sequencing of 1000s of genes across many species
[Figure: two data matrices, each with axes labeled "genes" and "species"]
![Page 17: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/17.jpg)
Challenges to phylogenetics
• Many steps
• Many programs must be used together
• Computationally intensive
• Difficult to reproduce
![Page 18: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/18.jpg)
Automate!
![Page 19: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/19.jpg)
Why automate?
• Results are reproducible
• Results can be easily explored and extended
• Methods can be compared in a controlled setting
• Facilitate method development without reinventing everything
![Page 20: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/20.jpg)
The tool
https://bitbucket.org/caseywdunn/agalma
![Page 21: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/21.jpg)
The paper
![Page 22: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/22.jpg)
The example analysis
https://bitbucket.org/caseywdunn/dunnhowisonzapata2013/
![Page 23: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/23.jpg)
For each transcriptome:
• Quality control
• Assemble transcriptome
• Translate and annotate genes
• Quantify gene expression
• Put sequences in database
Can also:
• Import DNA sequences from national databases (e.g., NCBI)
• Process externally produced assemblies
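As a small illustration of one step in the list above, translating a nucleotide sequence into protein can be sketched with the standard genetic code. This is not how Agalma implements the step; the function name `translate`, the choice of `X` for unrecognized codons, and the example sequence are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA (or RNA) string into one-letter amino acid codes.

    frame : 0, 1, or 2, the offset of the first codon
    Stop codons become '*'; codons with ambiguous bases become 'X'.
    A trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # Made-up example sequence: start codon, two codons, stop codon.
    print(translate("ATGGCCAAGTAA"))
```

In practice the reading frame is usually unknown for assembled transcripts, so a pipeline would typically try several frames (and the reverse complement) before annotating.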
![Page 24: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/24.jpg)
Across transcriptomes (many species):
• Identify homologous genes
• Build phylogenies using all genes!
Silhouette images from http://phylopic.org/
![Page 25: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/25.jpg)
What tools do you need?
A biological question
programming skills
statistical modeling
C++
a mathematical model
![Page 26: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/26.jpg)
Questions?
• What programming language should I learn?
• How do I get started learning a programming language?
• What is the best way to become proficient in a programming language?
• What is the difference between C++ and Python and Java and R and MATLAB and Ruby and ...?
• What is version control? Do I need to know it?
• Do I need a GitHub account?
• Where are jobs or degree programs in computational biology/bioinformatics listed?
• What does it mean to be open source? Why is it important?
• and ...?
![Page 27: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/27.jpg)
Take-Home Messages
• You don’t have to be an expert programmer to do computational biology.
• Anyone can learn to program; it’s just a matter of getting started.
• Computational skills are extremely helpful for streamlining biology research.
• The skills you need to learn depend heavily on your background and your research interests.
• Quantitative skills – a firm understanding of math and statistics – are important for any research field.
• Don’t be overwhelmed by all there is to know; these skills grow over time. If you consistently seek to improve them and use them for your work, you will be amazed at how your expertise develops.
![Page 28: Cracking the (bio)code -- Professional Development Session at SACNAS 2014](https://reader033.vdocuments.us/reader033/viewer/2022042716/55a980431a28ab2a288b4609/html5/thumbnails/28.jpg)
Find out more!
http://crackingthebiocode.github.io/profiles.html