uncovering the dynamics of genome function at the ... · ffv1 codec root tip identification...

24
Uncovering the dynamics of genome function at the organismal scale using machine vision and computation Tessa Durham Brooks, Ph.D. Doane College Department of Biology

Upload: others

Post on 16-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Uncovering the dynamics of genome function at the organismal scale using machine vision and computation

Tessa Durham Brooks, Ph.D. Doane College

Department of Biology

Page 2: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Anticipation at the dawn of the Genomic Era

“Within the next few years, technologies developed for the Human Genome Project and similar sequencing efforts will revolutionize medicine, agriculture, crimefighting, and other fields.” – Gwynne and Page, Science, 2000

Genetronic Replicator – Star Trek: TNG, 1992

Page 3: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

The genomic powerhouses

~20,000 genes 97 mil base pairs Sequence finished 1998

~25,000 genes 100 mil base pairs Sequenced finished 2000

~22,000 genes 137 mil base pairs Sequence finished 2000

For reference: Humans have about 25,000 genes, 3.2 bil base pairs.

Page 4: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

The problem: at best functional roles have been assigned for 15% of predicted genes of a genome

For reference: Humans have about 25,000 genes, 3.2 bil base pairs.

~20,000 genes 97 mil base pairs Sequence finished 1998

~25,000 genes 100 mil base pairs Sequenced finished 2000

~22,000 genes 137 mil base pairs Sequence finished 2000

Page 5: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Why has determining gene function in multicellular organisms been difficult?

Page 6: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Why has determining gene function in multicellular organisms been difficult?

Page 7: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Our goal: Describe genome function at the organismal scale.

Requirements:

• Observations should be made at sufficiently high spatial and temporal resolution

• Methods should be relatively high-throughput to allow genomic survey

• Observations should be able to be made over time and in many environmental contexts

~25,000 genes

Page 8: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Root gravitropism: a model for image analysis approaches in functional genomics

Page 9: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Root gravitropism: a model for image analysis approaches in functional genomics

0 2 4 6 8 10

0

20

40

60

80

100

glr3.3-1

glr3.3-2

SalkColT

ip A

ngle

(deg

)

Time (h)

First Order (swing rate)

Scal

e Sc

ale

Second Order (acceleration)

Time (h)

glr3.3-1 vs wt

glr3.3-2 vs wt

Miller, Durham Brooks, and Spalding 2010, Genetics

Page 10: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Developing tools to detect genome function at the organismal scale.

Requirements:

Observations should be made at sufficiently high spatial and temporal resolution

• Methods should be relatively high-throughput to allow genomic survey

• Observations should be able to be made over time and in many environmental contexts

~25,000 genes

Page 11: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Automation and High Throughput ver. 1

Page 12: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Automation and High Throughput ver. 2 Doane Phytomorph

Page 13: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

• Recombinant inbred lines (RILs)

• Ecotypes (e.g. 1001 genomes project)

High throughput genetic stocks

Matthieu Reymond Max-Planck Institute

S T S T

QTL Analysis

Page 14: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Developing tools to detect genome function at the organismal scale.

Requirements:

Observations should be made at sufficiently high spatial and temporal resolution

Method should be relatively high-throughput to allow genomic survey

• Observations should be able to be made over time and in many environmental contexts

~25,000 genes

Page 15: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Phenotypes are plastic

Durham Brooks, Miller, and Spalding 2010, Plant Physiology

One ecotype

Page 16: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Po

siti

on

(cM

)

Time (minutes)

LOD

Genomics analysis in a multi-dimensional condition space

Seed Size

Small Large

See

dlin

g A

ge

2d 164 lines X

15 indiv.

3d

4d

Moore, et al., unpublished result

Page 17: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Developing tools to detect genome function at the organismal scale.

Requirements:

Observations should be made at sufficiently high resolution

Method should be relatively high-throughput to allow genomic survey

Observations should be able to be made over time and in many environmental contexts

• Cyberinfrastructure must facilitate the above

~25,000 genes

Page 18: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Workflow

Data Capture

Data Capture

Data Storage (30 TB)

Data Compression and Analysis

Feature Extraction

Data Storage (X TB)

Schorr Center and OSG

0.5 TB/day

Data Compression

QTL analysis

Page 19: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Data Compression

Scan Root Response

•Time Series of 220 uncompressed TIF’s •~225 MB each

Image Compression

(Condor on OSG)

•Time series of lossless -compressed PNG’s •~195 MB each

Auto Crop & Time Stamp

(Condor on OSG)

•Equalize dimensions •Insert the time stamp into the least significant bits of the first 14 pixels of each image

Video Compression (Local Grid)

• Compress to video using FFV1 codec •Uses a lossless intraframe codec •~160 MB/frame

Ship out to UW

Page 20: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Feature Extraction

Compressed Data

•Decompress Data •Currently using FFV1 codec

Root tip Identification

•Isolate and track root tip •Currently image curvature features are used

Machine Learning

•Isolate plant’s Arial and ground tissue from image background •Currently Bayesian and SVM methods work well

Tip angle Measurement

• Linear regression on the root’s meristematic tissue Ship out to Doane

Page 21: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

QTL Analysis - Association of a phenotypic

value (e.g. root tip angle) with a genetic element

Compile Tip Angle Data

•Find mean tip angle at each time point for each genetic line

QTL Detection (Local Grid)

•Choose detection method •Optimize maximum likelihoods of trait data •Determine significance (LOD)

Determine Significance Threshold

(Condor on OSG)

•Permutation testing •Haley Knott or Multiple imputation (0.24 vs 63 CPU years per time point) •Determine max likelihood and LOD score of randomized data

Additional analysis

•Interactions, additive effects •Plasticity QTL •Conditional QTL (GxE effects)

Ship out to Doane

Ship out to Doane •Launch analysis locally •Bleed out to OSG

Page 22: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Progress and Future Directions

• One small college has collected over 14,500 individual root gravitropic responses in six conditions (32 TB) in RIL population in 6 mo.

• We will finish collection from NILs (near-isogenic lines) - an additional 8,700 individuals, 19 TB in 1-2 mo.

• Begin image analysis and QTL analysis – dataset opens new doors in visualizing genomes

• Interdisciplinary undergraduate training opportunities

Page 23: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Acknowledgments Doane College

Mike Carpenter (CIO), David Andersen, Dan Schneider Chris Wentworth (Physics) and Alec Engebretson (IST) Students: Amy Craig and Brad Higgins (Physics), Autumn Longo and Grant Dewey (Biochemistry), Tracy Guy, Miles Mayer, Halie Smith, Anthony Bieck, Sarah Merithew, Devon Niewohner, Muijj Ghani, Julie Wurdeman (Biology)

University of Wisconsin Edgar Spalding’s Laboratory: Nathan Miller, Candace Moore, Logan Johnson Miron Livny (CHTC)

UNL – Schorr Center and HCC David Swanson, Brian Bockelman, Carl Lundstedt

University of Florida Mark Settles

Computing Resources

Funding

PGRP - 1031416 EPSCoR - URE NE-INBRE

Page 24: Uncovering the dynamics of genome function at the ... · FFV1 codec Root tip Identification •Isolate and track root tip •Currently image curvature features are used Machine Learning

Questions?