Machine Learning Methods: Applications in Biology and Computer Vision

Mario Banuelos, California State University, Fresno

mbanuelos22 · www.mbgmath.com · mbanuelos22@csufresno.edu

Outline

1 Introduction & Background

2 Neural Networks and Computer Vision

3 Detecting Genomic Variation

4 Conclusions

M.Banuelos (Fresno State) mbanuelos22@csufresno.edu February 18, 2019

Introduction & Background

Applications of Machine Learning

Self-driving cars

Voice Assistants

Object Classification/Detection

Artificial Intelligence vs. Machine Learning

Machine Learning: the scientific study of algorithms and statistical models that computers use to understand data.

Simple Linear Regression

- Models the relationship between two variables, a predictor X and a quantitative response Y, by assuming a linear relationship:

$$E(Y \mid X = x) = \beta_0 + \beta_1 x$$

- The goal is to find parameters $(\beta_0, \beta_1)$ for our equation that minimize the error between the data and the line.

- Coefficients $\beta_0$ and $\beta_1$ are computed using least squares, resulting in the ordinary least squares (OLS) line.

Linear Regression (cont.)

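- We begin by choosing a metric to compare our predictions $\hat{y}_i$ to data $y_i$. This loss, error, or cost function may be defined as

$$L(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$

- Then we find the βs by solving

$$\frac{\partial L(\beta_0, \beta_1)}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0$$

$$\frac{\partial L(\beta_0, \beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0.$$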
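Solving the two equations above by hand gives closed-form estimates for the slope and intercept. The following is a minimal sketch in plain Python; the helper name `ols_fit` and the small data set are illustrative assumptions, not part of the talk.

```python
def ols_fit(x, y):
    """Return (beta0, beta1) minimizing sum (y_i - beta0 - beta1 * x_i)^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Solving the two normal equations gives the slope as a ratio of sums
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    # The first normal equation forces the line through (x_bar, y_bar)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi for xi in x]
beta0, beta1 = ols_fit(x, y)
print(beta0, beta1)
```

With noise-free data like this, the fit should recover the intercept and slope used to generate it.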

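OR Gradient Descent

We can also update our guess for $\beta_0^{\text{old}}$, with learning step $\eta$, as

$$\beta_0^{\text{new}} = \beta_0^{\text{old}} - \eta \, \frac{\partial L(\beta_0, \beta_1)}{\partial \beta_0^{\text{old}}}$$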
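The update rule can be applied to both coefficients at once and repeated until the estimates settle. This is a rough sketch rather than a tuned implementation: the function name `gradient_descent`, the learning step `eta=0.01`, the step count, and the data are all assumptions chosen so the iteration converges on this toy example.

```python
def gradient_descent(x, y, eta=0.01, steps=5000):
    """Minimize L(beta0, beta1) = sum (y_i - beta0 - beta1 * x_i)^2."""
    beta0, beta1 = 0.0, 0.0  # initial guesses
    for _ in range(steps):
        residuals = [yi - beta0 - beta1 * xi for xi, yi in zip(x, y)]
        # Partial derivatives of the squared-error loss
        grad0 = -2.0 * sum(residuals)
        grad1 = -2.0 * sum(xi * r for xi, r in zip(x, residuals))
        # new = old - eta * gradient, applied to both coefficients together
        beta0, beta1 = beta0 - eta * grad0, beta1 - eta * grad1
    return beta0, beta1

# Hypothetical data lying exactly on the line y = 1 + 2x
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi for xi in x]
gd_beta0, gd_beta1 = gradient_descent(x, y)
print(gd_beta0, gd_beta1)
```

If the learning step is too large the iterates diverge, so in practice η usually needs some experimentation.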

Moving to Classification

Assume we want to classify the following

How would you separate the headphones from the glasses?

Logistic Regression

A few key differences . . .

- Loss function becomes $L = -\left(y \log(p) + (1 - y) \log(1 - p)\right)$, where $p$ is the predicted probability of being in a class.

- We use $\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X$, where $p$ is referred to as the probability that $y = 1$.
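To see how the log-odds equation and the loss fit together, here is a small sketch in plain Python. The function names and the coefficient values are hypothetical, picked only to make the arithmetic easy to follow.

```python
import math

def predict_proba(x, beta0, beta1):
    """Invert log(p / (1 - p)) = beta0 + beta1 * x to recover p."""
    z = beta0 + beta1 * x
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y, p):
    """L = -(y log(p) + (1 - y) log(1 - p)) for a single observation."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Hypothetical coefficients and input, giving log-odds of -1 + 1 * 2 = 1
p = predict_proba(x=2.0, beta0=-1.0, beta1=1.0)
log_odds = math.log(p / (1 - p))
print(p, log_odds)
print(log_loss(1, p), log_loss(0, p))
```

Because p is above 0.5 here, the loss is smaller when the true label is 1 than when it is 0.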

Neural Networks and Computer Vision

Logistic Regression as Neural Network

Neural Network: A class of machine learning algorithms that take input and pass it through interconnected nodes to predict a response.

Figure: An example of a perceptron.

The Sigmoid Function

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
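A single node with a sigmoid activation performs the same computation as logistic regression: a weighted sum of the inputs followed by σ. The sketch below assumes hypothetical weights and inputs; the branch on the sign of x is a common way to avoid overflow in the exponential and is not something the slides specify.

```python
import math

def sigmoid(x):
    """Numerically stable sigma(x) = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite as e^x / (1 + e^x) so exp never overflows
    z = math.exp(x)
    return z / (1.0 + z)

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    z = bias + sum(w * xi for w, xi in zip(weights, inputs))
    return sigmoid(z)

# Hypothetical weights chosen so the weighted sum is exactly zero
out = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(out)
```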

Deep Learning - Neural Networks

Definition: A class of machine learning algorithms that use a cascade of multiple layers of nonlinear processing units (neurons), i.e., affine transformations followed by nonlinear transformations.

Note: σ(·) will represent nonlinear transformations.

Figure: A network diagram with inputs x1 and x2, Hidden Layer 1 (nodes h11, h12, ...), Hidden Layer 2, and a sigmoid output y.
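The definition above (an affine transformation followed by a nonlinearity, repeated layer after layer) can be sketched as a forward pass. The layer sizes and weight values below are made up for illustration, and the helper names `dense` and `forward` are assumptions rather than anything named in the talk.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """One layer: affine transformation followed by the sigmoid nonlinearity."""
    return [sigmoid(b + sum(w * xi for w, xi in zip(row, inputs)))
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass x through each (weights, biases) pair in order."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

# Hypothetical weights: 2 inputs -> 2 hidden units -> 2 hidden units -> 1 output
layers = [
    ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]),   # Hidden Layer 1
    ([[0.5, -0.5], [-0.3, 0.8]], [0.0, 0.0]),  # Hidden Layer 2
    ([[1.0, -1.0]], [0.0]),                    # sigmoid output
]
y_hat = forward([1.0, 2.0], layers)
print(y_hat)
```

Training such a network means adjusting the weights and biases to reduce a loss, which this sketch does not attempt.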

Applications in the Medical Field

Computer Vision

Figure: MNIST Images

Figure: Corrupted MNIST Images

Computer Vision – but harder

Seeing Numbers as a Computer

Figure: Two images displaying that a black and white image is just a matrix of numbers between 0 and 255, 0 representing white and 255 representing black.

Question: If we were interested in classifying numbers as either an 8 or a 0, how could we incorporate this into a logistic regression model?
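One possible answer, sketched here under stated assumptions, is to flatten the pixel matrix into a single feature vector and give each pixel its own coefficient. The tiny 3x3 "image", the all-zero weights, and the function names are hypothetical; real MNIST images are 28x28 and the weights would be learned from labeled data.

```python
import math

def flatten(image):
    """Turn a 2-D grid of pixel intensities (0-255) into one vector scaled to [0, 1]."""
    return [pixel / 255.0 for row in image for pixel in row]

def predict_is_eight(image, weights, bias):
    """Logistic regression with one weight per pixel: estimated P(digit is 8)."""
    z = bias + sum(w * x for w, x in zip(weights, flatten(image)))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 3x3 image and untrained (all-zero) weights
image = [[0, 255, 0],
         [255, 0, 255],
         [0, 255, 0]]
features = flatten(image)
weights = [0.0] * len(features)
p_eight = predict_is_eight(image, weights, bias=0.0)
print(len(features), p_eight)
```

With untrained weights the model is indifferent between the two classes, so the predicted probability is one half.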

Decision Boundaries are often Nonlinear

- When the decision boundary is nonlinear, logistic regression will often perform worse than a neural network with only 1 hidden layer.

- These tools are generalizable to natural language processing, biology, societal applications, and many more fields.

- But . . . more mathematicians are needed to design studies and interpret the results.

Detecting Genomic Variation

Genomic Variation within Species

DNA Sequencing

1. Unknown genome is chosen.

2. Fragment unknown genome.

3. Ends of fragments are sequenced.

4. Ends aligned to reference genome.

DNA Sequencing

Probabilistic Generative Model of Sequencing

Lander-Waterman Statistics - Genome sequencing can be modeled as a Poisson process.

- Assumes independence.

- Assumes the number of fragments in a region follows a Poisson distribution.

Assume G is the genome length and L the fragment length.

Then, for an interval [x, x + L],

$$P(\text{observing one fragment in an interval}) = \frac{L}{G}$$

Binomial Distribution

What is the probability of drawing exactly one red marble in 2 trials (with replacement)?

$$2 \times P(\text{red}) \, P(\text{not red}) = 2 \times (0.4)(0.6) = 0.48$$

What is the probability of drawing exactly one red marble in n trials?

$$\binom{n}{1} \times P(\text{red}) \, P(\text{not red})^{n-1} = n \times (0.4)(0.6)^{n-1}$$

For observing k fragments in a fixed interval, with n total fragments, we have

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k},$$

or B(n, p), where p is the probability of success. This distribution has mean np. Let $\lambda = np \Rightarrow p = \frac{\lambda}{n}$.
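The binomial formula is easy to check against the marble example. This is a short sketch using only the Python standard library; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Exactly one red marble in 2 draws with P(red) = 0.4
one_red_in_two = binomial_pmf(1, 2, 0.4)
# Exactly one red marble in 5 draws: 5 * 0.4 * 0.6^4
one_red_in_five = binomial_pmf(1, 5, 0.4)
print(one_red_in_two, one_red_in_five)
```

The first value matches the hand computation 2 × (0.4)(0.6) up to floating-point rounding.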

From Binomial to Poisson

Sequencing involves DNA duplication and therefore requires large n, so

$$
\begin{aligned}
\lim_{n \to \infty} B(n, p)
&= \lim_{n \to \infty} \binom{n}{k} p^k (1 - p)^{n-k} \\
&= \lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \lambda^k \lim_{n \to \infty} \frac{n!}{k! \, (n-k)!} \left(\frac{1}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \\
&= \frac{\lambda^k}{k!} \lim_{n \to \infty} \underbrace{\frac{n \cdot (n-1) \cdots (n-k+1)}{n^k}}_{\to 1} \underbrace{\left(1 - \frac{\lambda}{n}\right)^{n}}_{\to e^{-\lambda}} \underbrace{\left(1 - \frac{\lambda}{n}\right)^{-k}}_{\to 1} \\
&= \frac{\lambda^k}{k!} e^{-\lambda} = \text{Poisson}(\lambda)
\end{aligned}
$$
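The limit can also be checked numerically: holding λ = np fixed while n grows, the binomial probability should approach the Poisson probability. The sketch below uses hypothetical values λ = 4 and k = 3; the function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

lam, k = 4.0, 3  # hypothetical rate and fragment count
gaps = []
for n in (10, 100, 1000, 10000):
    # Keep the mean n * p = lam fixed while n grows
    gap = abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
    gaps.append(gap)
    print(n, gap)
```

The printed gap shrinks as n increases, in line with the derivation.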

Generalized Variant Detection Framework

To detect genomic variants, we seek to

Maximize P(data | DNA sequencing assumptions)
subject to Relatedness Constraints,

where

- Sequencing assumptions result in different probabilistic models (which can be nonconvex).

- Constraints result in different feasible regions (depending on the family structure).

Maximum Likelihood Approach

$$P(\vec{y} \mid \text{Coverage } \lambda) = \prod_{i=1}^{N} P(y_i \mid \lambda)$$

Each term $P(y_i \mid \lambda)$ accounts for two cases:

- Real SV: $y_i \sim \text{Poisson}(\lambda)$, with probability $\frac{e^{-\lambda} \lambda^{y_i}}{y_i!}$

- Error: $P(\text{error})$

Using convex optimization, we can maximize $P(\vec{y} \mid \lambda)$ for each location i!
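As a simplified illustration of the maximum likelihood idea, the sketch below keeps only the Poisson term and ignores the error case and any constraints, which is an assumption made for brevity and not the method from the talk. In that reduced setting the log-likelihood is concave in λ and the maximizer is the sample mean; the counts are hypothetical.

```python
from math import lgamma, log

def poisson_log_likelihood(counts, lam):
    """log of prod_i e^(-lam) * lam^(y_i) / y_i!"""
    return sum(-lam + y * log(lam) - lgamma(y + 1) for y in counts)

counts = [3, 5, 4, 6, 2]             # hypothetical fragment counts
lam_hat = sum(counts) / len(counts)  # closed-form maximizer: the sample mean

# Cross-check with a coarse grid search over candidate lambda values
grid = [0.1 * j for j in range(1, 101)]
lam_grid = max(grid, key=lambda lam: poisson_log_likelihood(counts, lam))
print(lam_hat, lam_grid)
```

Working with the log of the product turns it into a sum, which is easier to optimize and avoids numerical underflow when N is large.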

Sparsity Penalty

Inheritance of Structural Variants (SVs)

Method Testing and Validation

- We consider both simulated data and real data from the 1000 Genomes Project.

- Results are compared against enforcing sparsity without relatedness.

- ROC curves are used to compare both methods.

Method Testing and Validation - ROC Curves

Image Source: Wikimedia Commons
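An ROC curve is traced by sweeping a threshold over a method's scores and recording the false positive rate and true positive rate at each setting. The following is a minimal sketch with made-up scores and labels; the function names `roc_points` and `auc` are illustrative and no particular library is assumed.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, starting at (0, 0),
    with one point per distinct score used as a threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is called a variant
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical detector scores and ground truth (1 = true variant)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
area = auc(points)
print(points, area)
```

A curve closer to the top-left corner (larger area) indicates a method that finds more true variants at the same false positive rate.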

1000 Genomes Data

- Real data is from the 1000 Genomes Project: a CEU father-mother-daughter trio sequenced at low (∼4X) coverage.

- Observations for possible SVs are obtained from a variant detection method.

- Experimentally validated deletions > 250 bp are considered the true signal.

Source: D. M. Altshuler, E. S. Lander, L. Ambrogio, T. Bloom, K. Cibulskis, T. J. Fennell, S. B. Gabriel, D. B. Jaffe, E. Shefler, C. L. Sougnez, et al., A map of human genome variation from population-scale sequencing. Nature, vol. 467, no. 7319, pp. 1061–1073, 2010.

1000 Genomes Data Results

Applications to Real Data - 1 Parent, 1 Child Model

Figure: ROC Curves for 1KG Parent Signals (panel: Novel Deletions; legend: Neg. Binomial, Thresholding).

Conclusions

- There are plenty of areas where you can apply your mathematics and statistics backgrounds (this is just a sample).

- I like to work with data. I like to work with low-quality data.

- If you are around next semester and this is interesting, shoot me an email.
