introduction to bayesian analysis in python

14/10/2017 Bayesian analysis in Python

Upload: peadar-coyle

Post on 22-Jan-2018


TRANSCRIPT

Page 1: Introduction to Bayesian Analysis in Python

14/10/2017

Bayesian analysis in Python

Page 2: Introduction to Bayesian Analysis in Python

Peadar Coyle – Data Scientist

Page 3: Introduction to Bayesian Analysis in Python

We will be the best place for money

Page 4: Introduction to Bayesian Analysis in Python
Page 5: Introduction to Bayesian Analysis in Python

[Diagram: Investors invest through Zopa and receive interest + capital; Borrowers receive loans and make repayments.]

Page 6: Introduction to Bayesian Analysis in Python

World’s 1st peer-to-peer lending platform in 2004

£2.5 billion lent to date, and our growth is accelerating

246,000 people have taken a Zopa loan

59,000 actively invest through Zopa

Page 7: Introduction to Bayesian Analysis in Python
Page 8: Introduction to Bayesian Analysis in Python
Page 9: Introduction to Bayesian Analysis in Python

What is PyMC3?

• Probabilistic Programming in Python
• At release stage, so ready for production
• Theano based
• Powerful sampling algorithms
• Powerful model syntax
• Recent improvements include Gaussian Processes and enhanced Variational Inference

Page 10: Introduction to Bayesian Analysis in Python

Some stats

• Over 6000 commits

• Over 100 contributors

Page 11: Introduction to Bayesian Analysis in Python

Who uses it?

• Used widely in academia and industry

• https://github.com/pymc-devs/pymc3/wiki/PyMC3-Testimonials

• https://scholar.google.de/scholar?hl=en&as_sdt=0,5&sciodt=0,5&cites=6936955228135731011&scipsc=&authuser=1&q=&scisbd=1

Page 12: Introduction to Bayesian Analysis in Python

What is a Bayesian approach?

• The Bayesian world-view interprets probability as a measure of believability in an event, that is, how confident we are that the event will occur.

• The Frequentist view, by contrast, considers probability to be the long-run frequency of events.

• This doesn’t make much sense for one-off events such as Presidential elections!

• Bayesians interpret a probability as a measure of beliefs. This allows us all to have different priors.

Page 13: Introduction to Bayesian Analysis in Python

Are Frequentist methods wrong?

• NO

• Least squares regression, LASSO regression and expectation-maximization are all powerful and fast in many areas.

• Bayesian methods complement these techniques by solving problems that those approaches can’t,

• or by illuminating the underlying system with more flexible modelling.

Page 14: Introduction to Bayesian Analysis in Python

Example – Let’s look at text message data

This data comes from Cameron Davidson-Pilon from his own text message history. He wrote the examples and the book this talk is based on. It’s cited at the end.

Page 15: Introduction to Bayesian Analysis in Python

Example – Inferring text message data

- A Poisson random variable is a very appropriate model for this type of count data.

- The math will be something like C_{i} ∼ Poisson(λ)

- We don’t know the value of λ. How do we infer it?
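As a quick sanity check on the Poisson assumption (not from the slides; the rate used here is made up), recall that a Poisson variable’s mean and variance both equal λ:

```python
import numpy as np

# For C_i ~ Poisson(lam), E[C_i] = Var[C_i] = lam.
# lam = 20 is a hypothetical daily text-message rate.
rng = np.random.default_rng(42)
lam = 20.0
counts = rng.poisson(lam, size=200_000)

print(round(counts.mean(), 1), round(counts.var(), 1))  # both close to 20
```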

Page 16: Introduction to Bayesian Analysis in Python

Example – Inferring text message data (continued)

- It looks like the rate is higher later in the observation period.

- We’ll represent this with a ‘switchpoint’: a day, which we call τ, on which the rate changes (a bit like a step function)

- λ = λ1 if t < τ, and λ = λ2 if t ≥ τ
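The piecewise rate above can be sketched directly in NumPy; the switchpoint and the two rates here are hypothetical placeholder values, not the inferred ones:

```python
import numpy as np

tau = 45                          # hypothetical switchpoint day
lambda_1, lambda_2 = 18.0, 23.0   # hypothetical rates before/after tau
t = np.arange(70)                 # observation days 0..69

# lambda = lambda_1 while t < tau, lambda_2 from day tau onwards
rate = np.where(t < tau, lambda_1, lambda_2)
print(rate[44], rate[45])  # 18.0 23.0
```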

Page 17: Introduction to Bayesian Analysis in Python

Priors – Or beliefs

- We call alpha a hyper-parameter or parent variable. In literal terms, it is a parameter that influences other parameters.

- Alternatively, we could have two priors – one for each λi -- EXERCISE

We are interested in inferring the unknown λs. To use Bayesian inference, we need to assign prior probabilities to the different possible values of λ. What would be good prior probability distributions for λ1 and λ2?

Recall that λ can be any positive number. As we saw earlier, the exponential distribution provides a continuous density function for positive numbers, so it might be a good choice for modelling λi. But recall that the exponential distribution takes a parameter of its own, so we'll need to include that parameter in our model. Let's call that parameter α.

λ1 ∼ Exp(α), λ2 ∼ Exp(α)
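A quick sketch of why setting α to one over the sample mean is a sensible default: an Exp(α) prior then has mean 1/α, i.e. the observed average count. The count data below is synthetic, standing in for the real text-message counts:

```python
import numpy as np

rng = np.random.default_rng(0)
count_data = rng.poisson(20, size=70)   # synthetic stand-in for the text counts
alpha = 1.0 / count_data.mean()         # hyper-parameter from the data

# Exp(alpha) draws have mean 1/alpha = count_data.mean(),
# so the prior is centred on the observed average rate.
lambda_draws = rng.exponential(scale=1.0 / alpha, size=100_000)
print(round(lambda_draws.mean(), 1))  # roughly count_data.mean()
```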

Page 18: Introduction to Bayesian Analysis in Python

Priors - Continued

- We don’t care what the prior distribution (or the resulting integral) for the unknown variables looks like analytically.

- It’s probably intractable, so we need a numerical method to solve it.

- What we care about is the posterior distribution.

- What about τ?

- Due to the noisiness of the data, it’s difficult to pick out a priori where τ might have occurred. We’ll assign a uniform prior belief to every possible day. This is equivalent to saying

τ ∼ DiscreteUniform(1,70)

- This implies that P(τ = k) = 1/70 for each day k
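A minimal check of that prior using SciPy’s discrete uniform distribution (`stats.randint`, whose upper bound is exclusive, hence 71):

```python
from scipy import stats

# DiscreteUniform(1, 70): integers 1..70 inclusive, so high = 71
tau_prior = stats.randint(low=1, high=71)

print(tau_prior.pmf(45))  # P(tau = 45) = 1/70
```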

Page 19: Introduction to Bayesian Analysis in Python

The philosophy of Probabilistic Programming: Our first hammer PyMC3

Another way of thinking about this: unlike a traditional program, which only runs in the forward direction, a probabilistic program is run in both the forward and backward direction. It runs forward to compute the consequences of the assumptions it contains about the world (i.e., the model space it represents), but it also runs backward from the data to constrain the possible explanations. In practice, many probabilistic programming systems will cleverly interleave these forward and backward operations to efficiently home in on the best explanations. -- Beau Cronin, who sold a Probabilistic Programming focused company to Salesforce

Page 20: Introduction to Bayesian Analysis in Python

Let’s specify our variables

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    alpha = 1.0 / count_data.mean()  # Recall count_data is the variable
                                     # that holds our txt counts
    lambda_1 = pm.Exponential("lambda_1", alpha)
    lambda_2 = pm.Exponential("lambda_2", alpha)
    tau = pm.DiscreteUniform("tau", lower=0, upper=n_count_data - 1)

Page 21: Introduction to Bayesian Analysis in Python

Let’s create the switchpoint and add in observations

with model:
    idx = np.arange(n_count_data)  # day index 0..n-1
    # lambda_1 while tau > idx (i.e. t < tau), lambda_2 afterwards
    lambda_ = pm.math.switch(tau > idx, lambda_1, lambda_2)

with model:
    observation = pm.Poisson("obs", lambda_, observed=count_data)

All our variables so far are random variables. We aren’t fixing any variables yet.

The variable observation combines our data (count_data), with our proposed data-generation schema, given by the variable lambda_, through the observed keyword.

Page 22: Introduction to Bayesian Analysis in Python

Let’s learn something

with model:
    trace = pm.sample(10000, tune=5000)

We can think of the above code as a learning step. The machinery we use is called Markov Chain Monte Carlo (MCMC), which merits a whole workshop of its own. For now, let’s consider it a magic trick that helps us solve these complicated formulae.

This technique returns thousands of random variables from the posterior distributions of λ1,λ2 and τ.

We can plot a histogram of the random variables to see what the posterior distributions look like.

On the next slide, we collect the samples (called traces in the MCMC literature) into histograms.

Page 23: Introduction to Bayesian Analysis in Python

The trace code

lambda_1_samples = trace['lambda_1']
lambda_2_samples = trace['lambda_2']
tau_samples = trace['tau']

We’ll leave out the plotting code – you can check in the notebooks.

Page 24: Introduction to Bayesian Analysis in Python

Posterior distributions of the variables

Page 25: Introduction to Bayesian Analysis in Python

Interpretation

• The Bayesian methodology returns a distribution.

• We now have distributions to describe the unknown λ1,λ2 and τ

• We can see that the plausible values for the parameters are: λ1 is around 18, and λ2 is around 23. The posterior distributions of the two lambdas are clearly distinct, indicating that it is indeed likely that there was a change in the user’s text-message behaviour.

• Our analysis also returned a distribution for τ. Its posterior distribution is discrete. We can see that near day 45 there was a 50% chance that the user’s behaviour changed. This supports the conclusion that a change occurred: had no change occurred, τ’s posterior would be more spread out. We see that only a few days make any sense as potential transition points.

Page 26: Introduction to Bayesian Analysis in Python

Why would I want to sample from the posterior?

• Entire books are devoted to explaining why.

• We’ll use the posterior samples to answer the following question: what is the expected number of texts on day t, 0 ≤ t ≤ 70? Recall that the expected value of a Poisson variable is equal to its parameter λ. Therefore the question is equivalent to: what is the expected value of λ at time t?

• In our code, let i index samples from the posterior distributions. Given a day t, we average over all possible λi for the day t, using λi = λ1,i if t < τi (that is, if the behaviour change has not yet occurred), else we use λi = λ2,i
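The averaging step described above can be sketched as follows. The posterior sample arrays here are tiny synthetic stand-ins; in the real analysis they come from the trace on slide 23:

```python
import numpy as np

# Synthetic stand-ins for the MCMC posterior samples
tau_samples = np.array([44, 45, 45, 46])
lambda_1_samples = np.array([17.5, 18.0, 18.2, 17.9])
lambda_2_samples = np.array([22.8, 23.1, 23.0, 22.9])
n_count_data = 70

N = tau_samples.shape[0]
expected_texts_per_day = np.zeros(n_count_data)
for day in range(n_count_data):
    # ix marks posterior draws in which the switch has not yet happened on `day`
    ix = day < tau_samples
    # use lambda_1 for those draws and lambda_2 for the rest, then average
    expected_texts_per_day[day] = (lambda_1_samples[ix].sum()
                                   + lambda_2_samples[~ix].sum()) / N

print(round(expected_texts_per_day[0], 2),
      round(expected_texts_per_day[60], 2))  # 17.9 22.95
```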

Page 27: Introduction to Bayesian Analysis in Python

Analysis results

- Our analysis strongly supports the belief that the user’s behaviour did change; otherwise the two lambdas would be closer in value.

- The change was sudden rather than gradual – we see this from tau’s strongly peaked posterior distribution.

- It turns out the 45th day was Christmas and the book author was moving cities.

Page 28: Introduction to Bayesian Analysis in Python

We introduced Bayesian Methods

Bayesian methods are about beliefs

PyMC3 allows building generative models

We get uncertainty estimates for free

We can add domain knowledge in priors

Page 29: Introduction to Bayesian Analysis in Python

Good book resources

Page 30: Introduction to Bayesian Analysis in Python

Online Resources

• http://docs.pymc.io/

• http://austinrochford.com/posts/2015-10-05-bayes-survival.html

• http://twiecki.github.io/

• https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers