Introduction

In statistics, Bayesian linear regression is an approach to linear regression in which the statistical analysis is undertaken within the context of Bayesian inference. When the regression model has errors that have a normal distribution, and if a particular form of prior distribution is assumed, explicit results are available for the posterior probability distributions of the model’s parameters.
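For example, with a Gaussian prior on the coefficient vector, a Gaussian likelihood and a known noise variance $\sigma^2$ (the standard conjugate case), the posterior over the coefficients is again Gaussian with closed-form mean and covariance:

\[\begin{array}{l}{\beta \mid y \sim N\left(\mu_{n}, \Sigma_{n}\right)} \\ {\Sigma_{n}=\left(\Sigma_{0}^{-1}+\sigma^{-2} X^{\top} X\right)^{-1}} \\ {\mu_{n}=\Sigma_{n}\left(\Sigma_{0}^{-1} \mu_{0}+\sigma^{-2} X^{\top} y\right)}\end{array}\]

where $\mu_{0}$ and $\Sigma_{0}$ are the prior mean and covariance and $X$ is the design matrix. In this post we will not rely on such closed-form results; instead we will approximate the posterior by sampling.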

Linear Regression

In simple linear regression, we obtain point estimates of the parameters in

\[y = \alpha + \beta x\]

This equation says there is a linear relationship between the variables $x$ and $y$. The slope is controlled by $\beta$, and the intercept $\alpha$ gives the value of $y$ when $x = 0$. Methods such as Ordinary Least Squares (OLS) optimize the parameters to minimize the error between the observed and predicted $y$, but they return only a single best value for each parameter.

Image credits: Wikipedia
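To make the point-estimate idea concrete, here is a minimal OLS sketch using the normal equations in plain NumPy. The variable names (x_toy, y_toy, alpha_hat, beta_hat) are only illustrative; the data actually used in this post is generated further below.

import numpy as np

# toy data from a known line, y = 2x - 5, plus Gaussian noise
rng = np.random.RandomState(0)
x_toy = 10 * rng.rand(50)
y_toy = 2 * x_toy - 5 + 2 * rng.randn(50)

# design matrix with an intercept column, then solve the least-squares problem
X = np.column_stack([np.ones_like(x_toy), x_toy])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, y_toy, rcond=None)
print(alpha_hat, beta_hat)   # single best values, with no uncertainty attached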

Bayesian Approach

The same problem can be stated in a probabilistic framework: we can obtain the best values of $\alpha$ and $\beta$ along with estimates of their uncertainty. Probabilistically, linear regression can be expressed as:

\[y \sim N(\mu=\alpha+\beta x, \sigma=\varepsilon)\]

$y$ is modeled as a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$. Unlike OLS regression, which produces a single predicted value, here $y$ is normally distributed. Since we do not know the values of $\alpha$, $\beta$ and $\varepsilon$, we have to set prior distributions for them.

\[\begin{array}{l}{\alpha \sim N\left(\mu_{\alpha}, \sigma_{\alpha}\right)} \\ {\beta \sim N\left(\mu_{\beta}, \sigma_{\beta}\right)} \\ {\varepsilon \sim U\left(0, h_{s}\right)}\end{array}\]

Image credits: Osvaldo Martin’s book: Bayesian Analysis with Python

In this post, I'm going to demonstrate a very simple linear regression problem with both the OLS and Bayesian approaches, using the PyMC3 package. PyMC3 is a Python package for Bayesian statistical modeling and probabilistic machine learning.

Import basic modules

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

Generate linear artificial data

rng = np.random.RandomState(1)            # fixed seed for reproducibility
x = 10 * rng.rand(50)                     # 50 points in [0, 10)
y = 2 * x - 5 + 2 * rng.randn(50)         # true line y = 2x - 5 with Gaussian noise (sd = 2)
x = x.reshape(-1, 1)                      # reshape to column vectors for sklearn/PyMC3
y = y.reshape(-1, 1)

plt.scatter(x, y)

Create PyMC3 model

import pymc3 as pm
print('Running on PyMC3 v{}'.format(pm.__version__))
basic_model = pm.Model()
Running on PyMC3 v3.6

Define model parameters

A model context is created with the with statement. Defining parameters inside this context automatically attaches them to the model.

Distributions for $\alpha$, $\beta$ and $\varepsilon$ are defined. $\mu$ is a deterministic variable which is calculated from the line equation.

Ylikelihood is the likelihood, defined by a Normal distribution with mean $\mu$ and standard deviation $\sigma$. The observed values are passed to this distribution through the observed argument.

with basic_model as bm:

    #Priors
    alpha = pm.Normal('alpha', mu=0, sd=10)
    beta = pm.Normal('beta', mu=0, sd=10)
    sigma = pm.HalfNormal('sigma', sd=1)

    # Deterministics
    mu = alpha + beta*x
    
    # Likelihood 
    Ylikelihood = pm.Normal('Ylikelihood', mu=mu, sd=sigma, observed=y)
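
Note that mu above is a plain Theano expression, so its values are not stored in the trace. If you also want mu recorded for every draw, an optional variant (not used in the rest of this post; tracked_model and the _t suffixed names are only illustrative) is to wrap it in pm.Deterministic:

# optional variant: register mu as a named deterministic so it is saved in the trace
with pm.Model() as tracked_model:
    alpha_t = pm.Normal('alpha', mu=0, sd=10)
    beta_t = pm.Normal('beta', mu=0, sd=10)
    sigma_t = pm.HalfNormal('sigma', sd=1)
    mu_t = pm.Deterministic('mu', alpha_t + beta_t*x)
    pm.Normal('Ylikelihood', mu=mu_t, sd=sigma_t, observed=y)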

Sampling from posterior

As the model is now fully defined, we can sample from the posterior. PyMC3 automatically chooses an appropriate sampler depending on the type of variables. For our continuous parameters, NUTS is used.

trace = pm.sample(draws=2000, model=bm)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [sigma, beta, alpha]
100%|██████████| 2500/2500 [00:03<00:00, 770.37it/s]
100%|██████████| 2500/2500 [00:03<00:00, 721.84it/s]
The acceptance probability does not match the target. It is 0.8850329863869828, but should be close to 0.8. Try to increase the number of tuning steps.

Trace summary and plots

A traceplot shows, for each parameter, a histogram (or KDE) of the posterior samples together with the sampled values across iterations.

pm.traceplot(trace)

print(pm.summary(trace).round(2))
       mean    sd  mc_error  hpd_2.5  hpd_97.5    n_eff  Rhat
alpha -5.00  0.48      0.01    -5.93     -4.05  1619.65   1.0
beta   2.05  0.09      0.00     1.89      2.23  1583.34   1.0
sigma  1.83  0.18      0.00     1.49      2.18  2254.55   1.0

Checking autocorrelation

A bar plot of the autocorrelation function for a trace can be generated with pymc3.plots.autocorrplot.

Autocorrelation indicates how long you have to wait for convergence: if it is high, you need a longer burn-in and more draws to obtain the same number of effectively independent samples. Low autocorrelation means the sampler is exploring well.

pm.autocorrplot(trace)

Comparing parameters with Simple Linear Regression (OLS)

Parameters can be cross checked using Simple Linear Regression.


from sklearn.linear_model import LinearRegression
lm = LinearRegression()
ypred = lm.fit(x, y).predict(x)
plt.scatter(x, y)
plt.plot(x, ypred)
legend_title = 'Simple Linear Regression\n {} + {}*x'.format(round(lm.intercept_[0], 2), round(lm.coef_[0][0], 2))
legend = plt.legend(loc='upper left', frameon=False, title=legend_title)
plt.title("Simple Linear Regression")
plt.show()

No handles with labels found to put in legend.

The parameters from PyMC3 and simple linear regression (OLS) are nearly identical:

Intercept: OLS -5.0, PyMC3 -5.0
Coefficient: OLS 2.03, PyMC3 2.05
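
These numbers can be reproduced directly from the trace and the fitted sklearn model (a quick check using the objects already defined above; only the print formatting is new):

print('PyMC3: alpha = {:.2f}, beta = {:.2f}'.format(trace['alpha'].mean(), trace['beta'].mean()))
print('OLS  : alpha = {:.2f}, beta = {:.2f}'.format(lm.intercept_[0], lm.coef_[0][0]))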

Plotting traces

plt.plot(x, y, 'b.');

# keep every 10th posterior sample so the plot stays readable
idx = range(0, len(trace['alpha']), 10)
alpha_m = trace['alpha'].mean()
beta_m = trace['beta'].mean()

# one gray regression line per retained posterior sample; the black line below uses the posterior means
plt.plot(x, trace['alpha'][idx] + trace['beta'][idx] * x, c='gray', alpha=0.2);
plt.plot(x, alpha_m + beta_m * x, c='k', label='y = {:.2f} + {:.2f}* x'.format(alpha_m, beta_m))
plt.xlabel('$x$', fontsize=16)
plt.ylabel('$y$', fontsize=16, rotation=0)
plt.legend(loc=2, fontsize=14)
plt.title("Traces")
plt.show()

Posterior Plots

Plot posterior densities in the style of John K. Kruschke's book.

pm.plots.plot_posterior(trace)

Forest Plots

Generates a "forest plot" of 100*(1-alpha)% credible intervals from a trace or list of traces. It also shows R-hat, the Gelman-Rubin diagnostic, which is used to check the convergence of multiple MCMC chains run in parallel.

pm.plots.forestplot(trace)
GridSpec(1, 2, width_ratios=[3, 1])
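
If you only need the numbers rather than the plot, R-hat and the effective sample size can also be read straight from the summary DataFrame shown earlier (column names as in this PyMC3 version's output):

summary = pm.summary(trace)
print(summary[['n_eff', 'Rhat']])   # Rhat values close to 1.0 indicate the chains have converged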

Plotting energy distributions

Plot the energy transition distribution and the marginal energy distribution in order to diagnose poor exploration by HMC algorithms. For high-dimensional models it becomes cumbersome to look at every parameter's trace, so the energy plot is used to assess convergence problems.

If the energy transition distribution is much narrower than the marginal energy distribution, the sampler does not have enough energy to explore the whole parameter space and the posterior estimate is likely biased.

pm.plots.energyplot(trace)
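
As a numeric companion to the energy plot, you can also check the Bayesian fraction of missing information (this assumes pm.bfmi is available in your PyMC3 version); noticeably low values point to the same exploration problem the plot shows visually.

# Bayesian fraction of missing information; values well below 1 signal poor exploration
print(pm.bfmi(trace))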

Density Plots

Generates KDE plots for continuous variables. Plots are truncated at their 100*(1-alpha)% credible intervals.

pm.plots.densityplot(trace)

Sampling from the posterior predictive

Posterior predictive checks (PPCs) are a great way to validate a model. The idea is to generate data from the model using parameters from draws from the posterior.

We will draw 500 samples of the observed variable Ylikelihood from the posterior predictive distribution.

ypred = pm.sampling.sample_posterior_predictive(model=bm, trace=trace, samples=500)
y_sample_posterior_predictive = np.asarray(ypred['Ylikelihood'])
100%|██████████| 500/500 [00:00<00:00, 1090.41it/s]
_, ax = plt.subplots()
ax.hist([n.mean() for n in y_sample_posterior_predictive], bins=19, alpha=0.5)
ax.axvline(y.mean())
ax.set(title='Posterior predictive of the mean', xlabel='mean(y)', ylabel='Frequency');
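
On top of the histogram, a simple numeric check is the posterior predictive p-value for the mean: the fraction of simulated data sets whose mean is at least as large as the observed mean. This is plain NumPy on the arrays computed above; sim_means and ppp are illustrative names.

# mean of each of the 500 simulated data sets, compared with the observed mean
sim_means = y_sample_posterior_predictive.reshape(len(y_sample_posterior_predictive), -1).mean(axis=1)
ppp = (sim_means >= y.mean()).mean()
print('posterior predictive p-value for the mean: {:.2f}'.format(ppp))

A value near 0.5 means the observed mean is typical of data simulated from the model.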

Plotting HPD intervals

We can plot credible intervals: ranges within which the unobserved values fall with a given (subjective) probability. The HPD interval has the nice property that any point inside the interval has a higher density than any point outside it. To calculate the highest posterior density (HPD) interval of an array for a given alpha, PyMC3 provides the function pymc3.stats.hpd().

sig0 = pm.hpd(y_sample_posterior_predictive, alpha=0.5)    # 50% HPD interval
sig1 = pm.hpd(y_sample_posterior_predictive, alpha=0.05)   # 95% HPD interval

#removing extra dimension
sig0 = np.squeeze(sig0)
sig1 = np.squeeze(sig1)

#Reshaping and sorting
inds = x.ravel().argsort()
x_ord = x.ravel()[inds].reshape(-1)

# reorder y and the HPD bands to match the sorted x values
y = y[inds]
sig0ord = sig0[inds]
sig1ord = sig1[inds]

pal = sns.color_palette('Purples')
#plt.plot(x, y, 'b.')
plt.scatter(x_ord,y)
plt.plot(x, alpha_m + beta_m * x, c='k', label='y = {:.2f} + {:.2f}* x'.format(alpha_m, beta_m))
plt.fill_between(x_ord, sig0ord[:,0], sig0ord[:,1], color='red',alpha=1,label="50% interval")
plt.fill_between(x_ord, sig1ord[:,0], sig1ord[:,1], color='gray',alpha=0.5,label="95% interval")
plt.legend()
plt.xlabel('$x$', fontsize=16)
plt.ylabel('$y$', fontsize=16, rotation=0)
plt.title("HPD Plot")
plt.show()

Similarly, using the posterior predictive samples, we can compute various percentiles to plot.

# percentiles for the 95% band (2.5/97.5), the 50% band (25/75) and the median
dfp = np.percentile(y_sample_posterior_predictive, [2.5, 25, 50, 75, 97.5], axis=0)
dfp = np.squeeze(dfp)
dfp = dfp[:,inds]

ymean = y_sample_posterior_predictive.mean(axis=0)
ymean = ymean[inds]

pal = sns.color_palette('Purples')
plt.rcParams['axes.facecolor'] = 'white'
plt.rcParams["axes.linewidth"]  = 1.25
plt.rcParams["axes.edgecolor"] = "0.15"


fig = plt.figure(dpi=100)
ax = fig.add_subplot(111)
plt.scatter(x_ord,y,s=10,label='observed')
ax.plot(x_ord,ymean,c=pal[5],label='posterior mean',alpha=0.5)
ax.plot(x_ord,dfp[2,:],alpha=0.75,color=pal[3],label='posterior median')
ax.fill_between(x_ord,dfp[0,:],dfp[4,:],alpha=0.5,color=pal[1],label='CR 95%')
ax.fill_between(x_ord,dfp[1,:],dfp[3,:],alpha=0.4,color=pal[2],label='CR 50%')
ax.legend(frameon=True)
plt.show()

Source

You can find the Jupyter notebook here