This notebook aims to provide basic examples of how to run a variety of MCMC and nested sampling codes in Python. It won't go into detail about MCMC methods in general (a bit of background knowledge is assumed), nor will it discuss all the various bells and whistles that each sampler can use, but it will hopefully help people get started with writing a sampler code using an understandable (albeit "toy") example. A very nice, quite similar, article on this topic by Jake VanderPlas can be found here. There are also very good online notes from the Computational Statistics and Statistical Computing course at Duke University by Cliburn Chan & Janice McCarthy, covering some of this material (including PyMC3 and PyStan examples that are very similar to what is covered here). Section 2 of Brewer & Foreman-Mackey (2016) gives a nice summary of the advantages and disadvantages of some of the algorithms shown here. A good general description of fitting a model to data in a Bayesian formalism, including various useful examples, can be found in Hogg, Bovy & Lang (2010).
The code within this notebook is designed to work with Python 3. An older version that should be compatible with Python 2 can be found here.
You can jump straight to the examples for the different samplers with the links below:
An example using the bilby package, which provides a unified interface to a variety of different samplers, is also provided.
MCMC methods are most often used in the context of Bayesian parameter estimation, e.g., you have a model, $y$, defined by a set of parameters, $\vec{\theta}$, with certain prior probability distributions $p(\vec{\theta}|I)$, and you want to efficiently sample from the marginalised posterior probability distributions on those parameters given some data, $\mathbf{d}$, and a particular likelihood function, $p(\mathbf{d}|\vec{\theta},I)$ ($I$ is just a substitute for all the implicit assumptions that have gone into defining our model). From Bayes' theorem we have the joint posterior on the whole set of parameters,
$$
p(\vec{\theta}|\mathbf{d},I) = \frac{p(\mathbf{d}|\vec{\theta},I) p(\vec{\theta}|I)}{p(\mathbf{d}|I)},
$$

with marginalised posteriors on an individual parameter, say $\theta_1$, given by nested integrals over all other parameters,
$$
p(\theta_1|\mathbf{d},I) = \frac{1}{p(\mathbf{d}|I)}\int^{\forall \theta_i \in (\vec{\theta}\neq \theta_1)} p(\mathbf{d}|\vec{\theta},I) p(\vec{\theta}|I) {\rm d}{\theta_i},
$$

($\forall \theta_i \in (\vec{\theta}\neq \theta_1)$ can be translated as "for all parameters $\theta_i$ in $\vec{\theta}$ that are not equal to $\theta_1$"). $p(\mathbf{d}|I)$ is the marginal likelihood, or evidence, for the data given the particular model used, and is given by
$$
p(\mathbf{d}|I) = \int^{\forall \theta_i \in \vec{\theta}} p(\mathbf{d}|\vec{\theta},I) p(\vec{\theta}|I) {\rm d}{\theta_i},
$$

which is a normalising constant for the posterior. It will be discussed more below in relation to nested sampling.
This is the only context in which I (and probably many people in the physical sciences) have encountered the use of MCMC methods, and it is the context I will use below.
The very basic example that I will use to demonstrate coding up the samplers is that of sampling from the posterior of two parameters, $m$ and $c$, defining the model:
$$
y(\mathbf{x};m,c) = m\mathbf{x} + c,
$$

where $\mathbf{x}$ is a vector of values of $x$. This is basically fitting the gradient and $y$-intercept of a straight line, which should be fairly familiar as linear regression, and is obviously a solved problem (assuming uniform priors!) using, e.g., least squares fitting. But it provides a simple example that can hopefully be extended to more realistic scenarios.
Our aim is to produce samples drawn from the posterior $p(m, c|\mathbf{d},I)$, which can be used (by, e.g., histogramming the sample values) to approximate the marginal posteriors $p(m|\mathbf{d},I)$ and $p(c|\mathbf{d},I)$.
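For example (a hypothetical helper of my own, not part of the original notebook), given an array of posterior samples the marginals can be approximated by simply histogramming each column:

```python
from matplotlib import pyplot as pl

def plot_marginals(samples):
    # "samples" is assumed to be an (N x 2) array of posterior draws of (m, c),
    # as produced by any of the samplers demonstrated below
    fig, axs = pl.subplots(1, 2, figsize=(10, 4))
    for ax, column, name in zip(axs, samples.T, ["m", "c"]):
        # a normalised histogram approximates the marginal posterior on each parameter
        ax.hist(column, bins=50, density=True, histtype="step")
        ax.set_xlabel(name)
    return fig
```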
The vector of data samples, $\mathbf{d}$, will consist of the model defined at a set of $M$ values, $\mathbf{x}$, with additive Gaussian noise of known standard deviation, i.e., data point $i$ will be defined by
$$
d_i = y(x_i;m,c) + n_i,
$$

with
$$
n_i \sim N(0, \sigma_i),
$$

which means the noise is drawn from a Gaussian (or Normal) distribution of zero mean and standard deviation $\sigma_i$.
Note: I'll generally use the term Gaussian distribution rather than "Normal" distribution, as this is what I first heard it called. But, in many of the packages discussed below, "Normal" is the more common name for the distribution.
Setting the likelihood ¶
A sensible and fairly standard likelihood function (due to the Central Limit Theorem and maximum entropy arguments) for a single data point, given values of $m$ and $c$ and a known noise standard deviation, is a Gaussian distribution of the form
$$
p(d_i|m,c,I) = \left(2\pi\sigma_i^2\right)^{-1/2} \exp{\left(-\frac{\left[d_i - y(x_i;m,c)\right]^2}{2\sigma_i^2} \right)},
$$

and the joint likelihood for the whole dataset of $M$ points (assuming independent noise) will be the product of the individual likelihoods
$$
p(\mathbf{d}|m,c,I) = \prod_{i=1}^M p(d_i|m,c,I) = \left(2\pi\right)^{-M/2}\left(\prod_{i=1}^M \sigma_i^{-1} \right) \exp{\left(-\sum_{i=1}^M\frac{\left[d_i - y(x_i;m,c)\right]^2}{2\sigma_i^2} \right)}.
$$

When computing likelihoods, numerical considerations mean that one almost always works with the natural logarithm of the likelihood, so we have
$$
\ln{p(\mathbf{d}|m,c,I)} \equiv \log{L} = -\frac{M}{2}\ln{(2\pi)} - \ln\left(\prod_{i=1}^M \sigma_i\right) - \sum_{i=1}^M\frac{\left[d_i - y(x_i;m,c)\right]^2}{2\sigma_i^2}.
$$

In many cases we can ignore the terms that do not include the required parameters when sampling from their marginalised posteriors (as they will just be constants), so often the log-likelihood can be set to
$$
\log{L} = - \sum_{i=1}^M\frac{\left[d_i - y(x_i;m,c)\right]^2}{2\sigma_i^2}.
$$

Note: If wanting to evaluate the marginal likelihood for the data (e.g., using nested sampling, which is discussed later) to use it for model comparison, the constants may be required.
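As a concrete (and purely illustrative, not from the original notebook) translation of the expressions above into code, the log-likelihood with and without the constant terms could be written as:

```python
import numpy as np

def loglike_full(theta, data, x, sigma):
    # full Gaussian log-likelihood, including the constant normalisation terms
    m, c = theta
    M = len(data)
    residual = data - (m * x + c)
    return (-0.5 * M * np.log(2. * np.pi) - M * np.log(sigma)
            - 0.5 * np.sum((residual / sigma) ** 2))

def loglike_unnormalised(theta, data, x, sigma):
    # the same up to an additive constant, which is fine for posterior sampling
    m, c = theta
    residual = data - (m * x + c)
    return -0.5 * np.sum((residual / sigma) ** 2)
```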
Setting the prior ¶
In addition to the likelihood, we need to define prior distributions on the parameters. To demonstrate a couple of different, and reasonably common, priors, I'll use a Gaussian probability distribution for the prior on $m$ and a uniform probability distribution for the prior on $c$:
$$
p(m|\mu_m,\sigma_m,I) = \frac{1}{\sqrt{2\pi\sigma_m^2}}\exp{\left(-\frac{(m-\mu_m)^2}{2\sigma_m^2}\right)},
$$

and
$$
p(c|c_{\rm min}, c_{\rm max}) = \left\{\begin{array}{cl} \frac{1}{c_{\rm max}-c_{\rm min}} & \text{if}~c_{\rm min} < c < c_{\rm max}, \\ 0 & \text{otherwise}. \end{array}\right.
$$

In our example we'll set:
- $\mu_m = 0$ and $\sigma_m = 10$,
- $c_{\rm min} = -10$ and $c_{\rm max} = 10$.

Note: These priors are not intended to be the least informative for the parameterisation that we have, but are just chosen for demonstration purposes to show how to code up two different distributions. A better prior for this particular problem might be the one discussed here, which could be coded up for PyMC3 using the example here.
Creating the data ¶
So, let's get started creating our data. I'll (arbitrarily) use:
- $m = 3.5$,
- $c = 1.2$,

for the model, and
- $\mathbf{x} \in [0, 10)$ in $M=50$ uniform steps ($[a,b)$ means inclusive of $a$, but exclusive of $b$),
- $\sigma_i = \sigma = 2$ (i.e., each noise point is independent, but has the same standard deviation),

for the data.
```python
import numpy as np    # import numpy
from time import time # use for timing functions

# useful modules!
import os
import sys

# make the plots look a bit nicer with some defaults
from matplotlib import pyplot as pl  # import pyplot from matplotlib
import matplotlib as mpl
rcparams = {}
rcparams['axes.linewidth'] = 0.5
rcparams['font.family'] = 'serif'
rcparams['font.size'] = 22
rcparams['legend.fontsize'] = 16
rcparams['mathtext.fontset'] = "stix"

# functions for plotting posteriors
import corner
from scipy.stats import gaussian_kde

# set the true values of the model parameters for creating the data
m = 3.5  # gradient of the line
c = 1.2  # y-intercept of the line

# set the "predictor variable"/abscissa
M = 50
xmin = 0.
xmax = 10.
stepsize = (xmax - xmin) / M
x = np.arange(xmin, xmax, stepsize)

# define the model function
def straight_line(x, m, c):
    """
    A straight line model: y = m*x + c

    Args:
        x (list): a set of abscissa points at which the model is defined
        m (float): the gradient of the line
        c (float): the y-intercept of the line
    """
    return m * x + c

# seed our random number generator, so we have reproducible data
np.random.seed(sum([ord(v) for v in 'samplers']))

# create the data - the model plus Gaussian noise
sigma = 2.0  # standard deviation of the noise
data = straight_line(x, m, c) + np.random.normal(scale=sigma, size=M)

# plot the data
mpl.rcParams.update(rcparams)  # update plot parameters
fig, ax = pl.subplots(figsize=(9, 6))
ax.plot(x, data, 'bo', alpha=0.5, label='data')
ax.plot(x, straight_line(x, m, c), 'r-', lw=2, label='model')
ax.legend()
ax.set_xlim([xmin, xmax])
ax.set_xlabel(r'$x$');
```
MCMC samplers¶
First up I'll deal with MCMC samplers that are purely written in Python, then a couple that are wrappers to other libraries.
emcee¶
emcee (Foreman-Mackey et al, 2013) is a Python MCMC implementation that uses an affine invariant ensemble sampler (Goodman & Weare, 2010). This basically means that it doesn't just evolve a single point in the model parameter space, but evolves an ensemble of points, and steps in the evolution are tuned based on the current state of the ensemble. The steps scale naturally (the affine invariant bit) as the ensemble closes in on the posterior. This code has been well used in the astrophysics community (and as an astrophysicist myself I'm showing it as my first example!).
emcee is available on PyPI and is installable via `pip` with:

```
pip install emcee
```

or via Conda with:

```
conda install -c conda-forge emcee
```
The source code is available on GitHub here. The example here is very similar to the line model example given in the emcee documentation.
First up, we need to define the likelihood function, prior functions, and posterior probability function. These all need to be defined as the natural logarithms of the functions. All the functions take in a sample tuple or list, where a sample is a vector $\vec{\theta}$ of parameter values, that can be unpacked to give the individual parameter values. Other arguments to the posterior function can be user defined.
Posterior¶
As we do not care about the marginal likelihood, the posterior will just be the product of the likelihood and prior, and will take the following form:
```python
def logposterior(theta, data, sigma, x):
    """
    The natural logarithm of the joint posterior.

    Args:
        theta (tuple): a sample containing individual parameter values
        data (list): the set of data/observations
        sigma (float): the standard deviation of the data points
        x (list): the abscissa values at which the data/model is defined
    """

    lp = logprior(theta)  # get the prior

    # if the prior is not finite return a probability of zero (log probability of -inf)
    if not np.isfinite(lp):
        return -np.inf

    # return the likelihood times the prior (log likelihood plus the log prior)
    return lp + loglikelihood(theta, data, sigma, x)
```
```python
def loglikelihood(theta, data, sigma, x):
    """
    The natural logarithm of the joint Gaussian likelihood.

    Args:
        theta (tuple): a sample containing individual parameter values
        data (list): the set of data/observations
        sigma (float): the standard deviation of the data points
        x (list): the abscissa values at which the data/model is defined

    Note:
        We do not include the normalisation constants (as discussed above).
    """

    # unpack the model parameters from the tuple
    m, c = theta

    # evaluate the model (assumes that the straight_line model is defined as above)
    md = straight_line(x, m, c)

    # return the log likelihood
    return -0.5 * np.sum(((md - data) / sigma)**2)


def logprior(theta):
    """
    The natural logarithm of the prior probability.

    Args:
        theta (tuple): a sample containing individual parameter values
    """

    # unpack the model parameters from the tuple
    m, c = theta

    # uniform prior on c
    cmin = -10.  # lower range of prior
    cmax = 10.   # upper range of prior

    # set prior to 1 (log prior to 0) if in the range and zero (-inf) outside the range
    # (we don't care about it being properly normalised, but you can if you want)
    lp = 0. if cmin < c < cmax else -np.inf

    # Gaussian prior on m
    mmu = 0.      # mean of the Gaussian prior
    msigma = 10.  # standard deviation of the Gaussian prior
    lp -= 0.5 * ((m - mmu) / msigma)**2

    return lp
```
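As a quick sanity check (this snippet is my addition, not from the original notebook), the functions can be evaluated at the true parameter values to make sure they return finite numbers:

```python
# evaluate the prior, likelihood and posterior at the true (m, c) values
print(logprior((m, c)))
print(loglikelihood((m, c), data, sigma, x))
print(logposterior((m, c), data, sigma, x))
```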
MCMC set up¶
We need to decide on an initial number of ensemble points and initialise the samples, i.e., set starting points for them. The initial samples can be drawn from the prior distributions (in the emcee documentation example it starts the opposite way round, i.e., with initial samples tightly packed around the true value). We'll choose 100 ensemble points (the number of so-called walkers) and initialise them with:
```python
Nens = 100   # number of ensemble points (the 100 walkers mentioned above)

mmu = 0.     # mean of the Gaussian prior
msigma = 10. # standard deviation of the Gaussian prior

mini = np.random.normal(mmu, msigma, Nens)  # initial m points

cmin = -10.  # lower range of prior
cmax = 10.   # upper range of prior

cini = np.random.uniform(cmin, cmax, Nens)  # initial c points

inisamples = np.array([mini, cini]).T  # initial samples

ndims = inisamples.shape[1]  # number of parameters/dimensions
```

Because our initial samples are drawn from the prior it can take time for them to converge on sampling from the posterior distributions that we want. To try and get around this we can run the sampler with a burn-in period (although see, e.g., here or here for some discussions about whether burn-ins are actually necessary), and we will ignore samples during this burn-in.
Note: Alternative options could be to use some optimisation routine to try and find the posterior mode and start samples at that point, or use all samples, work out an autocorrelation length and thin the samples accordingly so they are uncorrelated (as shown a bit later), which gets rid of correlations during the initial convergence of the samples.
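A minimal sketch of that first option (my own illustration, assuming SciPy is available; it is not how the original notebook initialises the walkers) could use `scipy.optimize.minimize` on the negative log-posterior:

```python
from scipy.optimize import minimize

# minimise the negative log-posterior to find (approximately) the posterior mode
negative_logposterior = lambda theta: -logposterior(theta, data, sigma, x)
optres = minimize(negative_logposterior, x0=[1., 1.], method="Nelder-Mead")

# start the walkers in a small ball around the optimum
inisamples_opt = optres.x + 1e-3 * np.random.randn(Nens, ndims)
```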
We'll choose a burn-in of 500 samples, and then run the chain for another 500 samples, which will give an output with $(500+500)\times 100 = 100000$ samples.
```python
import emcee  # import the emcee package

print('emcee version: {}'.format(emcee.__version__))

Nburnin = 500   # number of burn-in samples (as chosen above)
Nsamples = 500  # number of final posterior samples (as chosen above)

# for bookkeeping set number of likelihood calls to zero
loglikelihood.ncalls = 0

# set additional args for the posterior (the data, the noise std. dev., and the abscissa)
argslist = (data, sigma, x)

# set up the sampler
sampler = emcee.EnsembleSampler(Nens, ndims, logposterior, args=argslist)
```
```python
# pass the initial samples and total number of samples required
t0 = time()  # start time
sampler.run_mcmc(inisamples, Nsamples + Nburnin);
t1 = time()

timeemcee = (t1 - t0)
print("Time taken to run 'emcee' is {} seconds".format(timeemcee))

# extract the samples (removing the burn-in)
samples_emcee = sampler.get_chain(flat=True, discard=Nburnin)


def plotposts(samples, **kwargs):
    """
    Function to plot posteriors using corner.py and scipy's gaussian KDE function.
    """
    if "truths" not in kwargs:
        kwargs["truths"] = [m, c]

    fig = corner.corner(samples, labels=[r'$m$', r'$c$'], hist_kwargs={'density': True}, **kwargs)

    # plot KDE smoothed version of distributions
    for axidx, samps in zip([0, 3], samples.T):
        kde = gaussian_kde(samps)
        xvals = fig.axes[axidx].get_xlim()
        xvals = np.linspace(xvals[0], xvals[1], 100)
        fig.axes[axidx].plot(xvals, kde(xvals), color='firebrick')


# make plot
plotposts(samples_emcee)

resdict = {}

resdict['memcee_mu'] = np.mean(samples_emcee[:,0])      # mean of m samples
resdict['memcee_sig'] = np.std(samples_emcee[:,0])      # standard deviation of m samples
resdict['cemcee_mu'] = np.mean(samples_emcee[:,1])      # mean of c samples
resdict['cemcee_sig'] = np.std(samples_emcee[:,1])      # standard deviation of c samples
resdict['ccemcee'] = np.corrcoef(samples_emcee.T)[0,1]  # correlation coefficient between parameters
```
The time to convergence can depend greatly on how the initial points are set up, so if you have a better guess as to the location and shape of the posterior, then using those guesses is advised. But beware of overtuning! emcee has some handy utilities to draw samples from around an initial guess. In the example below I show the use of the (now deprecated) `sample_ball()` function, which creates a ball of samples assuming no correlations between parameters. However, if you knew that your parameter uncertainties had a particular covariance, you could use `sample_ellipsoid()` to generate a sample cloud using a supplied covariance matrix.

```python
from emcee.utils import sample_ball

m_guess, c_guess = 1., 1.  # guess at the mean of the distribution
m_std, c_std = 1., 1.      # guess at the width of the distribution

# generate samples from a ball based on those guesses
inisamples = sample_ball((m_guess, c_guess), (m_std, c_std), size=Nens)

# pass the initial samples and total number of samples required
t0 = time()  # start time
sampler.run_mcmc(inisamples, Nsamples + Nburnin);
t1 = time()

timeemcee = (t1 - t0)
print("Time taken to run 'emcee' is {} seconds".format(timeemcee))

# extract the samples (removing the burn-in)
samples_emcee = sampler.get_chain(flat=True, discard=Nburnin)
```
Once the sampler has settled, the samples that come out of MCMC will be drawn from the posterior. However, even in the best of situations, draws which are adjacent are often correlated with each other. The number of "independent" samples, that is, fully independent draws from the posterior can be much fewer than the samples in hand.
The emcee `EnsembleSampler` class has an inbuilt function, `get_autocorr_time()`, to calculate the autocorrelation length (ACL), or time: an estimate of the number of samples between independent samples in the chain. You can use this number to assess the convergence of chains, and also to thin out chains to produce independent samples.

```python
m_acl, c_acl = sampler.get_autocorr_time(quiet=True)

print("The autocorrelation length for m is {0} and c is {1}".format(m_acl, c_acl))

# thin out the chain
samples_emcee = sampler.get_chain(flat=True, discard=Nburnin, thin=int(max([m_acl, c_acl])))
print("Number of independent samples is {}".format(len(samples_emcee)))

resdict['esemcee'] = len(samples_emcee)
resdict['essemcee'] = int(resdict['esemcee'] / timeemcee)
print("Effective samples per second: {}".format(resdict['essemcee']))

# plot the resulting posteriors
plotposts(samples_emcee)
```

```
WARNING:root:The chain is shorter than 50 times the integrated autocorrelation time for 2 parameter(s). Use this estimate with caution and run a longer chain! N/50 = 20; tau: [23.98692287 25.13038681]
The autocorrelation length for m is 23.986922869990487 and c is 25.130386813726226
Number of independent samples is 2000
Effective samples per second: 1059
```

In practice, if you have a minimum number of independent samples that you require, then you can repeatedly run the sampler, checking the ACL each time (and maybe adjusting the `c` input parameter for `get_autocorr_time()`), until you have collected the required number.

Checking the ACL is something that can be done for any of the samplers discussed in the rest of this notebook. Different sampling methods can lead to different ACLs, and methods that take more time per sample might end up having shorter ACLs. So, examining the various trade-offs between overall speed and the final number of independent samples can be worthwhile.
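As a rough way of quantifying that trade-off, a small helper like the following (a hypothetical function of my own, reusing emcee's autocorrelation utilities) could be applied to any of the chains and timings produced below, e.g. `effective_samples_per_second(samples_emcee, timeemcee)`:

```python
def effective_samples_per_second(chain, runtime):
    """
    Rough illustration only: estimate the number of effectively independent samples
    in an (Nsamples x Nparams) chain and divide by the sampler run time in seconds.
    """
    acls = [emcee.autocorr.integrated_time(chain[:, i], quiet=True)[0]
            for i in range(chain.shape[1])]
    neff = int(len(chain) / max(acls))
    return neff, int(neff / runtime)
```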
PyMC3¶
PyMC3 (Salvatier, Wiecki & Fonnesbeck, 2016) is a Python MCMC implementation that can use a variety of modern sampling methods, including "No-U-turn sampling" (NUTS) (Hoffman & Gelman, 2014) and Hamiltonian Monte Carlo (Duane et al, 1987), both of which make use of the gradient of the posterior to efficiently sample the space. An example of getting started using PyMC3 can be found here, upon which this example is based.
This code can be installed using `pip` with:

```
pip install pymc3
```

or via Conda with:

```
conda install -c conda-forge pymc3
```
and the source is available on GitHub here.
The majority of distributions that can be used for priors and likelihood functions are predefined in PyMC3, so you do not have to worry about writing them yourself (although you can create your own custom distribution).
Note that in PyMC3 the prior and likelihood distributions have to be defined within the context of a PyMC3 `Model()` class, and cannot be defined outside that context (the `with` statement). Therefore, to set up the model we can't just use the `straight_line()` function defined above, but can do the following:

```python
import pymc3 as pm  # import PyMC3

print('PyMC3 version: {}'.format(pm.__version__))

# set the PyMC3 model
linear_model = pm.Model()

with linear_model:
    # set prior parameters
    cmin = -10.   # lower range of uniform distribution on c
    cmax = 10.    # upper range of uniform distribution on c

    mmu = 0.      # mean of Gaussian distribution on m
    msigma = 10.  # standard deviation of Gaussian distribution on m

    # set priors for unknown parameters
    cmodel = pm.Uniform('c', lower=cmin, upper=cmax)  # uniform prior on y-intercept
    mmodel = pm.Normal('m', mu=mmu, sd=msigma)        # Gaussian prior on gradient

    sigmamodel = sigma  # set a single standard deviation

    # Expected value of outcome, aka "the model"
    mu = mmodel * x + cmodel

    # Gaussian likelihood (sampling distribution) of observations, "data"
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=sigmamodel, observed=data)
```
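As an aside, if a distribution you need is not predefined, PyMC3 (v3) allows a custom log-probability to be supplied through `pm.DensityDist`. The following is only a sketch of that approach for our Gaussian likelihood (it is not used anywhere else in this notebook):

```python
with pm.Model() as custom_model:
    mmodel = pm.Normal('m', mu=0., sd=10.)           # Gaussian prior on gradient
    cmodel = pm.Uniform('c', lower=-10., upper=10.)  # uniform prior on y-intercept

    # custom log-likelihood: a Gaussian written out by hand (constant terms dropped)
    def logl(value):
        return -0.5 * pm.math.sum(((value - (mmodel * x + cmodel)) / sigma) ** 2)

    # attach the custom likelihood to the observed data
    likelihood = pm.DensityDist('likelihood', logl, observed=data)
```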
```python
%%capture
Nsamples = 1000  # final number of samples
Ntune = 1000     # number of tuning samples

# perform sampling
t0 = time()
with linear_model:
    trace = pm.sample(Nsamples, tune=Ntune, discard_tuned_samples=True);
t1 = time()

timepymc3 = (t1 - t0)
```
```
WARNING:pymc3:The acceptance probability does not match the target. It is 0.8881386717506918, but should be close to 0.8. Try to increase the number of tuning steps.
WARNING:pymc3:The acceptance probability does not match the target. It is 0.9037542650276801, but should be close to 0.8. Try to increase the number of tuning steps.
```

```python
samples_pymc3 = np.vstack((trace['m'], trace['c'])).T

resdict['mpymc3_mu'] = np.mean(samples_pymc3[:,0])      # mean of m samples
resdict['mpymc3_sig'] = np.std(samples_pymc3[:,0])      # standard deviation of m samples
resdict['cpymc3_mu'] = np.mean(samples_pymc3[:,1])      # mean of c samples
resdict['cpymc3_sig'] = np.std(samples_pymc3[:,1])      # standard deviation of c samples
resdict['ccpymc3'] = np.corrcoef(samples_pymc3.T)[0,1]  # correlation coefficient between parameters

plotposts(samples_pymc3)

# calculate the autocorrelation length
acts = []
for param in ["m", "c"]:
    acts.append(emcee.autocorr.integrated_time(trace[param]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))

print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['espymc3'] = int(len(samples_pymc3) / np.max(acts))
resdict['esspymc3'] = int(resdict['espymc3'] / timepymc3)
print("Effective samples per second: {}".format(resdict['esspymc3']))
```
```
Autocorrelation length for 'm' is [4.57782586]
Autocorrelation length for 'c' is [3.95745792]
Maximum autocorrelation length is 4.6
Effective samples per second: 190
```

TensorFlow Probability¶
TensorFlow Probability (TFP) is a probabilistic modelling framework built upon the TensorFlow library. TFP contains some samplers for performing MCMC. At the time of writing TFP is still fairly new, and currently includes a Hamiltonian MC sampler, a Metropolis-Hastings sampler, and a slice sampler.
TFP can be installed from PyPI via `pip` with:

```
pip install tensorflow tensorflow-probability
```

where `tensorflow` must be explicitly installed, as it's not a dependency of `tensorflow-probability`.

Using TFP for MCMC sampling has some similarities to PyMC3; however, it can be fairly opaque and (for me at least!) non-intuitive. The example given does not come with a detailed explanation of the constructions I've used, because I don't have any in-depth knowledge of them! It was mainly cobbled together by looking at the example given here.
Firstly, we can set up the probabilistic model (with a `JointDistributionSequential` class) that contains the prior distributions on the parameters of interest and a `lambda` function defining the likelihood function, within which we write out the straight line model for the data. Some things to be aware of are: the arguments to the `lambda` function defining the likelihood must be given in the opposite order to that in which those arguments are placed within the `JointDistributionSequential` class; the `name` of the `Independent` distribution that holds the likelihood function should be the same as the name of the variable that holds the data.

```python
# import TensorFlow probability
import tensorflow_probability as tfp
from tensorflow_probability import distributions as tfd
import tensorflow as tf

cmin = -10.   # lower range of uniform distribution on c
cmax = 10.    # upper range of uniform distribution on c

mmu = 0.      # mean of Gaussian distribution on m
msigma = 10.  # standard deviation of Gaussian distribution on m

# convert x values and data to 32 bit float
xtfp = x.astype(np.float32)        # x is being used globally here
datatfp = data.astype(np.float32)

# set model - contains priors and the expected linear model
model = tfd.JointDistributionSequential([
    tfd.Normal(loc=mmu, scale=msigma, name="m"),  # m prior
    tfd.Uniform(cmin, cmax, name="c"),            # c prior
    lambda c, m: (tfd.Independent(
        tfd.Normal(  # the Normal likelihood function
            loc=(m[..., tf.newaxis] * xtfp + c[..., tf.newaxis]),  # the straight line model
            scale=sigma,  # the data standard deviation
        ),
        name="datatfp",  # the name of the variable holding the data
        reinterpreted_batch_ndims=1,
    )),
])

print('TensorFlow version: {}'.format(tf.__version__))
print('TensorFlow Probability version: {}'.format(tfp.__version__))
```
```python
def target_log_prob_fn(mvalue, cvalue):
    # the joint log-probability of the parameters and the (fixed) data
    return model.log_prob((mvalue, cvalue, datatfp))
```
We can now set up the sampler; here we'll use a Hamiltonian Monte Carlo sampler (TFP also provides a `NoUTurnSampler`). For the set up of the sampler we have to specify a step size (i.e., the size of the proposals). Some experimentation for our particular case shows a step size of 0.05 is reasonable, but it won't work for every case and must be tuned by hand. It also means that if different parameters have quite different dynamic ranges this method will not work well (it may produce jumps that are always far too big or too small for certain sets of parameters).

Wrapping the sampling within a function (`do_sampling()`) with the `@tf.function` decorator compiles the model and means it runs far faster than without the wrapper.

We'll draw initial values of the `m` and `c` parameters from distributions that match their priors.

```python
Nburn = 4000  # number of tuning samples

# set up Hamiltonian MC (within TensorFlow function to speed up computation)
@tf.function(autograph=False)
def do_sampling():
    # set initial state (random draw from prior)
    qc = tf.random.uniform([], minval=cmin, maxval=cmax, name="init_c")
    qm = tf.random.normal([], stddev=msigma, mean=mmu, name="init_m")

    hmc_kernel = tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=target_log_prob_fn,
        step_size=0.05,
        num_leapfrog_steps=5,
    )

    return tfp.mcmc.sample_chain(
        num_results=Nsamples,
        current_state=[qm, qc],
        kernel=hmc_kernel,
        num_burnin_steps=Nburn,
    )

t0 = time()
states, kernel_results = do_sampling()
t1 = time()

timetfp = (t1 - t0)

# extract the samples
ms, cs = states

# convert output states to numpy arrays
samples_tfp = np.vstack((ms.numpy(), cs.numpy())).T

num_accepted = np.sum(kernel_results.is_accepted)
print("Acceptance rate: {}".format(num_accepted / Nsamples))  # print out acceptance rate
```

```
/opt/conda/envs/python37/lib/python3.7/site-packages/tensorflow_probability/python/__init__.py:75: UserWarning: TensorFloat-32 matmul/conv are enabled for NVIDIA Ampere+ GPUs. The resulting loss of precision may hinder MCMC convergence. To turn off, run `tf.config.experimental.enable_tensor_float_32_execution(False)`. For more detail, see https://github.com/tensorflow/community/pull/287.
/opt/conda/envs/python37/lib/python3.7/site-packages/tensorflow_probability/python/mcmc/sample.py:342: UserWarning: Tracing all kernel results by default is deprecated. Set the `trace_fn` argument to None (the future default value) or an explicit callback that traces the values you are interested in.
```

```python
resdict['mtfp_mu'] = np.mean(samples_tfp[:,0])      # mean of m samples
resdict['mtfp_sig'] = np.std(samples_tfp[:,0])      # standard deviation of m samples
resdict['ctfp_mu'] = np.mean(samples_tfp[:,1])      # mean of c samples
resdict['ctfp_sig'] = np.std(samples_tfp[:,1])      # standard deviation of c samples
resdict['cctfp'] = np.corrcoef(samples_tfp.T)[0,1]  # correlation coefficient between parameters

plotposts(samples_tfp)
```
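Rather than tuning the step size entirely by hand, TFP also provides adaptive schemes. A sketch (not used for the results above, and the argument values are only illustrative) of wrapping the kernel with `tfp.mcmc.SimpleStepSizeAdaptation` might look like:

```python
# a sketch only: adapt the HMC step size during burn-in towards a target acceptance rate
adaptive_kernel = tfp.mcmc.SimpleStepSizeAdaptation(
    inner_kernel=tfp.mcmc.HamiltonianMonteCarlo(
        target_log_prob_fn=target_log_prob_fn,
        step_size=0.05,  # initial guess; this will be adapted
        num_leapfrog_steps=5,
    ),
    num_adaptation_steps=int(0.8 * Nburn),  # adapt during (most of) the burn-in
    target_accept_prob=0.75,
)
```

The adaptive kernel could then be passed to `tfp.mcmc.sample_chain` in place of `hmc_kernel`.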
```python
# calculate the autocorrelation length
acts = []
for i, param in enumerate(["m", "c"]):
    acts.append(emcee.autocorr.integrated_time(samples_tfp[:, i]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))

print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['estfp'] = int(len(samples_tfp) / np.max(acts))
resdict['esstfp'] = int(resdict['estfp'] / timetfp)
print("Effective samples per second: {}".format(resdict['esstfp']))
```
```
Autocorrelation length for 'm' is [14.8794824]
Autocorrelation length for 'c' is [17.3901431]
Maximum autocorrelation length is 17.4
Effective samples per second: 47
```

Zeus¶
Zeus (Karamanis & Beutler, 2020, Karamanis, Beutler & Peacock, 2021) is a pure Python MCMC code that uses ensemble slice sampling. It is very similar in usage to emcee.
Zeus is available via PyPI and can be installed using `pip` with:

```
pip install zeus-mcmc
```

or via Conda with:

```
conda install -c conda-forge zeus-mcmc
```
The Zeus source code is available on Github.
The Zeus documentation gives a nice example of fitting a straight line. The interface is so similar to emcee that we can reuse the `logprior`, `loglikelihood` and `logposterior` functions that we used in that case (I won't bother redefining them, so you'll have to scroll back up!). So, that just leaves the sampling. We will leave any tunable parameters at their default values, but you can choose between a few different proposal distributions.

```python
import zeus  # import zeus

print('Zeus version: {}'.format(zeus.__version__))

ndim = 2        # Number of parameters/dimensions (e.g. m and c)
Nens = 100      # for consistency with the emcee example we'll use 100 walkers
Nburn = 500     # burn-in samples
Nsamples = 500  # number of posterior samples

# set the initial samples (we'll draw from the prior)
mini = np.random.normal(mmu, msigma, Nens)  # initial m points
cini = np.random.uniform(cmin, cmax, Nens)  # initial c points
inisamples = np.array([mini, cini]).T       # initial samples

# Initialise the sampler
sampler = zeus.EnsembleSampler(Nens, ndim, logposterior, args=[data, sigma, x])

t0 = time()
# Run the sampler
sampler.run_mcmc(inisamples, Nburn + Nsamples)
t1 = time()

timezeus = t1 - t0

# print summary diagnostics
sampler.summary
```

```
Number of Tuning Generations: 14
Scale Factor: 0.641642
Mean Integrated Autocorrelation Time: 3.09
Effective Sample Size: 32373.08
Number of Log Probability Evaluations: 586000
Effective Samples per Log Probability Evaluation: 0.055244
```

```python
samples_zeus = sampler.get_chain(flat=True, discard=Nburn)

resdict['mzeus_mu'] = np.mean(samples_zeus[:,0])      # mean of m samples
resdict['mzeus_sig'] = np.std(samples_zeus[:,0])      # standard deviation of m samples
resdict['czeus_mu'] = np.mean(samples_zeus[:,1])      # mean of c samples
resdict['czeus_sig'] = np.std(samples_zeus[:,1])      # standard deviation of c samples
resdict['cczeus'] = np.corrcoef(samples_zeus.T)[0,1]  # correlation coefficient between parameters

plotposts(samples_zeus)
```
```python
# calculate the autocorrelation length
acts = []
for i, param in enumerate(["m", "c"]):
    acts.append(emcee.autocorr.integrated_time(samples_zeus[:, i]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))

print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['eszeus'] = int(len(samples_zeus) / np.max(acts))
resdict['esszeus'] = int(resdict['eszeus'] / timezeus)
print("Effective samples per second: {}".format(resdict['esszeus']))
```
```
Autocorrelation length for 'm' is [3.08637982]
Autocorrelation length for 'c' is [3.09159199]
Maximum autocorrelation length is 3.1
Effective samples per second: 1279
```

MCMC wrapper samplers¶
There are other MCMC samplers that are not pure Python packages, but have a Python-based wrapper to a different library. With these you can't write pure Python model and probability functions, but instead have to create a code string that is then compiled and run using the underlying library functions. We will look at PyStan and PyJAGS.
PyStan¶
PyStan is a Python interface to the popular Stan library. As with PyMC3, it uses the modern NUTS and Hamiltonian MC sampling methods. It can be installed using `pip` with:

```
pip install pystan
```

or via Conda with:

```
conda install -c conda-forge pystan
```
and the source code is available on GitHub here.
To define the model and distributions you need to write a code string like this:
```python
# model and distributions defined as a Stan code string
line_code = """
data {{
    int N;        // number of data points
    real y[N];    // observed data points
    real x[N];    // abscissa points
    real sigma;   // standard deviation
}}
parameters {{
    // parameters for the fit
    real m;
    real c;
}}
transformed parameters {{
    real theta[N];
    for (j in 1:N)
        theta[j] = m * x[j] + c;  // straight line model
}}
model {{
    m ~ normal({mmu}, {msigma});      // prior on m (gradient)
    c ~ uniform({clower}, {cupper});  // prior on c (y-intercept)
    y ~ normal(theta, sigma);         // likelihood of the data given the model
}}
"""

# the data and fixed quantities to pass to the model
linear_data = {
    'N': M,          # number of data points
    'y': data,       # observed data (converted from numpy array to a list)
    'x': x,          # abscissa points (converted from numpy array to a list)
    'sigma': sigma,  # standard deviation
}

import pystan  # import PyStan

Nsamples = 1000  # set the number of iterations of the sampler
chains = 4       # set the number of chains to run with

# dictionary for inputs into line_code
linedict = {}
linedict['mmu'] = 0.0     # mean of Gaussian prior distribution for m
linedict['msigma'] = 10   # standard deviation of Gaussian prior distribution for m
linedict['clower'] = -10  # lower bound on uniform prior distribution for c
linedict['cupper'] = 10   # upper bound on uniform prior distribution for c

t0 = time()
sm = pystan.StanModel(model_code=line_code.format(**linedict));  # compile model
t1 = time()

timepystancomp = (t1 - t0)

t0 = time()
fit = sm.sampling(data=linear_data, iter=Nsamples, chains=chains);  # perform sampling
t1 = time()

timepystan = (t1 - t0)

print('PyStan version: {}'.format(pystan.__version__))

print("Time taken to compile 'PyStan' model is {} seconds".format(timepystancomp))
print("Time taken to run 'PyStan' is {} seconds".format(timepystan))
```
```
PyStan version: 2.19.1.1
Time taken to compile 'PyStan' model is 41.173564434051514 seconds
Time taken to run 'PyStan' is 0.36288952827453613 seconds
```

```python
la = fit.extract(permuted=True)  # return a dictionary of arrays

samples_pystan = np.vstack((la['m'], la['c'])).T

resdict['mpystan_mu'] = np.mean(samples_pystan[:,0])      # mean of m samples
resdict['mpystan_sig'] = np.std(samples_pystan[:,0])      # standard deviation of m samples
resdict['cpystan_mu'] = np.mean(samples_pystan[:,1])      # mean of c samples
resdict['cpystan_sig'] = np.std(samples_pystan[:,1])      # standard deviation of c samples
resdict['ccpystan'] = np.corrcoef(samples_pystan.T)[0,1]  # correlation coefficient between parameters

# plot using corner.py
plotposts(samples_pystan)
```
```python
# calculate the autocorrelation length
acts = []
for i, param in enumerate(["m", "c"]):
    acts.append(emcee.autocorr.integrated_time(samples_pystan[:, i]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))

print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['espystan'] = int(len(samples_pystan) / np.max(acts))
resdict['esspystan'] = int(resdict['espystan'] / timepystan)
print("Effective samples per second: {}".format(resdict['esspystan']))
```
```
Autocorrelation length for 'm' is [1.21500575]
Autocorrelation length for 'c' is [1.11721154]
Maximum autocorrelation length is 1.2
Effective samples per second: 341
```

PyJAGS¶
PyJAGS is another wrapper to a non-Python library, in this case the JAGS library (Plummer, 2003). As with PyStan, you have to write a code string that is then compiled and run. To use PyJAGS you need to install JAGS, e.g., on a Debian-based system this can be done using:

```
sudo apt-get install jags
```

and then install PyJAGS using `pip` with:

```
pip install pyjags
```
The source code for PyJAGS is available on GitHub here.
A linear regression fitting example using PyJAGS is given here, on which this example is based.
We can define the model string as follows (note that in this model the Gaussian/Normal distribution `dnorm` takes the inverse variance as the second argument, rather than the standard deviation):

```python
# the JAGS code string for the linear model
line_code_jags = """
model {{
    for (i in 1:N) {{
        y[i] ~ dnorm(c + m * x[i], {invvar}) # Gaussian likelihood
    }}
    m ~ dnorm({mmu}, {minvvar})   # Gaussian prior on m
    c ~ dunif({clower}, {cupper}) # Uniform prior on c
}}
"""

datadict = {
    'x': x,     # abscissa points (converted from numpy array to a list)
    'N': M,     # number of data points
    'y': data,  # the observed data
}

import pyjags  # import PyJAGS

Nsamples = 1000  # set the number of iterations of the sampler
chains = 4       # set the number of chains to run with

# dictionary for inputs into line_code_jags
linedict = {}
linedict['mmu'] = 0.0              # mean of Gaussian prior distribution for m
linedict['minvvar'] = 1 / 10**2    # inverse variance of Gaussian prior distribution for m
linedict['clower'] = -10           # lower bound on uniform prior distribution for c
linedict['cupper'] = 10            # upper bound on uniform prior distribution for c
linedict['invvar'] = 1 / sigma**2  # inverse variance of the data

# compile model
t0 = time()
model = pyjags.Model(line_code_jags.format(**linedict), data=datadict, chains=chains)
t1 = time()

timepyjagscomp = (t1 - t0)

t0 = time()
samples = model.sample(Nsamples, vars=['m', 'c'])  # perform sampling
t1 = time()

timepyjags = (t1 - t0)
```
```
Note: NumExpr detected 16 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8. NumExpr defaulting to 8 threads.
Using JAGS library located in /usr/lib/x86_64-linux-gnu/libjags.so.4.
Loading module basemod from /usr/lib/x86_64-linux-gnu/JAGS/modules-4/basemod.so
Loading module bugs from /usr/lib/x86_64-linux-gnu/JAGS/modules-4/bugs.so
Loading module lecuyer from /usr/lib/x86_64-linux-gnu/JAGS/modules-4/lecuyer.so
```

```python
print('PyJAGS version: {}'.format(pyjags.__version__))

print("Time taken to compile 'PyJAGS' model is {} seconds".format(timepyjagscomp))
print("Time taken to run 'PyJAGS' is {} seconds".format(timepyjags))

mchainjags = samples['m'].flatten()  # extract chain of m values
cchainjags = samples['c'].flatten()  # extract chain of c values
samples_pyjags = np.vstack((mchainjags, cchainjags)).T

resdict['mpyjags_mu'] = np.mean(samples_pyjags[:,0])      # mean of m samples
resdict['mpyjags_sig'] = np.std(samples_pyjags[:,0])      # standard deviation of m samples
resdict['cpyjags_mu'] = np.mean(samples_pyjags[:,1])      # mean of c samples
resdict['cpyjags_sig'] = np.std(samples_pyjags[:,1])      # standard deviation of c samples
resdict['ccpyjags'] = np.corrcoef(samples_pyjags.T)[0,1]  # correlation coefficient between parameters

# plot using corner.py
plotposts(samples_pyjags)
```
```python
# calculate the autocorrelation length
acts = []
for i, param in enumerate(["m", "c"]):
    acts.append(emcee.autocorr.integrated_time(samples_pyjags[:, i]))
    print("Autocorrelation length for '{}' is {}".format(param, acts[-1]))

print("Maximum autocorrelation length is {0:.1f}".format(np.max(acts)))

resdict['espyjags'] = int(len(samples_pyjags) / np.max(acts))
resdict['esspyjags'] = int(resdict['espyjags'] / timepyjags)
print("Effective samples per second: {}".format(resdict['esspyjags']))
```
```
Autocorrelation length for 'm' is [1.21500575]
Autocorrelation length for 'c' is [1.11721154]
Maximum autocorrelation length is 1.2
Effective samples per second: 683
```

Nested Sampling¶
Nested sampling (Skilling, 2006) is a method to numerically perform the integral required to evaluate the marginal likelihood of the data given a particular model. This is not a value that is produced by most standard MCMC methods, but is a value that is very useful if wanting to do Bayesian model comparison. Above, we defined the marginal likelihood, or evidence, as
$$
p(\mathbf{d}|I) = \int^{\forall \theta_i \in \vec{\theta}} p(\mathbf{d}|\vec{\theta},I) p(\vec{\theta}|I) {\rm d}{\theta_i},
$$

but we can make this more explicit for a given hypothesis, or model, $H_j$, by stating:
$$
p(\mathbf{d}|H_j, I) \equiv Z = \int^{\forall \theta_i \in \vec{\theta}} p(\mathbf{d}|\vec{\theta}, H_j, I) p(\vec{\theta}|H_j,I) {\rm d}{\theta_i},
$$

where previously the $H_j$ had been subsumed inside our implicit assumptions, $I$.
Nested sampling algorithms generally start by drawing a set of points (sometimes called "live points", or "active points") from the prior distribution. It relies on these points being independent draws from the prior, otherwise the resulting marginal likelihood value can be biased. The likelihood is evaluated at each point, and the point with the smallest likelihood is found. This point is then removed from the set of live points (and goes into the sum calculating the marginal likelihood integral) and a new point is drawn from the prior with the constraint that it has a likelihood larger than that of the just-removed point. Drawing truly independent points from this constrained prior is the main tricky part of nested sampling (particularly when the number of parameters is not small), and a variety of methods (including MCMC methods) can be used.
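To make the algorithm a bit more concrete, here is a toy sketch of the basic loop just described (this is purely illustrative and is not how any of the packages below implement it; the function and variable names are my own, the final live-point contribution to $Z$ and the posterior weights are ignored, and the constrained draw uses very inefficient rejection sampling):

```python
import numpy as np

def toy_nested_sampling(loglike, prior_draw, nlive=500, niter=2000):
    """A toy illustration of the basic nested sampling loop."""
    live = [prior_draw() for _ in range(nlive)]    # independent draws from the prior
    logls = np.array([loglike(p) for p in live])   # their log-likelihood values

    logZ = -np.inf                                 # running estimate of ln(Z)
    logwidth = np.log(1.0 - np.exp(-1.0 / nlive))  # ln of the first prior "shell" width

    for i in range(niter):
        worst = np.argmin(logls)                            # lowest-likelihood live point
        logZ = np.logaddexp(logZ, logwidth + logls[worst])  # add its contribution to the evidence

        # draw a replacement from the prior, subject to L > L_worst
        while True:
            newp = prior_draw()
            newlogl = loglike(newp)
            if newlogl > logls[worst]:
                break

        live[worst], logls[worst] = newp, newlogl
        logwidth -= 1.0 / nlive  # the remaining prior volume shrinks by ~exp(-1/nlive)

    return logZ
```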
The uncertainty on the estimate of the natural logarithm of the marginal likelihood, $\ln{Z}$, can be found if the information gain (or Kullback-Leibler divergence), $h$, in going from the prior to the posterior is calculated. If $h$ (in nats) is available then the uncertainty is (Skilling, 2006):
$$
\sqrt{\frac{h}{N_{\rm live}}},
$$

where $N_{\rm live}$ is the number of live points used.
Note: This is the statistical uncertainty on the value, but if there are any biases due to the sampling not being independent draws from the prior then there can be systematic uncertainties too.
As well as producing a value for the marginal likelihood, a by-product of the nested sampling algorithm is the production of a set of samples that can be re-sampled to provide samples drawn from the posterior distribution.
Note: MCMC methods can be used to estimate the marginal likelihood, e.g., `emcee` can be run with a parallel tempering sampling method and used to compute the evidence using thermodynamic integration (e.g., Goggans & Chi, 2004), or potentially using slice sampling with PyMC3.

As with MCMC, there are various codes available to perform the nested sampling algorithm. We will again look at pure Python implementations and some that are wrappers to other libraries.
Nestle¶
Nestle is a pure Python nested sampling algorithm. It has three options for the method with which it draws new samples: "classic", "single ellipsoid", and "multiple ellipsoids". In this example we'll make use of the last of these. This uses the MultiNest method (also see PyMultiNest below) (Feroz et al, 2009), in which the current live points are clustered using a recursive k-means method (however, clustering is not re-done at every iteration of the algorithm), ellipsoids completely bounding each cluster are found, an ellipsoid is randomly chosen with a probability proportional to its volume, and a new sample is drawn uniformly from it.
Nestle is available on PyPI and installable using `pip` with:

```
pip install nestle
```

or via Conda with:

```
conda install -c conda-forge nestle
```
The source code is available on GitHub here.
The set up of likelihood functions is quite similar to that for emcee, but the prior that is used is a unit hypercube (an $n$-dimensional cube with all sides equal to one), so parameters must be mapped into this space. This is simple for uniform priors, but requires a re-parameterisation using a Jacobian for more complex priors (if possible).
For our Gaussian prior on $m$ we can't analytically re-parameterise it for a unit hypercube, however we can use inverse transform sampling to take samples from a uniform range between 0 and 1, and convert them back to a Gaussian, e.g.
$$
m = \mu_m + \sqrt{2}\sigma_m {\rm erf}^{-1}\left(2u - 1\right),
$$

where $u \sim {\rm Uniform}(0,1)$ and ${\rm erf}^{-1}$ is the inverse error function. As shown in the Nestle documentation we can use the `scipy.special.ndtri` function for this.

An example of using Nestle for the linear regression problem, but with uniform priors on both parameters, is given here.
Setting the prior transform¶
Given the above description, rather than setting a prior function, we set a prior transformation function that maps from parameters drawn from the unit hypercube into the true parameters.
```python
from scipy.special import ndtri  # inverse of the Normal CDF, for the Gaussian prior transform

def prior_transform(theta):
    """
    A function defining the transform between the parameterisation in the unit hypercube
    to the true parameters.

    Args:
        theta (tuple): a tuple containing the parameters.

    Returns:
        tuple: a new tuple or array with the transformed parameters.
    """

    mprime, cprime = theta  # unpack the parameters (in their unit hypercube form)

    cmin = -10.  # lower bound on uniform prior on c
    cmax = 10.   # upper bound on uniform prior on c

    mmu = 0.     # mean of Gaussian prior on m
    msigma = 10. # standard deviation of Gaussian prior on m

    m = mmu + msigma * ndtri(mprime)   # convert back to m
    c = cprime * (cmax - cmin) + cmin  # convert back to c

    return (m, c)
```

Setting the likelihood function¶
This is similar to setting the likelihood function for emcee except, as we're calculating the marginal likelihood, we will keep the normalisation constants. As before, we work with the natural logarithm of the likelihood function. The function can only take in a tuple variable containing the parameter values, so any additional required values must be hard coded, or set as global variables.
```python
# set the natural logarithm of 2pi, so that it doesn't have to be recalculated
LN2PI = np.log(2. * np.pi)
LNSIGMA = np.log(sigma)  # natural log of the data noise standard deviation

def loglikelihood_nestle(theta):
    """
    The log-likelihood function.
    """

    m, c = theta  # unpack the parameters

    # normalisation
    norm = -0.5 * M * LN2PI - M * LNSIGMA

    # chi-squared (data, sigma and x are global variables defined early on in this notebook)
    chisq = np.sum(((data - straight_line(x, m, c)) / sigma)**2)

    return norm - 0.5 * chisq
```
Now you can run the sampler. We'll use 1024 live/active points (there's nothing special about this number, and powers of two are not required). The more live points you use the longer the code will take to run, but the uncertainty on the final marginal likelihood value will decrease with greater numbers of live points.
Unlike with MCMCs, where you give it a number of iterations to run and after that point it stops, nested sampling needs to be supplied with a stopping criterion at which point it will terminate. One common criterion is based on the ratio between an estimate of the total evidence and the current calculated evidence value. Once this value gets below some set value the algorithm will terminate.
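Roughly speaking (this is my paraphrase of how codes such as Nestle and dynesty describe their `dlogz` tolerance), at iteration $i$ the remaining evidence is estimated from the highest likelihood among the current live points, $L_{\rm max}$, and the remaining prior volume, $X_i$, and sampling stops once

$$
\Delta\ln\hat{Z}_i = \ln\left(\hat{Z}_i + L_{\rm max}X_i\right) - \ln\hat{Z}_i < {\rm tol}.
$$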
```python
import nestle  # import nestle

print('Nestle version: {}'.format(nestle.__version__))

nlive = 1024      # number of live points
method = 'multi'  # use the MultiNest algorithm
ndims = 2         # two parameters
tol = 0.1         # the stopping criterion

t0 = time()
res = nestle.sample(loglikelihood_nestle, prior_transform, ndims,
                    method=method, npoints=nlive, dlogz=tol)
t1 = time()

timenestle = (t1 - t0)

print("Time taken to run 'Nestle' is {} seconds".format(timenestle))

logZnestle = res.logz                            # value of logZ
infogainnestle = res.h                           # value of the information gain in nats
logZerrnestle = np.sqrt(infogainnestle / nlive)  # estimate of the statistical uncertainty on logZ

print("log(Z) = {} ± {}".format(logZnestle, logZerrnestle))
```
```python
# re-scale weights to have a maximum of one
nweights = res.weights / np.max(res.weights)

# get the probability of keeping a sample from the weights
keepidx = np.where(np.random.rand(len(nweights)) < nweights)[0]

# get the posterior samples
samples_nestle = res.samples[keepidx, :]

resdict['mnestle_mu'] = np.mean(samples_nestle[:,0])      # mean of m samples
resdict['mnestle_sig'] = np.std(samples_nestle[:,0])      # standard deviation of m samples
resdict['cnestle_mu'] = np.mean(samples_nestle[:,1])      # mean of c samples
resdict['cnestle_sig'] = np.std(samples_nestle[:,1])      # standard deviation of c samples
resdict['ccnestle'] = np.corrcoef(samples_nestle.T)[0,1]  # correlation coefficient between parameters
resdict['nestle_npos'] = len(samples_nestle)              # number of posterior samples
resdict['nestle_time'] = timenestle                       # run time
resdict['nestle_logZ'] = logZnestle                       # log marginalised likelihood
resdict['nestle_logZerr'] = logZerrnestle                 # uncertainty on log(Z)

print('Number of posterior samples is {}'.format(len(samples_nestle)))
```
With nested sampling the samples should be uncorrelated by construction (in reality, for some complex problems, some methods used to draw new samples within the nested sampling algorithm can leave correlations). During nested sampling, you can estimate the number of effective posterior samples from the full chain of samples using, for example, (an unnormalised version of) the effective sample size calculation given here.
```python
def ess(weights):
    """
    Estimate the effective sample size from a set of posterior weights
    (an unnormalised version of the calculation linked above).
    """
    N = len(weights)
    w = weights / weights.sum()
    ess = N / (1.0 + ((N * w - 1)**2).sum() / N)
    return int(ess)

resdict['esnestle'] = ess(res.weights)
resdict['essnestle'] = int(resdict['esnestle'] / timenestle)
print("Effective samples per second: {}".format(resdict['essnestle']))
```
CPNest¶
CPNest (Veitch & del Pozzo, 2017) is another pure Python nested sampling algorithm. To perform the sampling from the restricted prior it uses a Metropolis-Hastings based MCMC, with sample proposal distributions that use a combination of the affine invariant ensemble methods described for emcee (Goodman & Weare, 2010), differential evolution, or jumps based on the eigenvectors of the covariance matrix of live points.
CPNest is available on PyPI and can be installed using `pip` with:

```
pip install cpnest
```

or via Conda with:

```
conda install -c conda-forge cpnest
```
The source code is available on GitHub here.
To run CPNest we have to define a class containing the likelihood and prior functions. CPNest also requires explicit bounds on all parameters from which it will uniformly draw an initial set of live points. Prior to running the nested sampling algorithm, it will run an MCMC using the prior as the target distribution to make sure the samples are drawn from the required prior, so it does not require any mapping between the prior and a unit-hypercube.
```python
from cpnest.model import Model  # the CPNest model class that we inherit from

class StraightLineModel(Model):
    """
    A simple straight line model, with a Gaussian likelihood.

    Args:
        data (:class:`numpy.ndarray`): an array containing the observed data
        abscissa (:class:`numpy.ndarray`): an array containing the points at which the data were taken
        modelfunc (function): a function defining the model
        sigma (float): the standard deviation of the noise in the data
    """

    names = ['m', 'c']  # parameter names (this is a required variable for the class)

    # define the boundaries used for initial sampling of the live points
    cmin = -10.  # lower range on c (the same as the uniform c prior lower bound)
    cmax = 10.   # upper range on c (the same as the uniform c prior upper bound)

    mmu = 0.     # mean of the Gaussian prior on m
    msigma = 10. # standard deviation of the Gaussian prior on m

    cbounds = [cmin, cmax]
    # set the m bounds to be +/- 10sigma about the mean
    mbounds = [mmu - 10. * msigma, mmu + 10. * msigma]

    # NOTE: the bounds could instead be set through arguments passed to the __init__
    # function for the class if wanted
    bounds = [mbounds, cbounds]  # upper and lower bounds on each parameter (required for the class)

    def __init__(self, data, abscissa, modelfunc, sigma):
        # set the data
        self._data = data               # observed data
        self._abscissa = abscissa       # points at which the observed data are taken
        self._sigma = sigma             # standard deviation(s) of the data
        self._logsigma = np.log(sigma)  # log sigma here to save computations in the likelihood
        self._ndata = len(data)         # number of data points
        self._model = modelfunc         # model function

    def log_likelihood(self, params):
        """
        The natural logarithm of the likelihood function.

        Args:
            params (dict): a dictionary keyed to the parameter names.

        Returns:
            float: the log likelihood value.
        """
        # extract the parameters
        m = params['m']
        c = params['c']

        # calculate the model
        model = self._model(x, m, c)

        # normalisation
        norm = -0.5 * self._ndata * LN2PI - self._ndata * self._logsigma

        # chi-squared
        chisq = np.sum(((self._data - model) / (self._sigma))**2)

        return norm - 0.5 * chisq

    def log_prior(self, p):
        """
        The natural logarithm of the prior function.

        Args:
            p (dict): a dictionary keyed to parameter names.

        Returns:
            float: The log prior value.
        """
        # the uniform priors are dealt with by just checking that we're within the bounds
        if not self.in_bounds(p):
            return -np.inf  # check parameters are all in bounds

        # Gaussian prior on m
        lp = 0.
        m = p['m']
        lp -= 0.5 * ((m - self.mmu) / self.msigma)**2  # no need for a normalisation constant on the prior

        return lp
```

We can now run CPNest. As with Nestle we will use 1024 live points. CPNest can use multiple CPU cores, but we'll set it to use one to be comparable to the other tests. The number of iterations used for the MCMC sampling is automatically worked out within the code by calculating autocorrelation lengths to see how many iterations are needed to produce independent samples, but a maximum chain length can also be set. We'll set this to 1024. As with Nestle a stopping criterion is required, which for CPNest is defined in the same way as for Nestle. The current CPNest default is 0.1 and this cannot be altered by the user.
```python
import cpnest  # import CPNest

print('CPNest version: {}'.format(cpnest.__version__))

nlive = 1024    # number of live points
maxmcmc = 1024  # maximum MCMC chain length
nthreads = 1    # use one CPU core

# set up the algorithm
work = cpnest.CPNest(
    StraightLineModel(data, x, straight_line, sigma),
    verbose=0,
    nthreads=nthreads,
    nlive=nlive,
    maxmcmc=maxmcmc,
)

# run the algorithm
t0 = time()
work.run();
t1 = time()

timecpnest = (t1 - t0)
```

```
Setting verbosity to 0
2021-05-20, 09:32:04 - cpnest.cpnest.CPNest : Running with 1 parallel threads
2021-05-20, 09:32:21 - cpnest.sampler.Sampler : Sampler process 970: MCMC samples accumulated = 461865
2021-05-20, 09:32:21 - cpnest.sampler.Sampler : Sampler process 970 - mean acceptance 0.496: exiting
2021-05-20, 09:32:21 - cpnest.NestedSampling.NestedSampler : Final evidence: -116.80
2021-05-20, 09:32:21 - cpnest.NestedSampling.NestedSampler : Information: 7.16
2021-05-20, 09:32:22 - cpnest.nest2pos : Computed log_evidences: (-116.80638721422837,)
2021-05-20, 09:32:22 - cpnest.nest2pos : Relative weights of input files: [1.0]
2021-05-20, 09:32:22 - cpnest.nest2pos : Relative weights of input files taking into account their length: [1.0]
2021-05-20, 09:32:22 - cpnest.nest2pos : Number of input samples: [11764]
2021-05-20, 09:32:22 - cpnest.nest2pos : Expected number of samples from each input file [2754]
2021-05-20, 09:32:22 - cpnest.nest2pos : Samples produced: 2754
```

We can again extract the value of $\ln{Z}$ and its uncertainty. One simple bit of model comparison that we may want to do is to compare the evidence for the model of the data containing a straight line (with the given priors on the parameters) with the evidence for a model in which the data just contains independent Gaussian noise with the given standard deviation. We can calculate the latter evidence easily by just setting the model to zero in the log-likelihood function, e.g.:
$$
\ln{Z}_{\rm noise} = -\frac{M}{2}\ln{(2\pi)} - \ln\left(\prod_{i=1}^M \sigma_i\right) - \sum_{i=1}^M\frac{d_i^2}{2\sigma_i^2}.
$$

The natural logarithm of the Bayes factor comparing the two models is then:
$$
\ln{\mathcal{B}} = \ln{Z} - \ln{Z}_{\rm noise}.
$$

We will also extract this value.
```python
logZcpnest = work.NS.logZ                        # value of log Z
infogaincpnest = work.NS.state.info              # value of the information gain
logZerrcpnest = np.sqrt(infogaincpnest / nlive)  # estimate of the statistical uncertainty on logZ

# get the null log likelihood (evidence that the data is Gaussian noise with zero mean, and given standard deviation)
logZnull = work.user.log_likelihood({'m': 0., 'c': 0.})

print("log(Z) = {} ± {}".format(logZcpnest, logZerrcpnest))

# output the log Bayes factor
print('log Bayes factor is {}'.format(logZcpnest - logZnull))
```
```python
mchain_cpnest = work.posterior_samples['m']  # extract chain of m values
cchain_cpnest = work.posterior_samples['c']  # extract chain of c values
samples_cpnest = np.vstack((mchain_cpnest, cchain_cpnest)).T

# print out the number of posterior samples
print('Number of posterior samples is {}'.format(samples_cpnest.shape[0]))

resdict['mcpnest_mu'] = np.mean(samples_cpnest[:,0])      # mean of m samples
resdict['mcpnest_sig'] = np.std(samples_cpnest[:,0])      # standard deviation of m samples
resdict['ccpnest_mu'] = np.mean(samples_cpnest[:,1])      # mean of c samples
resdict['ccpnest_sig'] = np.std(samples_cpnest[:,1])      # standard deviation of c samples
resdict['cccpnest'] = np.corrcoef(samples_cpnest.T)[0,1]  # correlation coefficient between parameters
resdict['cpnest_npos'] = len(samples_cpnest)              # number of posterior samples
resdict['cpnest_time'] = timecpnest                       # run time
resdict['cpnest_logZ'] = logZcpnest                       # log marginalised likelihood
resdict['cpnest_logZerr'] = logZerrcpnest                 # uncertainty on log(Z)

# plot using corner.py
plotposts(samples_cpnest)
```
from cpnest.nest2pos import compute_weights

_, weights = compute_weights(work.get_nested_samples()["logL"], work.nlive)
weights = np.exp(weights)

resdict['escpnest'] = ess(weights)
resdict['esscpnest'] = int(resdict['escpnest'] / timecpnest)
print("Effective samples per second: {}".format(resdict['esscpnest']))
dynesty¶
dynesty (Speagle, 2019) is a pure Python implementation of the dynamic nested sampling algorithm (Higson et al, 2017). It is very similar in usage to Nestle. Dynamic nested sampling is a way of dynamically adapting the number of live points used, to provide more accurate evidence estimates and/or larger numbers of independent posterior samples. The code can run with dynamic adjustment of the number of live points, or in a "static" mode with a fixed number of live points. There are many options for how new live points are drawn, both in terms of the bounds from which they are drawn and the sampling used within those bounds. The bound options include bounding ellipsoids in the style of MultiNest, or the RadFriends algorithm of UltraNest, among others. The sampling options include uniform draws from the bounds, random walks, or slice sampling in the style of PolyChord. If not explicitly specified, dynesty will select a sampling algorithm based on the number of dimensions of the problem being tackled.
dynesty is available on PyPI and can be installed with `pip` using:

pip install dynesty
or using Conda with:
conda install -c conda-forge dynesty
Note: Other dynamic nested sampling codes exist such as dyPolyChord and perfectns.

The likelihood function and prior transform can be set up in a way that is identical to Nestle (we'll define both again below for completeness), with the differences being in how the sampler is called. Here we will show the sampler being used for both "static" and "dynamic" nested sampling.
import dynesty
from dynesty import NestedSampler

def prior_transform(theta):
    """
    A function defining the transform between the parameterisation in the unit hypercube
    to the true parameters.

    Args:
        theta (tuple): a tuple containing the parameters.

    Returns:
        tuple: a new tuple or array with the transformed parameters.
    """
    mprime, cprime = theta  # unpack the parameters (in their unit hypercube form)

    cmin = -10.   # lower bound on uniform prior on c
    cmax = 10.    # upper bound on uniform prior on c
    mmu = 0.      # mean of Gaussian prior on m
    msigma = 10.  # standard deviation of Gaussian prior on m

    m = mmu + msigma * ndtri(mprime)   # convert back to m
    c = cprime * (cmax - cmin) + cmin  # convert back to c

    return (m, c)


# set the natural logarithm of 2pi, so that it doesn't have to be recalculated
LN2PI = np.log(2. * np.pi)
LNSIGMA = np.log(sigma)  # natural log of the data noise standard deviation

def loglikelihood_dynesty(theta):
    """
    The log-likelihood function.
    """
    m, c = theta  # unpack the parameters

    # normalisation
    norm = -0.5 * M * LN2PI - M * LNSIGMA

    # chi-squared (data, sigma and x are global variables defined early on in this notebook)
    chisq = np.sum(((data - straight_line(x, m, c)) / sigma)**2)

    return norm - 0.5 * chisq


print('dynesty version: {}'.format(dynesty.__version__))

nlive = 1024     # number of live points
bound = 'multi'  # use MultiNest algorithm for bounds
ndims = 2        # two parameters
sample = 'unif'  # uniform sampling
tol = 0.1        # the stopping criterion

sampler = NestedSampler(loglikelihood_dynesty, prior_transform, ndims,
                        bound=bound, sample=sample, nlive=nlive)

t0 = time()
sampler.run_nested(dlogz=tol, print_progress=False)  # don't output progress bar
t1 = time()

timedynesty = (t1 - t0)

print("Time taken to run 'dynesty' (in static mode) is {} seconds".format(timedynesty))

res = sampler.results  # get results dictionary from sampler

logZdynesty = res.logz[-1]        # value of logZ
logZerrdynesty = res.logzerr[-1]  # estimate of the statistical uncertainty on logZ

print("log(Z) = {} ± {}".format(logZdynesty, logZerrdynesty))
# get function that resamples from the nested samples to give samples with equal weight
from dynesty.utils import resample_equal

# draw posterior samples
weights = np.exp(res['logwt'] - res['logz'][-1])
samples_dynesty = resample_equal(res.samples, weights)

resdict['mdynesty_mu'] = np.mean(samples_dynesty[:,0])      # mean of m samples
resdict['mdynesty_sig'] = np.std(samples_dynesty[:,0])      # standard deviation of m samples
resdict['cdynesty_mu'] = np.mean(samples_dynesty[:,1])      # mean of c samples
resdict['cdynesty_sig'] = np.std(samples_dynesty[:,1])      # standard deviation of c samples
resdict['ccdynesty'] = np.corrcoef(samples_dynesty.T)[0,1]  # correlation coefficient between parameters
resdict['dynesty_npos'] = len(samples_dynesty)              # number of posterior samples
resdict['dynesty_time'] = timedynesty                       # run time
resdict['dynesty_logZ'] = logZdynesty                       # log marginalised likelihood
resdict['dynesty_logZerr'] = logZerrdynesty                 # uncertainty on log(Z)

print('Number of posterior samples is {}'.format(len(samples_dynesty)))

# plot using corner.py
plotposts(samples_dynesty)
resdict['esdynesty'] = ess(np.exp(res["logwt"] - logZdynesty))
resdict['essdynesty'] = int(resdict['esdynesty'] / timedynesty)

print("Effective samples per second: {}".format(resdict['essdynesty']))
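As mentioned above, dynesty provides a range of bounding and sampling schemes beyond the MultiNest-style ellipsoids and uniform draws used here. A minimal sketch of selecting some of the alternatives (the option names are those listed in the dynesty documentation; this snippet is not run here):

# a sketch (not run): use RadFriends-style bounding and random walk sampling instead
sampler_alt = NestedSampler(loglikelihood_dynesty, prior_transform, ndims,
                            bound='balls',   # overlapping balls around the live points (RadFriends-like)
                            sample='rwalk',  # MCMC random walks away from current live points
                            nlive=nlive)
# sampler_alt.run_nested(dlogz=tol, print_progress=False)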
We can now do the same thing, but with the "dynamic" sampler. For this we can specify an initial number of live points (we will stick with 1024), but the actual number used will evolve. The stopping criterion for the dynamic method is more complex than for the static case, and we will just leave it at the default values.
from dynesty import DynamicNestedSampler

# set up the sampler
dsampler = DynamicNestedSampler(loglikelihood_dynesty, prior_transform, ndims,
                                bound=bound, sample=sample)

# run the sampler (note that we have not included the dlogz tolerance)
t0 = time()
dsampler.run_nested(nlive_init=nlive, print_progress=False)
t1 = time()

timedynestydynamic = (t1 - t0)

print("Time taken to run 'dynesty' (in dynamic mode) is {} seconds".format(timedynestydynamic))

dres = dsampler.results  # get results dictionary from sampler

logZdynestydynamic = dres.logz[-1]        # value of logZ
logZerrdynestydynamic = dres.logzerr[-1]  # estimate of the statistical uncertainty on logZ

print("Static: log(Z) = {} ± {}".format(logZdynesty, logZerrdynesty))
print("Dynamic: log(Z) = {} ± {}".format(logZdynestydynamic, logZerrdynestydynamic))

# draw posterior samples
dweights = np.exp(dres['logwt'] - dres['logz'][-1])
samples_dynestydynamic = resample_equal(dres.samples, dweights)

essdynestydynamic = ess(np.exp(dres["logwt"] - logZdynestydynamic))

print('Static: num. posterior samples: {}'.format(len(samples_dynesty)))
print('Dynamic: num. posterior samples: {}'.format(len(samples_dynestydynamic)))
print('Static: effective sample size: {}'.format(resdict['esdynesty']))
print('Dynamic: effective sample size: {}'.format(essdynestydynamic))
Static: log(Z) = -116.66207844899319 ± 0.11680906514881793
Dynamic: log(Z) = -116.70461530428305 ± 0.11701110275760397
Static: num. posterior samples: 11624
Dynamic: num. posterior samples: 25519
Static: effective sample size: 4074
Dynamic: effective sample size: 16617
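If the aim is mainly to obtain more posterior samples, rather than a better evidence estimate, the dynamic sampler's effort can be weighted accordingly. A minimal sketch (assuming the `wt_kwargs` argument of `run_nested` described in the dynesty documentation, where `pfrac=1` places all of the weight on posterior estimation; this is not run here):

# a sketch (not run): weight the dynamic run entirely towards posterior estimation
dsampler_post = DynamicNestedSampler(loglikelihood_dynesty, prior_transform, ndims,
                                     bound=bound, sample=sample)
dsampler_post.run_nested(nlive_init=nlive, wt_kwargs={'pfrac': 1.0}, print_progress=False)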
UltraNest is a Python-based implementation of nested sampling that contains several sampling algorithms designed for correctness of sampling. One particular method it provides, called RadFriends (Buchner, 2014), is designed to be robust against the effects of inaccuracies in drawing new samples from the constrained prior.
UltraNest is available from its GitHub repository, but can be installed from PyPI using
pip install ultranest
or using Conda with:
conda install -c conda-forge ultranest
Similarly to Nestle and dynesty, UltraNest samples from a unit hypercube, and therefore you have to define a prior transformation function, so we can set up these functions in a very similar way to what we have seen above.
import ultranest

def prior_transform_ultranest(theta):
    """
    A function defining the transform between the parameterisation in the unit hypercube
    to the true parameters.

    Args:
        theta (list): a list/array containing the parameters.

    Returns:
        list: a new list/array with the transformed parameters.
    """
    # unpack the parameters (in their unit hypercube form)
    mprime = theta[0]
    cprime = theta[1]

    cmin = -10.   # lower bound on uniform prior on c
    cmax = 10.    # upper bound on uniform prior on c
    mmu = 0.      # mean of Gaussian prior on m
    msigma = 10.  # standard deviation of Gaussian prior on m

    m = mmu + msigma * ndtri(mprime)   # convert back to m
    c = cprime * (cmax - cmin) + cmin  # convert back to c

    return np.array([m, c])


def loglikelihood_ultranest(theta):
    """
    The log-likelihood function.
    """
    m, c = theta  # unpack the parameters

    # normalisation
    norm = -0.5 * M * LN2PI - M * LNSIGMA

    # chi-squared (data, sigma and x are global variables defined early on in this notebook)
    chisq = np.sum(((data - straight_line(x, m, c)) / sigma)**2)

    return norm - 0.5 * chisq


print('UltraNest version: {}'.format(ultranest.__version__))

# set up the ReactiveNestedSampler method
sampler = ultranest.ReactiveNestedSampler(["m", "c"], loglikelihood_ultranest, prior_transform_ultranest)

ndims = 2  # two parameters
tol = 0.1  # the stopping criterion

# run the nested sampling algorithm
t0 = time()
result = sampler.run(dlogz=tol)
t1 = time()

timeultranest = (t1 - t0)

Sampling 400 live points from prior ...
run_iter dlogz=0.1, dKL=0.5, frac_remain=0.01, Lepsilon=0.0010, min_ess=400
max_iters=-1, max_ncalls=-1, max_num_improvement_loops=-1, min_num_live_points=400, cluster_num_live_points=40
minimal_widths_sequence: [(-inf, 400.0), (inf, 400.0)]

logZultranest = result['logz']        # value of logZ
logZerrultranest = result['logzerr']  # estimate of the statistical uncertainty on logZ

# output marginal likelihood
print("Marginalised evidence is {} ± {}".format(logZultranest, logZerrultranest))
points = np.array(result["weighted_samples"]["points"])
weights = np.array(result["weighted_samples"]["weights"])

scaledweights = weights / weights.max()
mask = np.random.rand(len(scaledweights)) < scaledweights

samples_ultranest = points[mask, :]

resdict['multranest_mu'] = np.mean(samples_ultranest[:,0])      # mean of m samples
resdict['multranest_sig'] = np.std(samples_ultranest[:,0])      # standard deviation of m samples
resdict['cultranest_mu'] = np.mean(samples_ultranest[:,1])      # mean of c samples
resdict['cultranest_sig'] = np.std(samples_ultranest[:,1])      # standard deviation of c samples
resdict['ccultranest'] = np.corrcoef(samples_ultranest.T)[0,1]  # correlation coefficient between parameters
resdict['ultranest_npos'] = len(samples_ultranest)              # number of posterior samples
resdict['ultranest_time'] = timeultranest                       # run time
resdict['ultranest_logZ'] = logZultranest                       # log marginalised likelihood
resdict['ultranest_logZerr'] = logZerrultranest                 # uncertainty on log(Z)

# plot using corner.py
plotposts(samples_ultranest)

print('Number of posterior samples is {}'.format(samples_ultranest.shape[0]))
resdict['esultranest'] = ess(weights)
resdict['essultranest'] = int(resdict['esultranest'] / timeultranest)

print("Effective samples per second: {}".format(resdict['essultranest']))
We will now look at some nested sampling algorithms that are not pure Python, but do have Python wrappers.
PyMultiNest¶
PyMultiNest (Buchner et al, 2014) is a wrapper to the popular MultiNest code (Feroz & Hobson, 2008 & Feroz et al, 2009), which has been particularly taken up by the astrophysics and cosmology community. The basics underlying MultiNest were described in relation to Nestle, although MultiNest can use more sophisticated sampling methods, such as importance sampling (Feroz et al, 2014).
To use PyMultiNest, the MultiNest library itself must be installed (MultiNest can be downloaded from GitHub here). Installation of PyMultiNest and its requirements are described here; once MultiNest is installed you can then install PyMultiNest using `pip` with:

pip install pymultinest
A simpler solution, that comes bundled with MultiNest, is to install using Conda with:
conda install -c conda-forge pymultinest
Note: If using MultiNest, and software wrapping it, for your work please note the software license restrictions here.
Some examples of running PyMultiNest can be found here, although an interface through a `solve` function and `Solver` class are also available. We'll use the `Solver` class method, which can take in the same initialisation keyword arguments as the `run()` method.

We use the `Solver` class to create a new class for our specific example. As with CPNest we use this class to define our log likelihood function, and like Nestle we define a prior transform to go from a unit hypercube to the true parameterisation. These functions can only take in arrays of the parameter values that we are exploring and cannot take in additional arguments. Therefore any additional values required must either be global variables, or a new `__init__` method for the class must be defined. We'll show the latter of these.

Note: PyMultiNest is also available in Python through the Monte Python package.
import pymultinest
from pymultinest.solve import Solver
from scipy.special import ndtri

LN2PI = np.log(2. * np.pi)

class StraightLineModelPyMultiNest(Solver):
    """
    A simple straight line model, with a Gaussian likelihood.

    Args:
        data (:class:`numpy.ndarray`): an array containing the observed data
        abscissa (:class:`numpy.ndarray`): an array containing the points at which the data were taken
        modelfunc (function): a function defining the model
        sigma (float): the standard deviation of the noise in the data
        **kwargs: keyword arguments for the run method
    """

    # define the prior parameters
    cmin = -10.   # lower range on c (the same as the uniform c prior lower bound)
    cmax = 10.    # upper range on c (the same as the uniform c prior upper bound)
    mmu = 0.      # mean of the Gaussian prior on m
    msigma = 10.  # standard deviation of the Gaussian prior on m

    def __init__(self, data, abscissa, modelfunc, sigma, **kwargs):
        # set the data
        self._data = data               # observed data
        self._abscissa = abscissa       # points at which the observed data are taken
        self._sigma = sigma             # standard deviation(s) of the data
        self._logsigma = np.log(sigma)  # log sigma here to save computations in the likelihood
        self._ndata = len(data)         # number of data points
        self._model = modelfunc         # model function

        Solver.__init__(self, **kwargs)

    def Prior(self, cube):
        """
        The prior transform going from the unit hypercube to the true parameters.
        This function has to be called "Prior".

        Args:
            cube (:class:`numpy.ndarray`): an array of values drawn from the unit hypercube

        Returns:
            :class:`numpy.ndarray`: an array of the transformed parameters
        """
        # extract values
        mprime = cube[0]
        cprime = cube[1]

        m = self.mmu + self.msigma * ndtri(mprime)        # convert back to m
        c = cprime * (self.cmax - self.cmin) + self.cmin  # convert back to c

        return np.array([m, c])

    def LogLikelihood(self, cube):
        """
        The log likelihood function. This function has to be called "LogLikelihood".

        Args:
            cube (:class:`numpy.ndarray`): an array of parameter values.

        Returns:
            float: the log likelihood value.
        """
        # extract parameters
        m = cube[0]
        c = cube[1]

        # calculate the model
        model = self._model(self._abscissa, m, c)

        # normalisation
        norm = -0.5 * self._ndata * LN2PI - self._ndata * self._logsigma

        # chi-squared
        chisq = np.sum(((self._data - model) / (self._sigma))**2)

        return norm - 0.5 * chisq


nlive = 1024  # number of live points
ndim = 2      # number of parameters
tol = 0.1     # stopping criterion

# run the algorithm
t0 = time()
solution = StraightLineModelPyMultiNest(data, x, straight_line, sigma, n_dims=ndim,
                                        n_live_points=nlive, evidence_tolerance=tol);
t1 = time()

timepymultinest = (t1 - t0)
logZpymnest = solution.logZ        # value of log Z
logZerrpymnest = solution.logZerr  # estimate of the statistical uncertainty on logZ

print("log(Z) = {} ± {}".format(logZpymnest, logZerrpymnest))
Model in "(temporary directory)" (2 dimensions) Evidence ln Z = -116.7 +- 0.1 Parameter values: Parameter 1 : 3.464 +- 0.100 Parameter 2 : 0.992 +- 0.566mchain_pymnest = solution.samples[:,0 ] # extract chain of m values cchain_pymnest = solution.samples[:,1] # extract chain if c values samples_pymnest = np.vstack((mchain_pymnest, cchain_pymnest)).T resdict['mpymnest_mu'] = np.mean(samples_pymnest[:,0]) # mean of m samples resdict['mpymnest_sig'] = np.std(samples_pymnest[:,0]) # standard deviation of m samples resdict['cpymnest_mu'] = np.mean(samples_pymnest[:,1]) # mean of c samples resdict['cpymnest_sig'] = np.std(samples_pymnest[:,1]) # standard deviation of c samples resdict['ccpymnest'] = np.corrcoef(samples_pymnest.T)[0,1] # correlation coefficient between parameters resdict['pymnest_npos'] = len(samples_pymnest) # number of posterior samples resdict['pymnest_time'] = timepymultinest # run time resdict['pymnest_logZ'] = logZpymnest # log marginalised likelihood resdict['pymnest_logZerr'] = logZerrpymnest # uncertainty on log(Z) # print out the number of posterior samples print('Number of posterior samples is {}'.format(samples_pymnest.shape[0])) # plot using corner.py plotposts(samples_pymnest)
# note: I'm not calculating the correct ESS here as I can't access the weights
resdict['espymnest'] = len(samples_pymnest)
resdict['esspymnest'] = int(resdict['espymnest'] / timepymultinest)

print("Effective samples per second: {}".format(resdict['esspymnest']))
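The importance nested sampling mode mentioned above can be switched on or off when the solver is created: keyword arguments given to the `Solver` subclass are passed through to MultiNest, so (assuming the `importance_nested_sampling` keyword of `pymultinest.run`, which is enabled by default) a sketch of explicitly disabling it would look like:

# a sketch (not run): explicitly disable MultiNest's importance nested sampling
# (the importance_nested_sampling keyword is assumed to be forwarded to pymultinest.run)
solution_ins = StraightLineModelPyMultiNest(data, x, straight_line, sigma,
                                            n_dims=ndim,
                                            n_live_points=nlive,
                                            evidence_tolerance=tol,
                                            importance_nested_sampling=False)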
DNest4¶
DNest4 (Brewer & Foreman-Mackey, 2016) uses a nested sampling algorithm called Diffusive Nested Sampling (Brewer et al, 2011). The main difference between this method and others is that it can at any point still propose samples throughout the whole prior, rather than being constrained to draw points within the current minimum likelihood bound. This allows it to find volumes of high likelihood that might be missed by other methods because they lie in an area where no live points remain, i.e., it is good for sampling narrow multi-modal posteriors. It can also be used in cases where the number of parameters in the model that is being fit is itself a variable, e.g., if the line model could either be a straight line, or a quadratic, or a cubic. The method used for this is called Reversible Jump MCMC (Green, 1995), although it is not currently available in the Python interface for DNest4.
An example of using the Python wrapper for DNest4 for a straight line model is given here, but we will reproduce it with minor modifications (in that example $\sigma$ is also included in the fitting).
You can install the development version of DNest4 by cloning it from its GitHub repository and following the instructions here. However, the Python interface can be installed from PyPI via pip with
pip install dnest4
or through Conda with
conda install -c conda-forge dnest4
For DNest4 we need to create a `Model` class that contains the following methods:

- `from_prior()`: a function that draws a sample from the prior distributions of the parameters
- `perturb()`: a function that proposes new sample values
- `log_likelihood()`: the natural logarithm of the likelihood function

The `perturb()` method also needs to calculate the natural logarithm of the Metropolis-Hastings ratio required to give the acceptance probability:

H = \frac{p(\theta'|I)}{p(\theta|I)} \frac{g(\theta|\theta')}{g(\theta'|\theta)}, $$

where the first ratio on the right-hand-side is the ratio of priors at the proposed point $\theta'$ and current point $\theta$, and the second ratio is the ratio of the proposal distributions. In our case the proposal distribution is symmetric, so its ratio is unity, and the ratio of the priors depends on the Gaussian prior on $m$, but not on the uniform prior for $c$.
Parameters cannot be passed to the class on initialisation, so things such as the data must be global variables.
import dnest4

# import the randh function, which draws points from a heavy-tailed distribution, and wrap, which wraps a point within a given range
from dnest4 import randh, wrap

LN2PI = np.log(2. * np.pi)
LNSIGMA = np.log(sigma)

# define prior range values as global variables
cmin = -10.   # lower range on c (the same as the uniform c prior lower bound)
cmax = 10.    # upper range on c (the same as the uniform c prior upper bound)
mmu = 0.      # mean of the Gaussian prior on m
msigma = 10.  # standard deviation of the Gaussian prior on m

class DNest4Model(object):
    """
    Specify the model in Python.
    """

    def __init__(self):
        """
        Parameter values *are not* stored inside the class
        """

    def from_prior(self):
        """
        Draw points from the prior distribution.

        Returns:
            :class:`np.ndarray`: a numpy array containing the parameter values.
        """
        m = np.random.normal(mmu, msigma)
        c = np.random.uniform(cmin, cmax)

        return np.array([m, c])

    def perturb(self, params):
        """
        Perturb the current parameters by proposing a new position. This takes a numpy array of
        parameters as input, and modifies it in-place. It just perturbs one parameter at a time.

        Args:
            params (:class:`np.ndarray`): the current parameters (this is modified by the function)

        Returns:
            float: the natural logarithm of the Metropolis-Hastings proposal ratio.
        """
        logH = 0.0  # log of the Metropolis-Hastings prior x proposal ratio

        # randomly choose which parameter to perturb
        which = np.random.randint(len(params))

        mmu = 0.
        msigma = 10.

        if which == 0:
            # update H for Gaussian prior
            logH -= -0.5 * ((params[which] - mmu) / msigma)**2

        # we could scale this or use different distributions for each parameter, but in our case
        # we know our distribution is quite compact and unimodal
        params[which] += 1. * randh()

        if which == 0:
            # update H for Gaussian prior
            logH += -0.5 * ((params[which] - mmu) / msigma)**2
        else:
            # wrap c value so that it stays within the prior range
            params[which] = wrap(params[which], cmin, cmax)

        return logH

    def log_likelihood(self, params):
        """
        Gaussian sampling distribution.
        """
        m, c = params  # unpack parameters

        norm = -0.5 * M * LN2PI - M * LNSIGMA
        chisq = np.sum(((data - straight_line(x, m, c)) / sigma)**2)

        return norm - 0.5 * chisq
We can now run the sampler. The values that are passed to the sampler are described in Section 7 of Brewer & Foreman-Mackey, 2016, but we'll stick with the defaults used in the example code. Unlike with the other nested sampling algorithms we do not pass a tolerance for the stopping criterion, but just specify the number of levels and steps to perform.

print('DNest4 version: {}'.format(dnest4.__version__))

# Create a model object and a sampler
model = DNest4Model()
sampler = dnest4.DNest4Sampler(model, backend=dnest4.backends.CSVBackend(".", sep=" "))

# Set up the sampler. The first argument is max_num_levels
gen = sampler.sample(max_num_levels=30, num_steps=1000, new_level_interval=10000,
                     num_per_step=10000, thread_steps=100, num_particles=5, lam=10, beta=100)

# Do the sampling (one iteration here = one particle save)
t0 = time()
for sample in enumerate(gen):
    pass
t1 = time()

timednest4 = (t1 - t0)

print("Time taken to run 'DNest4' is {} seconds".format(timednest4))

The `num_steps` argument gives the total number of MCMC iterations; if it is not set the code will by default run forever, until the process is killed by the user.

To create the posterior samples the output of DNest4 has to be processed, which can be performed using the `postprocess()` function. Once this is done the posterior samples will be held in a `posterior_sample.txt` file containing a column for each parameter. As stated in Brewer & Foreman-Mackey, 2016:

"it is harder to compute justified error bars on $\ln{(Z)}$ in DNS than it is in standard NS."

so we will not quote a value here.
# Run the postprocessing to get the marginal likelihood and generate posterior samples
logZdnest4, infogaindnest4, _ = dnest4.postprocess(plot=False);

samples_dnest4 = np.loadtxt('posterior_sample.txt')
# print out the number of posterior samples
print('Number of posterior samples is {}'.format(samples_dnest4.shape[0]))

resdict['mdnest4_mu'] = np.mean(samples_dnest4[:,0])      # mean of m samples
resdict['mdnest4_sig'] = np.std(samples_dnest4[:,0])      # standard deviation of m samples
resdict['cdnest4_mu'] = np.mean(samples_dnest4[:,1])      # mean of c samples
resdict['cdnest4_sig'] = np.std(samples_dnest4[:,1])      # standard deviation of c samples
resdict['ccdnest4'] = np.corrcoef(samples_dnest4.T)[0,1]  # correlation coefficient between parameters
resdict['dnest4_npos'] = len(samples_dnest4)              # number of posterior samples
resdict['dnest4_time'] = timednest4                       # run time
resdict['dnest4_logZ'] = logZdnest4                       # log marginalised likelihood

# plot using corner.py
plotposts(samples_dnest4)
# note: I'm not calculating the correct ESS here as I can't access the weights
resdict['esdnest4'] = len(samples_dnest4)
resdict['essdnest4'] = len(samples_dnest4) / timednest4

print("Effective samples per second: {0:.2f}".format(resdict['essdnest4']))
PyPolyChord¶
PolyChord (Handley et al, 2015a, Handley et al, 2015b) is a nested sampling implementation that uses slice sampling when drawing new points from the constrained prior.
The PolyChord package itself provides a Python interface called PyPolyChord for using the algorithm. PolyChord can be downloaded from GitHub here with:
git clone https://github.com/PolyChord/PolyChordLite.git
PyPolyChord can then be installed with
cd PolyChordLite
python setup.py install
The installation will attempt to find a copy of the `libmpi.so` library. If you have not got MPI (or are having further problems with MPI) you can disable it by instead installing with:

python setup.py --no-mpi install
As with Nestle and PyMultiNest, PolyChord samples from a unit hypercube, so you need to define a prior transform function. We also need a log likelihood function. These can be very similar to those used for Nestle, but the functions must take in and output lists rather than tuples. Values such as the data array cannot be passed to the functions explicitly, so must be set as global variables.
Note: If using PolyChord for your work, please note the license restrictions described here.
def prior_transform_polychord(cube):
    """
    A function defining the transform between the parameterisation in the unit hypercube
    to the true parameters.

    Args:
        cube (array, list): a list containing the parameters as drawn from a unit hypercube.

    Returns:
        list: the transformed parameters.
    """
    # unpack the parameters (in their unit hypercube form)
    mprime = cube[0]
    cprime = cube[1]

    cmin = -10.   # lower bound on uniform prior on c
    cmax = 10.    # upper bound on uniform prior on c
    mmu = 0.      # mean of Gaussian prior on m
    msigma = 10.  # standard deviation of Gaussian prior on m

    m = mmu + msigma * ndtri(mprime)   # convert back to m
    c = cprime * (cmax - cmin) + cmin  # convert back to c

    theta = [m, c]

    return theta


def loglikelihood_polychord(theta):
    """
    The log-likelihood function.

    Args:
        theta (array, list): the set of parameter values.

    Returns:
        float: the log-likelihood value.
        list: a list of any derived parameters (an empty list if there are none).
    """
    # unpack the parameters
    m = theta[0]
    c = theta[1]

    # normalisation
    norm = -0.5 * M * LN2PI - M * LNSIGMA

    # chi-squared (data, sigma and x are global variables defined early on in this notebook)
    chisq = np.sum(((data - straight_line(x, m, c)) / sigma)**2)

    return norm - 0.5 * chisq, []

As before we'll use 1024 live points (the default is to use 25 times the dimensionality of the problem) and a stopping criterion of 0.1 (the default value is 0.001). The current version of PyPolyChord will return values of the marginal likelihood and its uncertainty from the `PyPolyChord` object that runs the algorithm, but the class does not currently contain the posterior samples. It instead outputs these to files. The location of these files is set through the keyword arguments `base_dir` (defaults to `chains`) and `file_root` (defaults to `test`) in a `PolyChordSettings` object. This sets `{root} = {base_dir}/{file_root}`, and within this the posterior samples can be found in `{root}.txt` and the other statistics in `{root}.stats`.

You may find that the code produces a "Segmentation Fault" soon after it starts running. The PolyChord documentation says:
"The slice sampling & clustering steps use a recursive procedure. The default memory allocated to recursive procedures is embarassingly small (to guard against memory leaks)."
and the solution is to:
Try increasing the stack size:

Linux:
ulimit -s unlimited

OSX:
ulimit -s hard
Alternatively, within a Python script you can achieve this using the `resource` module, as shown below.

import os
import resource

# raise the stack size limit (saving the current limit so that it can be reset afterwards)
curlimit = resource.getrlimit(resource.RLIMIT_STACK)
resource.setrlimit(resource.RLIMIT_STACK, (resource.RLIM_INFINITY, resource.RLIM_INFINITY))

nlive = 1024  # number of live points
ndims = 2     # number of parameters
nderived = 0  # number of derived parameters (this is zero)
tol = 0.1     # stopping criterion

basedir = os.path.join(os.getcwd(), 'polychord')  # output base directory
if not os.path.isdir(basedir):
    os.makedirs(basedir)  # create base directory
    os.makedirs(os.path.join(basedir, 'clusters'))

fileroot = 'straightline'                # output file name
broot = os.path.join(basedir, fileroot)

# import PolyChord
import pypolychord
from pypolychord.settings import PolyChordSettings  # class for passing setup information
from pypolychord.priors import UniformPrior         # pre-defined class for a uniform prior

# setup run settings using the PolyChordSettings class
pargs = {'nlive': nlive,
         'precision_criterion': tol,
         'base_dir': basedir,
         'file_root': fileroot,
         'write_resume': False,  # don't output a resume file
         'read_resume': False}   # don't read a resume file
settings = PolyChordSettings(ndims, nderived, **pargs)

t0 = time()
output = pypolychord.run_polychord(loglikelihood_polychord, ndims, nderived, settings,
                                   prior_transform_polychord)
t1 = time()

timepolychord = (t1 - t0)

print("Time taken to run 'PyPolyChord' is {} seconds".format(timepolychord))

# reset the stack resource limit
resource.setrlimit(resource.RLIMIT_STACK, curlimit)
samplefile = broot + '_equal_weights.txt'
samples_polychord = np.loadtxt(samplefile)
samples_polychord = samples_polychord[:,-ndims:]  # extract the last 'ndims' columns

# print out the number of posterior samples
print('Number of posterior samples is {}'.format(samples_polychord.shape[0]))

resdict['mpolychord_mu'] = np.mean(samples_polychord[:,0])      # mean of m samples
resdict['mpolychord_sig'] = np.std(samples_polychord[:,0])      # standard deviation of m samples
resdict['cpolychord_mu'] = np.mean(samples_polychord[:,1])      # mean of c samples
resdict['cpolychord_sig'] = np.std(samples_polychord[:,1])      # standard deviation of c samples
resdict['ccpolychord'] = np.corrcoef(samples_polychord.T)[0,1]  # correlation coefficient between parameters
resdict['polychord_npos'] = len(samples_polychord)              # number of posterior samples
resdict['polychord_time'] = timepolychord                       # run time
resdict['polychord_logZ'] = output.logZ                         # log marginalised likelihood
resdict['polychord_logZerr'] = output.logZerr                   # uncertainty on log(Z)

# plot using corner.py
plotposts(samples_polychord)
weights = np.loadtxt(broot + '.txt')[:,0]  # get the sample weights

resdict['espolychord'] = ess(weights)
resdict['esspolychord'] = int(resdict['espolychord'] / timepolychord)

print("Effective samples per second: {}".format(resdict['esspolychord']))
One code to rule them all!¶
There are a variety of different styles, syntaxes and formats required to use the various packages described above, so wouldn't it be nice to have a unified interface to access a range of different samplers!
Well, there are in fact a few different packages that allow you to use more than one of the above samplers, e.g., Monte Python, cronus, Gleipnir and bilby (others may, and probably do, exist).
bilby¶
Of these, bilby (Ashton et al, 2019) provides access to the widest range of samplers (I may be biased as one of the developers), including both MCMC and nested sampling packages. An example of using bilby for a linear regression model can be found here, but I'll reproduce something equivalent to the example in this notebook below.
Bilby has a selection of built-in likelihood functions and prior distributions. You can define your own likelihood, but here we'll use the built-in Gaussian likelihood function (the source code for which can be found here). By default, bilby will use the dynesty sampler, but we'll explicitly set it in this example.
# import bilby, and its built-in likelihood and prior classes
import bilby
from bilby.core.likelihood import GaussianLikelihood
from bilby.core.prior import Gaussian, Uniform

# define the likelihood function
likelihood = GaussianLikelihood(x, data, straight_line, sigma=sigma)

cmin = -10.   # lower range of uniform distribution on c
cmax = 10.    # upper range of uniform distribution on c
mmu = 0.      # mean of Gaussian distribution on m
msigma = 10.  # standard deviation of Gaussian distribution on m

# define the priors in a dictionary
priors = {
    "m": Gaussian(mmu, msigma, name="m"),
    "c": Uniform(cmin, cmax, name="c"),
}

# run the sampler (using dynesty)
bilbyresult = bilby.run_sampler(
    likelihood=likelihood,
    priors=priors,
    outdir="bilby",     # the directory to output the results to
    label="bilby",      # a label to apply to the output file
    plot=False,         # by default this is True and a corner plot will be produced
    injection_parameters={"m": m, "c": c},  # (optional) true values for adding to plots
    sampler='dynesty',  # set the name of the sampler
    nlive=1024,         # add in any keyword arguments used by that sampler
    sample='unif',
)
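For consistency with the other samplers we can pull the posterior samples and evidence back out of the returned result object. A minimal sketch (assuming the `posterior` DataFrame and `log_evidence`/`log_evidence_err` attributes of the bilby `Result` object):

# a sketch: extract posterior samples (held as a pandas DataFrame in bilby's Result object) and the evidence
samples_bilby = bilbyresult.posterior[["m", "c"]].values

print('Number of posterior samples is {}'.format(samples_bilby.shape[0]))
print("log(Z) = {} ± {}".format(bilbyresult.log_evidence, bilbyresult.log_evidence_err))

# plot using corner.py
plotposts(samples_bilby)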
Summary¶

The timings given in this notebook are not directly comparable, as the various codes have not been run with identical settings and the number of likelihood evaluations performed can be quite different. But, they may be helpful in providing a rough ball-park of expected run times. As mentioned in relation to emcee above, it is important to note that different samplers will produce different numbers of truly independent samples, so some methods that take longer to run may actually give more independent samples. Therefore comparisons of the number of effective samples per second may be more useful, although these will still be quite problem dependent. It is also worth noting that some of the codes can make use of multiple CPU cores, or GPUs, but these have not been used here. So, speed-ups could be achieved by using these in-built parallelisations.
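For the nested sampling codes the posterior samples come with weights, and a common definition of the effective sample size for weighted samples (given here for reference; the `ess` helper used throughout this notebook may differ in detail) is the Kish estimate,

N_{\rm eff} = \frac{\left(\sum_i w_i\right)^2}{\sum_i w_i^2}, $$

where the $w_i$ are the posterior weights of the samples.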
The number of posterior samples produced by the nested sampling codes can depend strongly on the number of initial live points used. However, for the codes where we have specified a number of live points the final posterior sizes are very comparable. It should be noted that comparing posterior sample sizes and run times for DNest4 in particular is not entirely fair, as it is designed with more complex problems in mind and is not created to be efficient at simple examples where other methods work easily.
Something that is also worth noting, in particular with regard to nested sampling algorithms, is that they will generally take longer to run when there is a larger degree of difference between the prior and the posterior (i.e., the data contains a lot of new information, which, in technical terms, is called having a large Kullback-Leibler (KL) divergence). So, estimating the parameters of a model when the data contains a high signal-to-noise ratio signal will often take longer than the same model but with a weaker signal in the data.
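To make this slightly more concrete, a standard rule of thumb from the nested sampling literature (an approximation, not something computed in this notebook) is that a run with $n_{\rm live}$ live points needs of order $n_{\rm live} \times H$ iterations to compress from the prior down to the bulk of the posterior, where $H$ is the information gain (the KL divergence mentioned above), and the associated statistical uncertainty on the log evidence is approximately

\sigma_{\ln{Z}} \approx \sqrt{\frac{H}{n_{\rm live}}}, $$

which is the estimate used above when quoting the CPNest evidence uncertainty.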
Now, let's overplot everything (it will look a bit horrific)!
# get a list of all the samples
chains = [samples_emcee, samples_pymc3, samples_zeus, samples_tfp, samples_pystan,
          samples_pyjags, samples_nestle, samples_cpnest, samples_dynesty,
          samples_ultranest, samples_pymnest, samples_dnest4, samples_polychord]

color = cm.gnuplot(np.linspace(0, 1, len(chains)))

hist2dkwargs = {'plot_datapoints': False,
                'plot_density': False,
                'levels': 1.0 - np.exp(-0.5 * np.arange(1.5, 2.1, 0.5) ** 2)}  # roughly 1 and 2 sigma

for i, chain in enumerate(chains):
    cv = to_hex(color[i])
    if i == 0:
        fig = corner.corner(
            chain,
            labels=[r"$m$", r"$c$"],
            color=cv,
            hist_kwargs={'density': True},
            **hist2dkwargs,
        )
    else:
        if i < len(chains) - 1:
            corner.corner(
                chain,
                labels=[r"$m$", r"$c$"],
                fig=fig,
                color=cv,
                hist_kwargs={'density': True},
                **hist2dkwargs,
            )
        else:  # plot true values on last plot
            corner.corner(
                chain,
                labels=[r"$m$", r"$c$"],
                truths=[m, c],
                fig=fig,
                color=cv,
                hist_kwargs={'density': True},
                **hist2dkwargs,
            )

fig.set_size_inches(10.0, 9.0)

outputtable = """
| Sampler | $m$ | $c$ | corr. coeff | num. effective samples | effective samples / s |
|--- | :---: | :---: | :---: | :---: | :---: |
| emcee | {memcee_mu:.3f} ± {memcee_sig:.3f} | {cemcee_mu:.3f} ± {cemcee_sig:.3f} | {ccemcee:.3f} | {esemcee:d} | {essemcee:d} |
| PyMC3 | {mpymc3_mu:.3f} ± {mpymc3_sig:.3f} | {cpymc3_mu:.3f} ± {cpymc3_sig:.3f} | {ccpymc3:.3f} | {espymc3:d} | {esspymc3:d} |
| TFP | {mtfp_mu:.3f} ± {mtfp_sig:.3f} | {ctfp_mu:.3f} ± {ctfp_sig:.3f} | {cctfp:.3f} | {estfp:d} | {esstfp:d} |
| zeus | {mzeus_mu:.3f} ± {mzeus_sig:.3f} | {czeus_mu:.3f} ± {czeus_sig:.3f} | {cczeus:.3f} | {eszeus:d} | {esszeus:d} |
| PyStan | {mpystan_mu:.3f} ± {mpystan_sig:.3f} | {cpystan_mu:.3f} ± {cpystan_sig:.3f} | {ccpystan:.3f} | {espystan:d} | {esspystan:d} |
| PyJAGS | {mpyjags_mu:.3f} ± {mpyjags_sig:.3f} | {cpyjags_mu:.3f} ± {cpyjags_sig:.3f} | {ccpyjags:.3f} | {espyjags:d} | {esspyjags:d} |
| Nestle | {mnestle_mu:.3f} ± {mnestle_sig:.3f} | {cnestle_mu:.3f} ± {cnestle_sig:.3f} | {ccnestle:.3f} | {esnestle:d} | {essnestle:d} |
| CPNest | {mcpnest_mu:.3f} ± {mcpnest_sig:.3f} | {ccpnest_mu:.3f} ± {ccpnest_sig:.3f} | {cccpnest:.3f} | {escpnest:d} | {esscpnest:d} |
| dynesty | {mdynesty_mu:.3f} ± {mdynesty_sig:.3f} | {cdynesty_mu:.3f} ± {cdynesty_sig:.3f} | {ccdynesty:.3f} | {esdynesty:d} | {essdynesty:d} |
| UltraNest | {multranest_mu:.3f} ± {multranest_sig:.3f} | {cultranest_mu:.3f} ± {cultranest_sig:.3f} | {ccultranest:.3f} | {esultranest:d} | {essultranest:d} |
| PyMultiNest | {mpymnest_mu:.3f} ± {mpymnest_sig:.3f} | {cpymnest_mu:.3f} ± {cpymnest_sig:.3f} | {ccpymnest:.3f} | {espymnest:d} | {esspymnest:d} |
| DNest4 | {mdnest4_mu:.3f} ± {mdnest4_sig:.3f} | {cdnest4_mu:.3f} ± {cdnest4_sig:.3f} | {ccdnest4:.3f} | {esdnest4:.2f} | {essdnest4:.2f} |
| PyPolyChord | {mpolychord_mu:.3f} ± {mpolychord_sig:.3f} | {cpolychord_mu:.3f} ± {cpolychord_sig:.3f} | {ccpolychord:.3f} | {espolychord:d} | {esspolychord:d} |
"""

Markdown(outputtable.format(**resdict))

outputnest = """
| Sampler | $\ln{{Z}}$ | num. posterior samples | run time (s) |
| --- | --- | :---: | --- |
| Nestle | {nestle_logZ:.3f} ± {nestle_logZerr:.3f} | {nestle_npos} | {nestle_time:.3f} |
| CPNest | {cpnest_logZ:.3f} ± {cpnest_logZerr:.3f} | {cpnest_npos} | {cpnest_time:.3f} |
| dynesty | {dynesty_logZ:.3f} ± {dynesty_logZerr:.3f} | {dynesty_npos} | {dynesty_time:.3f} |
| UltraNest | {ultranest_logZ:.3f} ± {ultranest_logZerr:.3f} | {ultranest_npos} | {ultranest_time:.3f} |
| PyMultiNest | {pymnest_logZ:.3f} ± {pymnest_logZerr:.3f} | {pymnest_npos} | {pymnest_time:.3f} |
| DNest4 | {dnest4_logZ:.3f} | {dnest4_npos} | {dnest4_time:.3f} |
| PyPolyChord | {polychord_logZ:.3f} ± {polychord_logZerr:.3f} | {polychord_npos} | {polychord_time:.3f} |
"""

Markdown(outputnest.format(**resdict))

Running this notebook¶
I have created a Docker image, available on Quay.io, in which Python 3.7 and all of the above samplers are installed (this is roughly 5 GB in size, so may take some time to download). This can be used to run this notebook in Jupyter. You can download this notebook from here.
If you have Docker installed then under Linux, Mac OS or Windows you can run the notebook by opening a terminal (e.g., the Powershell in Windows) and running:
sudo docker run -i -t -p 8888:8888 -v ${notebook_location}:/samplers quay.io/mattpitkin/samplers-demo:latest /bin/bash -c "jupyter notebook --notebook-dir=/samplers --ip='*' --port=8888 --no-browser --allow-root --MultiKernelManager.default_kernel_name=Samplers"
where