I am new to both Python and MCMC techniques and am working my way into PyMC3. As an exercise to familiarize myself with PyMC3 I would like to fit a mixture model of two shifted gamma distributions to generated data. As a next step I would then like to extend this with a stick-breaking process to an "arbitrary" number of shifted gammas, but one step at a time.
Pointers to the complete code and further links can be found in my comment to the following PyMC3 github issue:
https://github.com/pymc-devs/pymc3/issues/864#issuecomment-285864556
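For context, the generated data could look something like the following sketch (the concrete parameters here are illustrative placeholders, not the values used in the gist):

import numpy as np
from scipy import stats

np.random.seed(42)
n = 1000
p_true = 0.6                # fraction drawn from the first component (made up)
n1 = int(p_true * n)
data = np.concatenate([
    stats.gamma.rvs(a=2.0, scale=2.0, loc=10.0, size=n1),      # first shifted gamma
    stats.gamma.rvs(a=2.0, scale=2.0, loc=40.0, size=n - n1),  # second shifted gamma
])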
Here is the model that I am trying to implement:
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    p = pm.Beta("p", 1., 1., testval=0.5)  # this is the fraction that comes from mean1 vs mean2
    alpha = pm.Uniform('alpha', 0, 10)  # == k
    beta = pm.Uniform('beta', 0, 1)     # == theta
    shifts = pm.Normal('shifts', mu=20, sd=50, shape=2)
    gamma = ShiftedGamma.dist(alpha, beta, c=shifts)
    process = pm.Mixture('obs', tt.stack([p, 1-p]), gamma, observed=data)

with model:
    step = pm.Metropolis()
    trace = pm.sample(10000, step=step)
And here is the implementation of the ShiftedGamma that I came up with:
class ShiftedLog(pm.distributions.continuous.transforms.ElemwiseTransform):
    name = "log"

    def __init__(self, c=0.0):
        self.c = c

    def backward(self, x):
        return tt.exp(x + self.c)

    def forward(self, x):
        return tt.log(x - self.c)
# (This assumes the helpers used below are available, e.g. `stats` from
#  scipy, `draw_values`/`generate_samples` from pymc3.distributions.distribution,
#  and `bound`/`logpow`/`gammaln` from PyMC3's dist_math helpers or theano.tensor;
#  the exact import locations may differ between PyMC3 versions.)
class ShiftedGamma(pm.distributions.continuous.Continuous):
    def __init__(self, alpha=None, beta=None, mu=None, sd=None, c=0.0, *args, **kwargs):
        super(ShiftedGamma, self).__init__(transform=ShiftedLog(c), *args, **kwargs)
        alpha, beta = self.get_alpha_beta(alpha, beta, mu, sd)
        self.alpha = alpha
        self.beta = beta
        self.mean = alpha / beta + c
        self.mode = tt.maximum((alpha - 1) / beta, 0) + c
        self.variance = alpha / beta**2
        self.c = c
        # self.testval = kwargs.pop('testval', None)
        # if not self.testval:
        #     self.testval = c + 10.0

        pm.distributions.continuous.assert_negative_support(alpha, 'alpha', 'Gamma')
        pm.distributions.continuous.assert_negative_support(beta, 'beta', 'Gamma')

    def get_alpha_beta(self, alpha=None, beta=None, mu=None, sd=None):
        if (alpha is not None) and (beta is not None):
            pass
        elif (mu is not None) and (sd is not None):
            alpha = mu**2 / sd**2
            beta = mu / sd**2
        else:
            raise ValueError('Incompatible parameterization. Either use '
                             'alpha and beta, or mu and sd to specify '
                             'distribution.')
        return alpha, beta

    def random(self, point=None, size=None, repeat=None):
        alpha, beta, c = draw_values([self.alpha, self.beta, self.c], point=point)
        return generate_samples(stats.gamma.rvs, alpha, scale=1. / beta,
                                dist_shape=self.shape, size=size, loc=c)

    def logp(self, value):
        alpha = self.alpha
        beta = self.beta
        c = self.c
        x = value - c
        return bound(-gammaln(alpha) + logpow(beta, alpha) - beta * x + logpow(x, alpha - 1),
                     x >= 0, alpha > 0, beta > 0)
I have basically 2 questions:
1) Is my implementation "good"/"correct"? I am especially uncertain about the transform parameter, where I implemented a ShiftedLog transform. And is the random() method used anywhere at all? If I set a breakpoint there, it never seems to get called.
2) Why is this model so fragile with respect to basically any change I make to either the priors or the sampling method? Just see the many comments in the gist for my back-and-forth experimenting until I got results from PyMC3.
Point 2) might be a consequence of a mistake in point 1), or it might be due to any of the "magic" reasons explained by Michael Betancourt in his StanCon talk "Everything You Should Have Learned About Markov Chain Monte Carlo".
Regarding question 1), I would also like to understand whether there are better or easier ways to achieve the same effect of fitting a model with shifted gammas to data, e.g. is it really necessary to implement ShiftedGamma, or could I achieve the same effect via pm.Deterministic() or similar?
Thanks a lot for your help and insight!
Christian
I originally posted this on the github issue, but I guess this is a better place.
You seem to have a typo in the transformation: shouldn't backward be tt.exp(x) + self.c? In this case it would be the same as pm.distributions.transforms.lowerbound.
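For illustration, the corrected transform might look like the following untested sketch (only backward differs from the version in the question, and it is essentially what lowerbound already provides):

class ShiftedLog(pm.distributions.continuous.transforms.ElemwiseTransform):
    name = "log"

    def __init__(self, c=0.0):
        self.c = c

    def backward(self, x):
        # exponentiate first, then shift, so values map to (c, inf)
        return tt.exp(x) + self.c

    def forward(self, x):
        return tt.log(x - self.c)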
Otherwise your approach seems fine. I don't know of any way to do this with deterministics, as the mixture requires a distribution. In pretty much all other circumstances a ShiftedGamma should not be necessary, though.
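For instance, outside a Mixture you can usually just shift a Gamma directly; a minimal illustrative sketch (the variable names and parameter values here are made up):

with pm.Model():
    c = pm.Normal('c', mu=20, sd=50)
    g = pm.Gamma('g', alpha=2., beta=0.5)
    # the shifted quantity as a deterministic transformation of the gamma
    shifted = pm.Deterministic('shifted', g + c)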
I think you could simplify the other code to something like this (please double-check, I did not test this properly):
class ShiftedGamma(pm.Gamma):
    def __init__(self, alpha, beta, shift, *args, **kwargs):
        transform = pm.distributions.transforms.lowerbound(shift)
        super().__init__(alpha=alpha, beta=beta, *args, **kwargs,
                         transform=transform)
        self.shift = shift
        self.mean += shift
        self.mode += shift

    def random(self):
        return super().random() + self.shift

    def logp(self, x):
        return super().logp(x - self.shift)
with pm.Model() as model:
    p = pm.Beta("p", 1., 1., testval=0.5)
    alpha = pm.Uniform('alpha', 0, 10)  # == k
    beta = pm.Uniform('beta', 0, 1)     # == theta
    # floatX can be imported, e.g., from pymc3.theanof
    shifts = pm.Normal('shifts', mu=20, sd=50, shape=2, testval=floatX([0., 0.]))
    gamma = ShiftedGamma.dist(alpha=alpha, beta=beta, shift=shifts)
    pm.Mixture('y', tt.stack([p, 1-p]), gamma, observed=data)
About the difficulties with initialization and sampling:
You are using the MAP as a starting point, which is discouraged most of the time; using ADVI for initialization is usually better.
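For example (a rough sketch; the number of draws and ADVI iterations are just placeholders):

with model:
    trace = pm.sample(2000, init='advi', n_init=50000)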
I think the reason you have such a hard time is that many values of shift lead to a logp of -inf, namely if the minimum of the shifts is larger than the minimum of the dataset. You could try to reparameterize somehow, to make sure this can't happen. Maybe something like this:
shift1_raw = pm.HalfCauchy('shift1_raw', beta=1)
shift2_raw = pm.HalfCauchy('shift2_raw', beta=1)
shift1 = data.min() - shift1_raw
shift2 = shift1 + shift2_raw
shift = tt.stack([shift1, shift2])
shift = pm.Deterministic('shift', shift)
But this definitely requires more thought; this version changes the prior, and this new prior depends on the dataset, and shift1_raw and shift2_raw will probably be highly correlated (which makes the sampler unhappy).
Also, keep in mind that mixture models typically end up being multimodal, and NUTS does not usually work well for those.
Edit: As a side note, if you just want a general understanding of PyMC3 and do not have a specific interest in mixture models, maybe don't start with those. They tend to be trouble.