I am new to both Python and MCMC techniques and am working my way into PyMC3. As an exercise to familiarize myself with PyMC3 I would like to fit a mixture model of two shifted gamma distributions to generated data. As a next step I would then like to extend this with a stick-breaking process to an "arbitrary" number of shifted gammas, but one step at a time.
Pointers to the complete code and further links can be found in my comment to the following PyMC3 github issue:
https://github.com/pymc-devs/pymc3/issues/864#issuecomment-285864556
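For context, the generated data could look something like the following sketch (the concrete parameters here are illustrative placeholders, not the values used in the gist):

import numpy as np
from scipy import stats

np.random.seed(42)
n = 1000
p_true = 0.6                # fraction drawn from the first component (made up)
n1 = int(p_true * n)
data = np.concatenate([
    stats.gamma.rvs(a=2.0, scale=2.0, loc=10.0, size=n1),      # first shifted gamma
    stats.gamma.rvs(a=2.0, scale=2.0, loc=40.0, size=n - n1),  # second shifted gamma
])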
Here is the model that I am trying to implement:
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    p = pm.Beta("p", 1., 1., testval=0.5)  # this is the fraction that comes from mean1 vs mean2
    alpha = pm.Uniform('alpha', 0, 10)  # == k
    beta = pm.Uniform('beta', 0, 1)     # == theta
    shifts = pm.Normal('shifts', mu=20, sd=50, shape=2)
    gamma = ShiftedGamma.dist(alpha, beta, c=shifts)
    process = pm.Mixture('obs', tt.stack([p, 1-p]), gamma, observed=data)

with model:
    step = pm.Metropolis()
    trace = pm.sample(10000, step=step)
And here is the implementation of the ShiftedGamma that I came up with:
class ShiftedLog(pm.distributions.continuous.transforms.ElemwiseTransform):
    name = "log"

    def __init__(self, c=0.0):
        self.c = c

    def backward(self, x):
        return tt.exp(x + self.c)

    def forward(self, x):
        return tt.log(x - self.c)
# (This assumes the helpers used below are available, e.g. `stats` from
#  scipy, `draw_values`/`generate_samples` from pymc3.distributions.distribution,
#  and `bound`/`logpow`/`gammaln` from PyMC3's dist_math helpers or theano.tensor;
#  the exact import locations may differ between PyMC3 versions.)
class ShiftedGamma(pm.distributions.continuous.Continuous):
    def __init__(self, alpha=None, beta=None, mu=None, sd=None, c=0.0, *args, **kwargs):
        super(ShiftedGamma, self).__init__(transform=ShiftedLog(c), *args, **kwargs)
        alpha, beta = self.get_alpha_beta(alpha, beta, mu, sd)
        self.alpha = alpha
        self.beta = beta
        self.mean = alpha / beta + c
        self.mode = tt.maximum((alpha - 1) / beta, 0) + c
        self.variance = alpha / beta**2
        self.c = c
        # self.testval = kwargs.pop('testval', None)
        # if not self.testval:
        #     self.testval = c + 10.0

        pm.distributions.continuous.assert_negative_support(alpha, 'alpha', 'Gamma')
        pm.distributions.continuous.assert_negative_support(beta, 'beta', 'Gamma')

    def get_alpha_beta(self, alpha=None, beta=None, mu=None, sd=None):
        if (alpha is not None) and (beta is not None):
            pass
        elif (mu is not None) and (sd is not None):
            alpha = mu**2 / sd**2
            beta = mu / sd**2
        else:
            raise ValueError('Incompatible parameterization. Either use '
                             'alpha and beta, or mu and sd to specify '
                             'distribution.')
        return alpha, beta

    def random(self, point=None, size=None, repeat=None):
        alpha, beta, c = draw_values([self.alpha, self.beta, self.c], point=point)
        return generate_samples(stats.gamma.rvs, alpha, scale=1. / beta,
                                dist_shape=self.shape, size=size, loc=c)

    def logp(self, value):
        alpha = self.alpha
        beta = self.beta
        c = self.c
        x = value - c
        return bound(-gammaln(alpha) + logpow(beta, alpha) - beta * x + logpow(x, alpha - 1),
                     x >= 0, alpha > 0, beta > 0)
I have basically 2 questions:
1) Is my implementation "good"/"correct"? I am especially uncertain about the transform parameter, where I implemented a ShiftedLog transform. And is the random() method used anywhere at all? If I set a breakpoint there, it never seems to get called.
2) Why is this model so fragile with respect to basically any change I make to either the priors or the sampling method? Just see the many comments in the gist for my back-and-forth experimenting until I got results from PyMC3.
Point 2) might be a consequence of a mistake in point 1), or it might be due to any of the "magic" reasons explained by Michael Betancourt in his StanCon talk "Everything You Should Have Learned About Markov Chain Monte Carlo".
Regarding question 1), I would also like to understand whether there are better or easier ways to achieve the same effect of fitting a model with shifted gammas to data, e.g. is it really necessary to implement ShiftedGamma, or could I achieve the same effect via pm.Deterministic() or similar?
Thanks a lot for your help and insight!
Christian
I originally posted this on the github issue, but I guess this is a better place.
You seem to have a typo in the transformation: shouldn't backward be tt.exp(x) + self.c? In this case it would be the same as pm.distributions.transforms.lowerbound.
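For illustration, the corrected transform might look like the following untested sketch (only backward differs from the version in the question, and it is essentially what lowerbound already provides):

class ShiftedLog(pm.distributions.continuous.transforms.ElemwiseTransform):
    name = "log"

    def __init__(self, c=0.0):
        self.c = c

    def backward(self, x):
        # exponentiate first, then shift, so values map to (c, inf)
        return tt.exp(x) + self.c

    def forward(self, x):
        return tt.log(x - self.c)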
Otherwise your approach seems fine. I don't know of any way to do this with deterministics, as the mixture requires a distribution. In pretty much all other circumstances a ShiftedGamma should not be necessary, though.
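For instance, outside a Mixture you can usually just shift a Gamma directly; a minimal illustrative sketch (the variable names and parameter values here are made up):

with pm.Model():
    c = pm.Normal('c', mu=20, sd=50)
    g = pm.Gamma('g', alpha=2., beta=0.5)
    # the shifted quantity as a deterministic transformation of the gamma
    shifted = pm.Deterministic('shifted', g + c)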
I think you could simplify the other code to something like this (please double-check, I did not test this properly):
class ShiftedGamma(pm.Gamma):
    def __init__(self, alpha, beta, shift, *args, **kwargs):
        transform = pm.distributions.transforms.lowerbound(shift)
        super().__init__(alpha=alpha, beta=beta, *args, **kwargs,
                         transform=transform)
        self.shift = shift
        self.mean += shift
        self.mode += shift

    def random(self):
        return super().random() + self.shift

    def logp(self, x):
        return super().logp(x - self.shift)
with pm.Model() as model:
    p = pm.Beta("p", 1., 1., testval=0.5)
    alpha = pm.Uniform('alpha', 0, 10)  # == k
    beta = pm.Uniform('beta', 0, 1)     # == theta
    # floatX can be imported, e.g., from pymc3.theanof
    shifts = pm.Normal('shifts', mu=20, sd=50, shape=2, testval=floatX([0., 0.]))
    gamma = ShiftedGamma.dist(alpha=alpha, beta=beta, shift=shifts)
    pm.Mixture('y', tt.stack([p, 1-p]), gamma, observed=data)
About the difficulties with initialization and sampling:
You are using the MAP as a starting point, which is discouraged most of the time; using ADVI for initialization is usually better.
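For example (a rough sketch; the number of draws and ADVI iterations are just placeholders):

with model:
    trace = pm.sample(2000, init='advi', n_init=50000)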
I think the reason you have such a hard time is that many values of shift lead to a logp of -inf, namely if the minimum of the shifts is larger than the minimum of the dataset. You could try to reparameterize somehow, to make sure this can't happen. Maybe something like this:
shift1_raw = pm.HalfCauchy('shift1_raw', beta=1)
shift2_raw = pm.HalfCauchy('shift2_raw', beta=1)
shift1 = data.min() - shift1_raw
shift2 = shift1 + shift2_raw
shift = tt.stack([shift1, shift2])
shift = pm.Deterministic('shift', shift)
But this definitely requires more thought; this version changes the prior, and this new prior depends on the dataset, and shift1_raw and shift2_raw will probably be highly correlated (which makes the sampler unhappy).
Also, keep in mind that mixture models typically end up being multimodal, and NUTS does not usually work well for those.
Edit: As a side note, if you just want a general understanding of PyMC3 and do not have a specific interest in mixture models, maybe don't start with those. They tend to be trouble.