The Poisson log-normal (PLN) model is a generic model for the joint distribution of count data, accounting for covariates. It is also an incomplete data model. A classical way to achieve maximum likelihood inference for model parameters $\theta$ is to resort to the EM algorithm, which aims at maximizing, with respect to $\theta$, the conditional expectation, given the observed data $Y$, of the so-called complete log-likelihood $\mathbb{E}[\log p_\theta(Y, Z) \mid Y]$.
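As a minimal sketch of the PLN generative model described above (all dimensions and parameter values below are hypothetical choices for illustration), counts $Y$ are drawn from a Poisson distribution whose log-intensity is a latent Gaussian vector $Z$ depending on covariates:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, d = 100, 5, 2          # samples, count dimensions, covariates (arbitrary)
X = rng.normal(size=(n, d))  # covariate matrix
B = rng.normal(size=(d, p))  # regression coefficients (part of theta)
Sigma = 0.5 * np.eye(p)      # latent covariance (part of theta)

# Latent Gaussian layer: Z_i ~ N(X_i B, Sigma)
Z = X @ B + rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Observed counts: Y_ij ~ Poisson(exp(Z_ij)), independent given Z
Y = rng.poisson(np.exp(Z))
```

Only $Y$ (and $X$) are observed; $Z$ is the missing part of the incomplete-data formulation.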
Unfortunately, the evaluation of $\mathbb{E}[\log p_\theta(Y, Z) \mid Y]$ is intractable in the case of the PLN model: the conditional distribution of the latent vector given the corresponding observed count vector has no closed form, and none of its moments can be evaluated efficiently.
Variational approaches have been studied to tackle this problem, but they lack statistical guarantees. Indeed, the resulting estimator $\widehat{\theta}_{VEM}$ does not enjoy the general properties of MLEs. In particular, the (asymptotic) variance of $\widehat{\theta}_{VEM}$ is unknown, so no test or confidence interval can be derived easily from the variational inference. Starting from already available variational approximations, we define a first Monte Carlo EM algorithm to obtain maximum likelihood estimators of this model. We then extend this first algorithm to the case of a composite likelihood in order to handle higher-dimensional count data. Both methods are statistically grounded and provide confidence regions for the model parameters.
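The Monte Carlo E-step idea can be sketched as follows: since $p_\theta(Z \mid Y)$ has no closed form, one can estimate $\mathbb{E}[\log p_\theta(Y, Z) \mid Y]$ by self-normalized importance sampling, using a Gaussian variational approximation as the proposal. The code below is an illustrative sketch for a single observation with hypothetical parameter values and proposal moments, not the authors' exact scheme:

```python
import numpy as np
from scipy.stats import multivariate_normal, poisson

rng = np.random.default_rng(1)
p = 3
y = np.array([2, 0, 5])        # one observed count vector (toy data)

# Model parameters theta = (mu, Sigma) -- hypothetical values
mu, Sigma = np.zeros(p), np.eye(p)

# Gaussian proposal q(Z) = N(m, S), e.g. a variational approximation of p(Z|y)
m, S = np.log(y + 1.0), 0.5 * np.eye(p)

M = 5000
Zs = rng.multivariate_normal(m, S, size=M)          # draws from the proposal

log_prior = multivariate_normal(mu, Sigma).logpdf(Zs)   # log p(Z)
log_lik = poisson(np.exp(Zs)).logpmf(y).sum(axis=1)     # log p(y | Z)
log_q = multivariate_normal(m, S).logpdf(Zs)            # log q(Z)

# Self-normalized importance weights targeting p(Z|y) ∝ p(y|Z) p(Z)
logw = log_lik + log_prior - log_q
w = np.exp(logw - logw.max())
w /= w.sum()

# Monte Carlo estimate of E[log p_theta(Y, Z) | Y]
Q_hat = np.sum(w * (log_lik + log_prior))
```

The M-step then maximizes this estimated quantity over $\theta$, and the two steps are iterated as in a standard EM algorithm.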
| Classification | Both methodology and application |
|---|---|
| Keywords | Multivariate count data, Monte Carlo EM algorithm, composite likelihood |