EM algorithm to fit models including random effects in Julia

Author

Sean L. Wu (slwood89@gmail.com)

Published

December 1, 2022

EM algorithm

These notes follow the presentation in (Wood 2015, ch. 5). Let us recall the main problem from the discussion of the Laplace approximation for the urchin model: we have a model containing both fixed effects \(\theta\) and random effects \(b\), and we want to calculate the likelihood of \(\theta\) after marginalizing out \(b\).

\[ L(\theta) = f_{\theta}(y) = \int f_{\theta}(y,b) db \]

The Laplace approximation directly approximates that integral, with an approximation error of about \(\mathcal{O}(n^{-1})\). The EM algorithm is a more complicated technique, but one that can achieve an approximation error of about \(\mathcal{O}(n^{-2})\).
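
To make the integral concrete before moving on, here is a minimal sketch in Julia of a toy Gaussian random-intercept model (not the urchin model): \(y_{ij} = \mu + b_j + \epsilon_{ij}\) with \(\epsilon_{ij} \sim N(0, \sigma^2)\) and \(b_j \sim N(0, \tau^2)\), so \(\theta = (\mu, \sigma, \tau)\). The model, the names `joint` and `loglik`, and the use of Distributions.jl and QuadGK.jl are my own choices for illustration only.

```julia
using Distributions, QuadGK, Random, Statistics

Random.seed!(1)
θ_true = (μ = 1.0, σ = 0.5, τ = 2.0)
m, n = 10, 20                              # number of groups, observations per group
b_true = rand(Normal(0, θ_true.τ), m)
y = [rand(Normal(θ_true.μ + b_true[j], θ_true.σ)) for i in 1:n, j in 1:m]

# joint density of one group's data and its random effect, f_θ(y_j, b_j)
joint(yj, b, θ) = prod(pdf.(Normal(θ.μ + b, θ.σ), yj)) * pdf(Normal(0, θ.τ), b)

# log marginal likelihood l(θ) = log ∫ f_θ(y, b) db; the integral factorises over
# groups, and each one-dimensional integral is evaluated by adaptive quadrature,
# with a break point at a rough guess of where the integrand peaks
function loglik(θ, y)
    ll = 0.0
    for j in 1:size(y, 2)
        bhat = mean(y[:, j]) - θ.μ
        ll += log(first(quadgk(b -> joint(y[:, j], b, θ), -Inf, bhat, Inf)))
    end
    return ll
end

loglik(θ_true, y)
```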

We can decompose the integrand \(f_{\theta}(y,b)\) as:

\[ \begin{aligned} f_{\theta}(y,b) &= f_{\theta}(b|y) f_{\theta}(y) \\ \log f_{\theta}(y,b) &= \log \left[ f_{\theta}(b|y) f_{\theta}(y) \right] \\ \log f_{\theta}(y,b) &= \log f_{\theta}(b|y) + \log f_{\theta}(y) \end{aligned} \]

Consider some parameter value (guess) \(\theta^{\prime}\). We then take the expectation of \(\log f_{\theta}(y,b)\) with respect to \(f_{\theta^{\prime}}(b|y)\). This looks like:

\[ \begin{aligned} E_{b|y,\theta^{\prime}} \log f_{\theta}(y,b) &= \int \log f_{\theta}(y,b) f_{\theta^{\prime}}(b|y) db \\ E_{b|y,\theta^{\prime}} \log f_{\theta}(y,b) &= E_{b|y,\theta^{\prime}} \log f_{\theta}(b|y) + \log f_{\theta}(y) \end{aligned} \]

We get from the first to the second line by substituting the expanded right-hand side of \(\log f_{\theta}(y,b)\) into the integral and splitting it into the sum of two integrals; the second of these is just \(\log f_{\theta}(y)\), because \(\log f_{\theta}(y)\) is not a function of \(b\) and \(f_{\theta^{\prime}}(b|y)\) integrates to one.
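
For the toy model above, the conditional density \(f_{\theta^{\prime}}(b_j|y_j)\) is available in closed form (normal-normal conjugacy), so the expectation defining \(E_{b|y,\theta^{\prime}} \log f_{\theta}(y,b)\) can be evaluated essentially exactly. A sketch, continuing the earlier code; the names `posterior_b`, `logjoint`, and `Q` are again just illustrative.

```julia
# posterior distribution of the j-th random effect given its group's data;
# normal-normal conjugacy gives it in closed form
function posterior_b(yj, θ)
    n = length(yj)
    s2 = inv(n / θ.σ^2 + inv(θ.τ^2))        # posterior variance of b_j | y_j
    m = s2 * sum(yj .- θ.μ) / θ.σ^2         # posterior mean of b_j | y_j
    return Normal(m, sqrt(s2))
end

# log joint density log f_θ(y_j, b_j) for a single group
logjoint(yj, b, θ) = sum(logpdf.(Normal(θ.μ + b, θ.σ), yj)) + logpdf(Normal(0, θ.τ), b)

# Q_{θ'}(θ) = E_{b|y,θ'} log f_θ(y, b): a sum of per-group expectations, each
# evaluated by quadrature against the closed-form conditional f_{θ'}(b_j | y_j)
function Q(θ, θprime, y)
    total = 0.0
    for j in 1:size(y, 2)
        post = posterior_b(y[:, j], θprime)
        lo, hi = mean(post) - 10 * std(post), mean(post) + 10 * std(post)
        total += first(quadgk(b -> logjoint(y[:, j], b, θ) * pdf(post, b), lo, hi))
    end
    return total
end
```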

We now rewrite the final line as:

\[ Q_{\theta^{\prime}}(\theta) = P_{\theta^{\prime}}(\theta) + l(\theta) \]

where \(P_{\theta^{\prime}}(\theta) = E_{b|y,\theta^{\prime}} \log f_{\theta}(b|y)\) and \(l(\theta) = \log f_{\theta}(y)\) is the log likelihood we actually want to maximize.
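
We can check this decomposition numerically for the toy model, since \(f_{\theta}(b_j|y_j)\) there is the same closed-form posterior used above; the helper `P` below is purely illustrative, not part of Wood's presentation.

```julia
# P_{θ'}(θ) = E_{b|y,θ'} log f_θ(b|y); for the toy model, f_θ(b_j|y_j) is the
# closed-form posterior, so both densities in the expectation are Gaussian
function P(θ, θprime, y)
    total = 0.0
    for j in 1:size(y, 2)
        post_prime = posterior_b(y[:, j], θprime)   # density the expectation is taken over
        post = posterior_b(y[:, j], θ)              # density inside the logarithm
        lo = mean(post_prime) - 10 * std(post_prime)
        hi = mean(post_prime) + 10 * std(post_prime)
        total += first(quadgk(b -> logpdf(post, b) * pdf(post_prime, b), lo, hi))
    end
    return total
end

# the identity Q_{θ'}(θ) = P_{θ'}(θ) + l(θ) should hold up to quadrature error
θ_guess = (μ = 0.0, σ = 1.0, τ = 1.0)
isapprox(Q(θ_true, θ_guess, y), P(θ_true, θ_guess, y) + loglik(θ_true, y); rtol = 1e-6)
```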

The EM algorithm works because of the following line of argument:

  1. \(P_{\theta^{\prime}}(\theta)\) is maximized when \(\theta = \theta^{\prime}\), i.e. when we have \(P_{\theta^{\prime}}(\theta^{\prime})\). This follows from the non-negativity of the Kullback-Leibler divergence between \(f_{\theta^{\prime}}(b|y)\) and \(f_{\theta}(b|y)\).
  2. If \(Q_{\theta^{\prime}}(\theta) > Q_{\theta^{\prime}}(\theta^{\prime})\), it must be because the log likelihood increased, i.e. \(l(\theta) > l(\theta^{\prime})\), since the \(P\) term can only decrease as we move away from \(\theta^{\prime}\). Each iteration therefore maximizes \(Q_{\theta^{\prime}}(\theta)\) over \(\theta\), sets \(\theta^{\prime}\) to the maximizer, and repeats (see the sketch after this list).
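
Putting the pieces together for the toy model gives the familiar EM iteration: the E step computes the conditional moments of each random effect under the current guess \(\theta^{\prime}\), and the M step maximizes \(Q_{\theta^{\prime}}(\theta)\), which the Gaussian toy model allows in closed form (in general the M step needs a numerical optimizer and the E step an approximation). A rough sketch, continuing the code above; `em_step` and `fit_em` are illustrative names.

```julia
# one EM step: the E-step computes each random effect's conditional mean m_j and
# variance s_j² under the current guess θ'; the M-step then maximises Q_{θ'}(θ)
# over θ, which has a closed form for this Gaussian toy model
function em_step(θprime, y)
    n, m = size(y)
    posts = [posterior_b(y[:, j], θprime) for j in 1:m]        # E-step
    mj, s2j = mean.(posts), var.(posts)
    μ = sum(y[i, j] - mj[j] for i in 1:n, j in 1:m) / (n * m)  # M-step
    σ2 = sum((y[i, j] - μ - mj[j])^2 + s2j[j] for i in 1:n, j in 1:m) / (n * m)
    τ2 = sum(mj .^ 2 .+ s2j) / m
    return (μ = μ, σ = sqrt(σ2), τ = sqrt(τ2))
end

# by the argument above, every iteration can only increase the marginal
# log likelihood l(θ), which we monitor with the quadrature-based loglik
function fit_em(θ0, y; iterations = 25)
    θ = θ0
    for iter in 1:iterations
        θ = em_step(θ, y)
        println("iteration $iter: l(θ) = ", loglik(θ, y))
    end
    return θ
end

fit_em((μ = 0.0, σ = 1.0, τ = 1.0), y)
```

Each printed value of \(l(\theta)\) should be no smaller than the previous one, which is exactly the guarantee the argument above provides.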

References

Wood, Simon N. 2015. Core Statistics. Institute of Mathematical Statistics Textbooks 6. Cambridge University Press.