# Bernoulli Expectation Maximization

Let’s consider the following generative model of data. \begin{align} z &\sim \mathrm{Bernoulli}(1/2) \\
(x_1,\ldots, x_k) \vert z &\overset{\mathrm{i.i.d.}}{\sim} \begin{cases} \mathrm{Bernoulli}(p_1) &~\mathrm{if}~ z=1 \\\ \mathrm{Bernoulli}(p_0) &~\mathrm{if}~ z=0 \end{cases}. \end{align} The objective is to estimate $(p_0, p_1)$. We observe $x=(x_1,\ldots,x_k)$ but do not observe $z$. The complete data log-likelihood is, \begin{align} \log\mathcal{L}(p_0, p_1\vert x, z) &= \log \pi(z) + \log\pi(x\vert z) \\
&= z\log \frac{1}{2} + (1-z)\log\frac{1}{2} + \log\paren{\mathbf{1}\set{z=1} \prod_{i=1}^k p_1^{x_i} (1-p_1)^{1-x_i} + \mathbf{1}\set{z=0}\prod_{i=1}^k p_0^{x_i} (1-p_0)^{1-x_i}} \\
&= \log\paren{\mathbf{1}\set{z=1} \prod_{i=1}^k p_1^{x_i} (1-p_1)^{1-x_i} + \mathbf{1}\set{z=0}\prod_{i=1}^k p_0^{x_i} (1-p_0)^{1-x_i}} + \mathcal{O}(1) \\
&= \log\paren{\mathbf{1}\set{z=1} p_1^{s} (1-p_1)^{k-s} + \mathbf{1}\set{z=0} p_0^{s} (1-p_0)^{k-s}} + \mathcal{O}(1) \end{align} where $s = \sum_{i=1}^k x_i$. Given $x$, what is the distribution of $z$? \begin{align} \pi(x) &= \pi(x\vert z=1)\pi(z=1) + \pi(x\vert z=0)\pi(z=0) \\
&= \frac{1}{2}(p_1^s (1-p_1)^{k-s}) + \frac{1}{2}(p_0^s (1-p_0)^{k-s}) \\
\pi(z=1\vert x) &= \frac{p_1^s (1-p_1)^{k-s}}{p_1^s (1-p_1)^{k-s} + p_0^s (1-p_0)^{k-s}} \\
\pi(z=0\vert x) &= \frac{p_0^s (1-p_0)^{k-s}}{p_1^s (1-p_1)^{k-s} + p_0^s (1-p_0)^{k-s}} \end{align} Now let’s compute the expectation step. \begin{align} \begin{split} Q(p\vert p^{(n)}) &= \pi(z=1\vert x, p^{(n)}) \paren{s\log p_1 + (k-s)\log (1-p_1)} \\
&\qquad +~ \pi(z=0\vert x, p^{(n)}) \paren{s\log p_0 + (k-s)\log (1-p_0)} \\
&\qquad +~ \mathcal{O}(1) \end{split} \end{align} In the presence of multiple i.i.d. observations, the expected complete log-likelihood becomes, \begin{align} \begin{split} Q(p\vert p^{(n)}) &= \sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) \paren{s_i\log p_1 + (k-s_i)\log (1-p_1)} \\
&\qquad +~ \pi(z_i=0\vert x_i, p^{(n)}) \paren{s_i\log p_0 + (k-s_i)\log (1-p_0)} \\
&\qquad +~ \mathcal{O}(n) \end{split} \end{align} Let us now consider maximizing $Q(p\vert p^{(n)})$ with respect to $p_1$: \begin{align} \frac{\partial}{\partial p_1} Q(p\vert p^{(n)}) &= \sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) \paren{\frac{s_i}{p_1} - \frac{(k-s_i)}{1-p_1}} \\
&= \frac{1}{p_1} \paren{\sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) s_i} - \frac{1}{1-p_1} \paren{\sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) (k-s_i)}. \end{align} Hence, at optimality, \begin{align} p^\star_1 &= \frac{\sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) s_i}{\sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) s_i + \sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) (k-s_i)} \\
&= \frac{\sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) s_i}{\sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) s_i + \pi(z_i=1\vert x_i, p^{(n)}) (k-s_i)} \\
&= \frac{\sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)}) s_i}{k \sum_{i=1}^n \pi(z_i=1\vert x_i, p^{(n)})}. \end{align} By symmetry, \begin{align} p^\star_0 &= \frac{\sum_{i=1}^n \pi(z_i=0\vert x_i, p^{(n)}) s_i}{k\sum_{i=1}^n \pi(z_i=0\vert x_i, p^{(n)})} \end{align}