I read a very useful description of the EM algorithm as coordinate ascent at this website and wanted to rederive some of the ideas.
The marginal log-likelihood is
\begin{align}
\log \pi(x; \theta) &= \log \int \pi(x, z; \theta) ~\mathrm{d}z \\
&= \log \int \frac{\pi(x, z; \theta)}{\tilde{\pi}(z)} \tilde{\pi}(z)~\mathrm{d}z \\
&= \log \underset{z\sim \tilde{\pi}}{\mathbb{E}} \frac{\pi(x, z; \theta)}{\tilde{\pi}(z)} \\
&\geq \underset{z\sim \tilde{\pi}}{\mathbb{E}} \log \frac{\pi(x, z; \theta)}{\tilde{\pi}(z)} \\
&= \underset{z\sim \tilde{\pi}}{\mathbb{E}} \log \frac{\pi(x; \theta) \pi(z\vert x; \theta)}{\tilde{\pi}(z)} \\
&= \log \pi(x; \theta) + \underset{z\sim \tilde{\pi}}{\mathbb{E}} \log \frac{\pi(z\vert x; \theta)}{\tilde{\pi}(z)} \\
&= \log \pi(x; \theta) - \mathbb{KL}(\tilde{\pi}(z)\Vert \pi(z\vert x; \theta)).
\end{align}
The blog post then advocates viewing the EM algorithm as coordinate ascent on the ELBO
\begin{align}
\mathrm{ELBO}(\theta, \tilde{\pi}) = \underset{z\sim \tilde{\pi}}{\mathbb{E}} \log \frac{\pi(x, z; \theta)}{\tilde{\pi}(z)}.
\end{align}
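To make the bound concrete, here is a minimal numerical sketch (my own toy example, not from the blog post): a one-dimensional, two-component Gaussian mixture with unit variances and known equal weights, where the latent variable $z$ is the component label. For an arbitrary choice of $\tilde{\pi}$ the ELBO should sit below the marginal log-likelihood.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 60), rng.normal(3.0, 1.0, 40)])

mu = np.array([-1.0, 1.0])             # theta: the two component means (deliberately off)
log_w = np.log(np.array([0.5, 0.5]))   # mixing weights, assumed known and fixed

# log pi(x_i, z_i = k; theta) for every data point i and component k
log_joint = log_w + norm.logpdf(x[:, None], loc=mu, scale=1.0)

# marginal log-likelihood: sum_i log sum_k pi(x_i, z_i = k; theta)
log_lik = np.logaddexp.reduce(log_joint, axis=1).sum()

# an arbitrary variational density: uniform over the two components for every point
q = np.full_like(log_joint, 0.5)
elbo = np.sum(q * (log_joint - np.log(q)))

print(f"log-likelihood = {log_lik:.3f}, ELBO = {elbo:.3f}")
assert elbo <= log_lik  # Jensen's inequality
```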
Clearly the ELBO depends on two arguments: the model parameter $\theta$ and the density $\tilde{\pi}$. Suppose that we fix $\theta = \theta'$ and consider maximizing the ELBO with respect to $\tilde{\pi}$. Then the representation
\begin{align}
\sup_{\tilde{\pi}} \mathrm{ELBO}(\theta', \tilde{\pi}) &= \sup_{\tilde{\pi}} \left\{ \log \pi(x; \theta') - \mathbb{KL}(\tilde{\pi}(z)\Vert \pi(z\vert x; \theta')) \right\} \\
&= \log \pi(x; \theta') - \inf_{\tilde{\pi}} \mathbb{KL}(\tilde{\pi}(z)\Vert \pi(z\vert x; \theta'))
\end{align}
shows that the optimal choice of density is $\tilde{\pi}(z) = \pi(z\vert x; \theta')$, since the KL divergence is non-negative and vanishes exactly when its two arguments coincide.
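Continuing the toy sketch above (same data, imports, and current parameter), choosing $\tilde{\pi}$ to be the posterior responsibilities should make the KL term vanish, so the ELBO matches the marginal log-likelihood exactly:

```python
# E-step choice: q(z_i = k) = pi(z_i = k | x_i; theta'), the posterior responsibilities
log_post = log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
q_opt = np.exp(log_post)               # each row sums to one

# with this q the KL term is zero, so the bound is tight
elbo_opt = np.sum(q_opt * (log_joint - log_post))
print(f"ELBO at the posterior = {elbo_opt:.3f} vs log-likelihood = {log_lik:.3f}")
assert np.isclose(elbo_opt, log_lik)
```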
Now let us regard $\tilde{\pi}(z) = \pi(z\vert x; \theta')$ as fixed and consider maximizing the ELBO with respect to the model parameter $\theta$. Then we obtain
\begin{align}
\sup_{\theta} \mathrm{ELBO}(\theta, \pi(z\vert x; \theta')) &= \sup_{\theta} \underset{z\sim \pi(z\vert x; \theta')}{\mathbb{E}} \log \frac{\pi(x, z; \theta)}{\pi(z\vert x; \theta')} \\
&= \sup_{\theta} \left\{ \underset{z\sim \pi(z\vert x; \theta')}{\mathbb{E}} \log \pi(x, z; \theta) - \underset{z\sim \pi(z\vert x; \theta')}{\mathbb{E}} \log \pi(z\vert x; \theta') \right\},
\end{align}
and since the second term does not depend on $\theta$, the maximizing $\theta$ is the one that maximizes $\underset{z\sim \pi(z\vert x; \theta')}{\mathbb{E}} \log \pi(x, z; \theta)$.
Therefore we can view EM as coordinate ascent on the ELBO, where in the expectation step we identify the optimal choice of $\tilde{\pi}(z)$ and in the maximization step we optimize the expected complete data log-likelihood.
Hence the EM sequence of estimators may be defined inductively according to
\begin{align}
\theta_{n+1} = \underset{\theta}{\mathrm{arg\,max}} \; \underset{z\sim \pi(z\vert x; \theta_n)}{\mathbb{E}} \log \pi(x, z; \theta).
\end{align}
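For the toy mixture above, the maximization step has a closed form: with the weights and variances held fixed, the expected complete-data log-likelihood is maximized by responsibility-weighted means. Here is a sketch of one full EM update, reusing the data and imports from the earlier snippet; the helper `em_step` is my own name for it.

```python
def em_step(x, mu, log_w):
    """One EM update for the toy mixture (means only; weights and variances fixed)."""
    # E-step: posterior responsibilities under theta_n
    log_joint = log_w + norm.logpdf(x[:, None], loc=mu, scale=1.0)
    resp = np.exp(log_joint - np.logaddexp.reduce(log_joint, axis=1, keepdims=True))
    # M-step: argmax_theta E_{z ~ pi(z|x; theta_n)} log pi(x, z; theta),
    # which for Gaussian components reduces to responsibility-weighted means
    return resp.T @ x / resp.sum(axis=0)

print("theta_1 =", em_step(x, np.array([-1.0, 1.0]), log_w))
```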
We can easily show that the log-likelihood is non-decreasing for a sequence of estimators constructed in this manner. Since $\theta_{n+1}$ maximizes the ELBO for the fixed density $\pi(z\vert x; \theta_n)$, we have
\begin{align}
& \mathrm{ELBO}(\theta_{n+1}, \pi(z\vert x; \theta_n)) \geq \mathrm{ELBO}(\theta_n, \pi(z\vert x; \theta_n)) \\
\implies& \log \pi(x; \theta_{n+1}) - \mathbb{KL}(\pi(z\vert x; \theta_n)\Vert \pi(z\vert x; \theta_{n+1})) \geq \log \pi(x; \theta_n) - \mathbb{KL}(\pi(z\vert x; \theta_n)\Vert \pi(z\vert x; \theta_n)) \\
\implies& \log \pi(x; \theta_{n+1}) - \mathbb{KL}(\pi(z\vert x; \theta_n)\Vert \pi(z\vert x; \theta_{n+1})) \geq \log \pi(x; \theta_n).
\end{align}
But the KL divergence is non-negative and therefore
\begin{align}
\log \pi(x; \theta_{n+1}) \geq \log \pi(x; \theta_n).
\end{align}
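As a sanity check on the toy example, iterating the `em_step` sketch above and tracking the marginal log-likelihood should produce a non-decreasing sequence (the helper `log_likelihood` below is again my own name):

```python
def log_likelihood(x, mu, log_w):
    log_joint = log_w + norm.logpdf(x[:, None], loc=mu, scale=1.0)
    return np.logaddexp.reduce(log_joint, axis=1).sum()

mu_n, trace = np.array([-1.0, 1.0]), []
for _ in range(25):
    trace.append(log_likelihood(x, mu_n, log_w))
    mu_n = em_step(x, mu_n, log_w)

# monotone up to floating-point noise, as the argument above guarantees
assert all(b >= a - 1e-9 for a, b in zip(trace, trace[1:]))
print("log-likelihood trace:", np.round(trace[:5], 3), "...")
```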