I wanted to do a quick post to verify an identity I encountered while reading about SteinGANs. For this post, we’ll consider probability densities in the exponential family, which have the following form,
\begin{align}
p\left(x\vert\theta\right) &= \frac{1}{Z} \exp\left(f\left(x;\theta\right)\right) \\
Z &= \int_{x} \exp\left(f\left(x;\theta\right)\right) dx.
\end{align}
Then the claim is that the gradient of the log-likelihood function can be expressed as follows,
\begin{align}
\frac{\nabla_{\theta} \log L}{n} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta} f\left(x_i;\theta\right) - \mathbb{E}_{x\sim p}\left[\nabla_{\theta} f\left(x;\theta\right)\right].
\end{align}
To see this, we can proceed as follows,
\begin{align}
L\left(\theta\vert X\right) &= \prod_{i=1}^{n} \frac{1}{Z} \exp\left(f\left(x_i;\theta\right)\right) \\
\log L\left(\theta\vert X\right) &= \sum_{i=1}^{n} f\left(x_i;\theta\right) - n\log\int_{x} \exp\left(f\left(x;\theta\right)\right) dx \\
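\nabla_{\theta} \log L &= \sum_{i=1}^{n} \nabla_{\theta} f\left(x_i;\theta\right) - \frac{n}{Z} \int_{x} \nabla_{\theta} f\left(x;\theta\right) \exp\left(f\left(x;\theta\right)\right) dx \\
&= \sum_{i=1}^{n} \nabla_{\theta} f\left(x_i;\theta\right) - n \int_{x} \nabla_{\theta} f\left(x;\theta\right) p\left(x\vert\theta\right) dx \\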
\frac{\nabla_{\theta} \log L}{n} &= \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta} f\left(x_i;\theta\right) - \mathbb{E}_{x\sim p}\left[\nabla_{\theta} f\left(x;\theta\right)\right],
\end{align}
which is what we wanted to show.
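
As a sanity check, the identity can also be verified numerically. The sketch below is not from the SteinGAN material; it assumes a simple one-dimensional choice $f\left(x;\theta\right) = \theta x - x^2/2$ (a unit-variance Gaussian with mean $\theta$), approximates the integrals with a Riemann sum on a fixed grid, and compares a finite-difference gradient of $\log L / n$ against the right-hand side of the identity. The function names, grid bounds, and sample size are all illustrative choices.

```python
import numpy as np

# Assumed example model: f(x; theta) = theta * x - x^2 / 2,
# i.e. a unit-variance Gaussian with mean theta.
def f(x, theta):
    return theta * x - 0.5 * x**2

def grad_f(x, theta):
    # Derivative of f with respect to theta for the assumed model.
    return x

# Fixed grid used to approximate integrals over x with a Riemann sum.
grid = np.linspace(-12.0, 12.0, 20001)
dx = grid[1] - grid[0]

def log_Z(theta):
    # log of Z = integral of exp(f(x; theta)) dx
    return np.log(np.sum(np.exp(f(grid, theta))) * dx)

def mean_log_lik(theta, xs):
    # (1/n) log L = (1/n) sum_i f(x_i; theta) - log Z
    return np.mean(f(xs, theta)) - log_Z(theta)

def identity_gradient(theta, xs):
    # Right-hand side: sample mean of grad f minus its expectation under p.
    weights = np.exp(f(grid, theta))
    p = weights / (np.sum(weights) * dx)
    expectation = np.sum(grad_f(grid, theta) * p) * dx
    return np.mean(grad_f(xs, theta)) - expectation

def finite_difference_gradient(theta, xs, eps=1e-5):
    # Central difference approximation of d/dtheta of (1/n) log L.
    upper = mean_log_lik(theta + eps, xs)
    lower = mean_log_lik(theta - eps, xs)
    return (upper - lower) / (2.0 * eps)

rng = np.random.default_rng(0)
xs = rng.normal(loc=1.0, scale=1.0, size=1000)
theta = 0.3

lhs = finite_difference_gradient(theta, xs)
rhs = identity_gradient(theta, xs)
print(lhs, rhs)
```

For this particular model the expectation term equals $\theta$, so both numbers should also agree with the sample mean of the data minus $\theta$.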