
ELBO - Evidence Lower Bound


Intractable Posterior

Latent variables are a transformation of the data points into a continuous lower-dimensional space. A latent variable model involves the evaluation of the posterior distribution $p(z|x)$ of the latent variables $z$ given the observed variables $x$.

The posterior distribution can be obtained using Bayes' rule

$$ p_{\theta}(z|x) = \frac{p_{\theta}(x|z)p_{\theta}(z)}{p_{\theta}(x)} $$

with $p_{\theta}(z)$ the prior over the latent variables, $p_{\theta}(x|z)$ the likelihood, and $p_{\theta}(x)$ the evidence (marginal likelihood).

In practice, even with simple priors and likelihoods, computing the exact posterior becomes intractable in high dimensions: the evidence $p_{\theta}(x) = \int p_{\theta}(x|z)\,p_{\theta}(z)\,dz$ requires integrating over the entire latent space. To overcome this issue, several approximation methods are used instead of computing the exact posterior.
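To see the problem concretely, here is a minimal numerical sketch, assuming a hypothetical two-dimensional conjugate Gaussian model (standard normal prior, Gaussian likelihood) chosen so that the evidence is also available in closed form for comparison. The naive Monte Carlo estimate works here, but degrades exponentially as the latent dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model (hypothetical, for illustration): prior p(z) = N(0, I),
# likelihood p(x|z) = N(z, sigma^2 I), with z and x the same dimension.
sigma = 0.5
x = np.array([1.0, -0.5])   # a single observed data point
d = x.shape[0]

def log_likelihood(x, z, sigma):
    """log N(x | z, sigma^2 I) for each row of z."""
    return (-0.5 * np.sum((x - z) ** 2, axis=-1) / sigma**2
            - 0.5 * d * np.log(2 * np.pi * sigma**2))

# Naive Monte Carlo estimate of the evidence p(x) = E_{p(z)}[p(x|z)].
# In high dimensions almost no prior samples land where p(x|z) is
# large, so this estimator needs exponentially many samples.
z = rng.standard_normal((100_000, d))      # samples from the prior
log_px_mc = np.log(np.mean(np.exp(log_likelihood(x, z, sigma))))

# For this conjugate toy model the evidence is known in closed form
# (marginally x ~ N(0, (1 + sigma^2) I)), so we can check the estimate:
log_px_exact = (-0.5 * np.sum(x**2) / (1 + sigma**2)
                - 0.5 * d * np.log(2 * np.pi * (1 + sigma**2)))
print(f"MC estimate: {log_px_mc:.4f}, exact: {log_px_exact:.4f}")
```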

Variational Method

The main idea behind variational methods is to approximate the intractable distribution $p_\theta(z|x)$ with another distribution $q_\phi(z|x)$ that we can sample from. To quantify the difference between the two distributions, we minimize the reverse KL divergence (the expected log-likelihood ratio) between them: $$ D_{KL}(q_\phi \| p_\theta) = \mathbb{E}_{q_\phi}\left[\log\frac{q_\phi(z|x)}{p_\theta(z|x)}\right] $$
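As a concrete illustration, here is a minimal sketch estimating the reverse KL divergence by Monte Carlo, assuming a hypothetical pair of one-dimensional Gaussians for $q_\phi$ and $p_\theta$ so that a closed-form value is available to check against. Note that the expectation is taken under $q$, which is exactly why $q$ must be easy to sample from:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical 1-D example: q = N(0.5, 1^2), p = N(0, 2^2).
q_mu, q_sigma = 0.5, 1.0
p_mu, p_sigma = 0.0, 2.0

# Reverse KL is an expectation under q, so we sample from q.
z = rng.normal(q_mu, q_sigma, size=200_000)
log_ratio = norm.logpdf(z, q_mu, q_sigma) - norm.logpdf(z, p_mu, p_sigma)
kl_mc = log_ratio.mean()

# Closed form for two Gaussians, to check the estimate:
kl_exact = (np.log(p_sigma / q_sigma)
            + (q_sigma**2 + (q_mu - p_mu)**2) / (2 * p_sigma**2) - 0.5)
print(f"MC estimate: {kl_mc:.4f}, exact: {kl_exact:.4f}")
```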

The intractable $p_\theta(z|x)$ can be rewritten using Bayes' rule:

$$ D_{KL}(q_\phi \| p_\theta) = \mathbb{E}_{q_\phi}[\log q_\phi(z|x)] - \mathbb{E}_{q_\phi}[\log p_\theta(z|x)] $$ and, using the joint probability, $$ D_{KL}(q_\phi \| p_\theta) = \mathbb{E}_{q_\phi}[\log q_\phi(z|x)] - \mathbb{E}_{q_\phi}\left[ \log \frac{p_\theta(z,x)}{p_\theta(x)} \right] $$

Expanding the last term and pulling $\log p_\theta(x)$ out of the expectation (it does not depend on $z$), written in integral form: $$ D_{KL}(q_\phi \| p_\theta) = \mathbb{E}_{q_\phi}[\log q_\phi(z|x)] - \mathbb{E}_{q_\phi}[\log p_\theta(z,x)] + \log p_\theta(x)\int q_\phi(z|x)\,dz $$

Since the integral of a density function is 1, only the marginal log-likelihood remains in the last term: $$ D_{KL}(q_\phi \| p_\theta) = \mathbb{E}_{q_\phi}[\log q_\phi(z|x)] - \mathbb{E}_{q_\phi}[\log p_\theta(z,x)] + \log p_\theta(x) $$

Since $D_{KL}(q_\phi \| p_\theta) \geq 0$, we can rearrange to bound the marginal log-likelihood, or log evidence: $$ \log p_\theta(x) \geq - \mathbb{E}_{q_\phi}[\log q_\phi(z|x)] + \mathbb{E}_{q_\phi}[\log p_\theta(z,x)] $$ with the right-hand side being its **lower bound**. Maximizing the lower bound means minimizing $D_{KL}(q_\phi \| p_\theta)$, because the two sum to the fixed quantity $\log p_\theta(x)$.
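The bound can be checked numerically. Below is a minimal sketch, assuming a hypothetical one-dimensional conjugate Gaussian model ($p(z) = \mathcal{N}(0,1)$, $p(x|z) = \mathcal{N}(z,\sigma^2)$) where $\log p_\theta(x)$ is known exactly: any Gaussian $q$ gives an ELBO at or below it, and the exact posterior makes the bound tight.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical 1-D conjugate model: p(z) = N(0, 1), p(x|z) = N(z, sigma^2).
sigma, x = 0.5, 1.0

def elbo(m, s, n=200_000):
    """Monte Carlo ELBO for the variational choice q(z|x) = N(m, s^2)."""
    z = rng.normal(m, s, size=n)
    log_joint = norm.logpdf(x, z, sigma) + norm.logpdf(z, 0.0, 1.0)
    log_q = norm.logpdf(z, m, s)
    return np.mean(log_joint - log_q)

# Exact log evidence: marginally x ~ N(0, 1 + sigma^2).
log_px = norm.logpdf(x, 0.0, np.sqrt(1 + sigma**2))

# A deliberately poor q stays strictly below the bound; the exact
# posterior N(x/(1+sigma^2), sigma^2/(1+sigma^2)) makes it tight.
print(f"log p(x)          = {log_px:.4f}")
print(f"ELBO, poor q      = {elbo(0.0, 2.0):.4f}")
print(f"ELBO, exact post. = {elbo(x/(1+sigma**2), sigma/np.sqrt(1+sigma**2)):.4f}")
```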

Evidence Lower Bound (ELBO)

$$ ELBO = - \mathbb{E}_{q_\phi}[\log q_\phi(z|x)] + \mathbb{E}_{q_\phi}[\log p_\theta(z,x)] $$

Splitting the joint as $p_\theta(z,x) = p_\theta(x|z)\,p_\theta(z)$ and regrouping: $$ = - \mathbb{E}_{q_\phi}[\log q_\phi(z|x)] + \mathbb{E}_{q_\phi}[\log p_\theta(x|z)] + \mathbb{E}_{q_\phi}[\log p_\theta(z)] $$ $$ = \mathbb{E}_{q_\phi}[\log p_\theta(x|z)] - \mathbb{E}_{q_\phi}\left[\log \frac{q_\phi(z|x)}{p_\theta(z)} \right] $$

with

ELBO = expected reconstruction log-likelihood − KL divergence between the approximate posterior $q_\phi(z|x)$ and the prior $p_\theta(z)$.
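This two-term form is what VAE-style models optimize. Here is a minimal sketch, assuming a hypothetical diagonal Gaussian $q_\phi(z|x)$ with a standard normal prior, where the KL term has a closed form; the encoder outputs and the decoder log-likelihood below are placeholders for illustration, not a real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for a batch of data points:
# q_phi(z|x) = N(mu, diag(sigma^2)), prior p(z) = N(0, I).
batch, latent_dim = 4, 8
mu = rng.standard_normal((batch, latent_dim)) * 0.1
log_var = rng.standard_normal((batch, latent_dim)) * 0.1

# KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form:
#   0.5 * sum( mu^2 + sigma^2 - log sigma^2 - 1 )
kl = 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1.0, axis=1)

# The reconstruction term E_q[log p(x|z)] is typically a one-sample
# Monte Carlo estimate via the reparameterization trick; a placeholder
# decoder log-likelihood stands in for log p(x|z) here.
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps           # z ~ q_phi(z|x)
recon_log_lik = -0.5 * np.sum(z**2, axis=1)    # placeholder log p(x|z)

elbo = recon_log_lik - kl                      # per-example ELBO
print("ELBO per example:", np.round(elbo, 3))
```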
