Notes regarding LMC and a bivariate Gaussian simulation example


A simple 2d case of Gaussian simulation using the Langevin Monte Carlo algorithm

Here's the setup of the multivariate Gaussian for a simple 2d example.

Let X be a bivariate Gaussian random variable.

Let the mean of X be \(\mathbb{E}[X]=(0,0)\).

Let the covariance matrix of X be \[ cov(X)= \left[ {\begin{array}{cc} 1 & 0.9 \\ 0.9 & 1 \\ \end{array} } \right] \]

Then the log density function of X equals \(-0.5(X-\mathbb{E}[X])^T cov(X)^{-1} (X-\mathbb{E}[X])\) up to an additive constant. Note that \((X-\mathbb{E}[X])^T cov(X)^{-1} (X-\mathbb{E}[X])\) is the squared Mahalanobis distance between \(X\) and its mean \(\mathbb{E}[X]\). For the derivation of the multivariate normal log density function, see here.

library(MASS)
require(ggplot2)
## Loading required package: ggplot2
library(mvtnorm)

# Define the target distribution: bivariate Gaussian
target_mean <- c(0, 0)
target_cov <- matrix(c(1, 0.9, 0.9, 1), nrow = 2) # covariance matrix of the 2d Gaussian
# Log-density up to an additive constant: -0.5 * squared Mahalanobis distance
target_log_density <- function(x) {
  -0.5 * t(x - target_mean) %*% solve(target_cov) %*% (x - target_mean)
}
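
Since mvtnorm is already loaded, one optional sanity check (a sketch; the test points below are arbitrary) is to compare target_log_density with the exact log density from dmvnorm(..., log = TRUE): the two should differ only by the constant normalizing term that was dropped above.

# target_log_density drops the normalizing constant, so its difference from the
# exact log density should be the same constant at every point,
# namely 0.5 * log(det(2 * pi * target_cov))
x1 <- c(0.5, -0.3)
x2 <- c(1.2, 1.0)
target_log_density(x1) - dmvnorm(x1, target_mean, target_cov, log = TRUE)
target_log_density(x2) - dmvnorm(x2, target_mean, target_cov, log = TRUE)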

The gradient of the log-density function is \(cov(X)^{-1}(\mathbb{E}[X]-X)\).

# Gradient of the log-density function
grad_log_density <- function(x) {
  solve(target_cov) %*% (target_mean - x)
}
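
As an optional check of this formula (a sketch; the test point x0 and step h below are arbitrary), we can compare grad_log_density with a central finite-difference approximation of target_log_density.

# Compare the analytic gradient with a central finite-difference approximation
x0 <- c(0.3, -0.7)
h <- 1e-5
fd_grad <- sapply(1:2, function(j) {
  e <- rep(0, 2); e[j] <- h
  (target_log_density(x0 + e) - target_log_density(x0 - e)) / (2 * h)
})
cbind(analytic = as.vector(grad_log_density(x0)), finite_diff = fd_grad)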

Now make use of the Langevin Monte Carlo iteration, which states \(X_{k+1} = X_k + \frac{\eta}{2} \nabla \log(\mathrm{pdf}(X_k)) + \sqrt{\eta}\,Z_k\), where \(\eta\) is the step size and \(Z_k\) is an independent draw from a standard bivariate normal distribution.
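
A single update under this rule might look like the following sketch, reusing grad_log_density from above; the step size eta, the current state x_k, and the noise draw are arbitrary illustrative choices.

# One Langevin Monte Carlo update: X_{k+1} = X_k + eta/2 * grad + sqrt(eta) * Z_k
eta <- 0.01                # step size (arbitrary, for illustration)
x_k <- c(1, -1)            # current state (arbitrary)
z_k <- rnorm(2)            # standard bivariate normal noise
x_next <- as.vector(x_k + eta / 2 * grad_log_density(x_k) + sqrt(eta) * z_k)
x_next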

Question: if we need to make a random draw from a normal distribution for this LMC algorithm to sample a normal distribution, doesn't that mean it's circular?

Answer:

Most of the time, LMC is used to sample random variables from a much more complicated distribution than a Gaussian. So the step where we introduce Gaussian noise to perturb the sequence of iterates is not circular in those settings.
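
For instance, here is a minimal sketch of such a setting: a hypothetical target that is an equal-weight mixture of two bivariate normals with identity covariance. The names mix_means, mix_log_density, and mix_grad_log_density are made up for illustration; the Langevin update itself stays the same, only the gradient of the log-density changes, while the injected noise remains Gaussian.

# Hypothetical non-Gaussian target: 0.5 * N(m1, I) + 0.5 * N(m2, I)
mix_means <- list(c(-2, -2), c(2, 2))
mix_log_density <- function(x) {
  log(0.5 * dmvnorm(x, mix_means[[1]]) + 0.5 * dmvnorm(x, mix_means[[2]]))
}
mix_grad_log_density <- function(x) {
  d1 <- dmvnorm(x, mix_means[[1]])
  d2 <- dmvnorm(x, mix_means[[2]])
  # gradient of log p: density-weighted average of per-component gradients (m_i - x)
  (d1 * (mix_means[[1]] - x) + d2 * (mix_means[[2]] - x)) / (d1 + d2)
}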

Question: what does “thinning” mean in the context of LMC sampling technique?

Answer:

Thinning refers to keeping only every \(k\)-th iterate of the chain and discarding the iterations in between. It is worth noting that thinning reduces the autocorrelation between the retained samples, at the cost of throwing away draws.
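
A minimal sketch of thinning in R, keeping every 100th row of a chain of iterates (the chain matrix below is just a placeholder):

# Keep every 100th row of the chain; the rows in between are discarded
thin_every <- 100
chain <- matrix(rnorm(2000), ncol = 2)   # placeholder chain of 1000 two-dimensional iterates
thinned <- chain[seq(thin_every, nrow(chain), by = thin_every), ]
dim(thinned)                             # 10 retained draws out of 1000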

Question: How does strong convexity come into place when we are using LMC to sample a Gaussian distribution?

Answer:

For a twice-differentiable function, convexity is equivalent to the Hessian (or "second derivative" for a 1d variable) being positive semi-definite (or \(\geq 0\) in the 1d case).

Definition: A function \(f: \Omega \subset \mathbb{R}^n \rightarrow \mathbb{R}\), with \(\Omega\) a convex set, is a convex function iff \(f(tx + (1-t)y) \leq tf(x) + (1-t)f(y)\) for all \(x, y \in \Omega\) and all \(t \in [0,1]\).

(Second Order Condition for Convexity) For a function \(f \in C^2(\Omega)\), where \(\Omega \subset \mathbb{R}^n\) is also a convex set, \(f\) is convex iff \(\nabla^2f(x) \succeq 0\) for all \(x \in \Omega\).

(Second Order Condition for Strict Convexity) Let \(f\in C^2(\Omega)\). If \(\nabla^2 f(x) \succ 0\) for all \(x \in \Omega\), then \(f\) is strictly convex. Unlike the convex case, this is only a sufficient condition: \(f(x)=x^4\) is strictly convex even though \(f''(0)=0\).

Definition: Let \(f\in C^2(\Omega)\). \(f\) is strongly convex if there exists some \(m > 0\) such that \(\nabla^2 f(x) \succeq mI\) for all \(x \in \Omega\), where \(I\) is the identity matrix of the same dimensions as the Hessian \(\nabla^2 f(x)\).
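
For the bivariate Gaussian target defined earlier, \(-\log(\mathrm{pdf}(X))\) has constant Hessian equal to the precision matrix \(cov(X)^{-1}\), so its smallest eigenvalue serves as a strong-convexity constant \(m\). A quick check in R (a sketch reusing target_cov from above):

# The Hessian of -log(pdf) for a Gaussian target is the constant precision matrix
hess <- solve(target_cov)
ev <- eigen(hess)$values
ev            # both eigenvalues are positive, so -log(pdf) is strongly convex
m <- min(ev)  # a valid strong-convexity constant
m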

So intuitively, the larger the second derivative/Hessian is, the more strongly convex a function is. With that in mind, let's calculate the second derivative of the negative log-density of a normal random variable.

Let \(X \sim N(\mu,\sigma^2)\); the standard normal corresponds to \(\mu = 0\) and \(\sigma^2 = 1\).

The negative log-density \(-\log(\mathrm{pdf}(x))\) is, up to an additive constant, the quadratic \(\frac{(x-\mu)^2}{2\sigma^2}\), which is convex. Now let's calculate how convex it is through the second derivative.

\[\begin{align*} -\log(\mathrm{pdf}(x)) &= -\log\left(\frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\right)\\ &= \log(\sqrt{2\pi}\,\sigma) + \frac{x^2-2x\mu+\mu^2}{2\sigma^2}\\ \implies \frac{d^2(-\log(\mathrm{pdf}(x)))}{dx^2} &= \frac{1}{\sigma^2} \end{align*}\]

This shows that the second derivative (or Hessian) equals \(\frac{1}{\sigma^2}\): the smaller the variance \(\sigma^2\), the bigger the second derivative, and the more strongly convex \(-\log(\mathrm{pdf}(x))\) is.

We can relate this to the variance of \(X_t\): since the simulated \(X_t\) converges in distribution to the target, we want \(\lim_{t \rightarrow \infty} var(X_t) = \sigma^2\) (equal to 1 for the standard normal).

Here's how we can look at the variance in each iteration of the Langevin Monte Carlo algorithm (taking \(\mu = 0\) for simplicity):

\[\begin{align*} dX_t &= -\nabla(-\log(\mathrm{pdf}(X_t)))\,dt + \sqrt{2}\,dB_t\\ \implies X_{t+\epsilon}-X_t &= -\nabla(-\log(\mathrm{pdf}(X_t)))\,\epsilon + \sqrt{2}\,(B_{t+\epsilon}-B_{t})\\ \implies X_{t+\epsilon} &= X_t - \frac{X_t}{\sigma^2}\epsilon+\sqrt{2}\,(B_{t+\epsilon}-B_{t})\\ \implies X_{t+\epsilon} &= X_t\left(1-\frac{\epsilon}{\sigma^2}\right) + \sqrt{2\epsilon}\,z_t \text{ where $z_t \sim N(0,1)$, i.i.d. for each $t$} \end{align*}\]

If we let the starting value \(X_0\) be a \(N(0,1)\) random variable (independent of the noise), then the variance of \(X_\epsilon\) follows from the expression above.

\[\begin{align*} X_\epsilon &= \left(1-\frac{\epsilon}{\sigma^2}\right)X_0 + \sqrt{2\epsilon}\,Z\\ &\sim N\left(0,\left(1-\frac{\epsilon}{\sigma^2}\right)^2 + 2\epsilon\right) \end{align*}\]

More generally, let \(X_0 \sim N(0, var(0))\), where \(var(t)\) denotes the variance of \(X_t\). Then the variance at the \(\epsilon\) step and at the \(t+\epsilon\) step can be calculated in the same way.

\[\begin{align*} X_\epsilon &\sim N\left(0,\left(1-\frac{\epsilon}{\sigma^2}\right)^2 var(0) + 2\epsilon\right)\\ \implies var(\epsilon) &= \left(1-\frac{\epsilon}{\sigma^2}\right)^2 var(0) + 2\epsilon\\ X_{t+\epsilon} &\sim N\left(0,\left(1-\frac{\epsilon}{\sigma^2}\right)^2 var(t) + 2\epsilon\right)\\ \implies var(t+\epsilon) &= \left(1-\frac{\epsilon}{\sigma^2}\right)^2 var(t) + 2\epsilon \end{align*}\]

Now calculate the derivative of the variance with respect to time to see how the variance changes over time. We want to express \(\frac{d\,var(t)}{dt}\) in terms of \(var(t)\) so that we can solve the resulting ODE for \(var(t)\) as a function of \(t\).

\[\begin{align*} \lim_{\epsilon \rightarrow 0} \frac{var(t+\epsilon)-var(t)}{\epsilon} &= \lim_{\epsilon \rightarrow 0} \frac{\left(\left(1-\frac{\epsilon}{\sigma^2}\right)^2 - 1\right) var(t)+2\epsilon}{\epsilon}\\ &= \lim_{\epsilon \rightarrow 0} \frac{\left(-\frac{2\epsilon}{\sigma^2}+\frac{\epsilon^2}{\sigma^4}\right)var(t)+2\epsilon}{\epsilon}\\ &= \lim_{\epsilon \rightarrow 0} \left(-\frac{2}{\sigma^2}+\frac{\epsilon}{\sigma^4}\right)var(t)+2 \\ &= -\frac{2}{\sigma^2}var(t) + 2\\ \implies \frac{d\,var(t)}{dt} &= -\frac{2}{\sigma^2}var(t) + 2 \end{align*}\]

Solving this linear ODE gives \(var(t) = \sigma^2 + (var(0)-\sigma^2)\exp(-\frac{2}{\sigma^2}t)\); for a deterministic start with \(var(0) = 0\), this reduces to \(var(t) = \sigma^2(1-\exp(-\frac{2}{\sigma^2}t))\).

Taking \(t\) to infinity, we have \(\lim_{t \rightarrow \infty} var(t) = \sigma^2\), the variance of the target distribution. Moreover, the variance converges at the exponential rate \(\frac{2}{\sigma^2}\): the smaller \(\sigma^2\), the more strongly convex \(-\log(\mathrm{pdf}(x))\) is and the faster the variance converges. This is the same conclusion we drew from the strong convexity calculation above.
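
As a quick numerical sketch of this result, we can iterate the variance recursion from above and compare it with the closed-form solution and the limit \(\sigma^2\); the values of \(\sigma^2\), \(\epsilon\), and the number of steps below are arbitrary choices.

# Iterate var(t+eps) = (1 - eps/sigma^2)^2 * var(t) + 2*eps and compare with
# the closed-form solution sigma^2 * (1 - exp(-2*t/sigma^2)) for var(0) = 0
sigma2 <- 1
eps <- 0.001
n_steps <- 5000
v <- numeric(n_steps + 1)   # v[k+1] holds var(k * eps), starting from var(0) = 0
for (k in 1:n_steps) {
  v[k + 1] <- (1 - eps / sigma2)^2 * v[k] + 2 * eps
}
t_end <- n_steps * eps
c(recursion = v[n_steps + 1],
  closed_form = sigma2 * (1 - exp(-2 * t_end / sigma2)),
  limit = sigma2)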

# Langevin dynamics sampling without "thinning"
langevin_sampling <- function(n_samples, step_size, initial_value) {
  samples <- matrix(0, n_samples, length(initial_value))
  x <- initial_value
  
  for (i in 1:n_samples) {
    grad <- grad_log_density(x)
    # Langevin update: drift (step_size / 2) * grad plus Gaussian noise scaled by sqrt(step_size)
    x <- x + step_size / 2 * grad + sqrt(step_size) * rnorm(length(x))
    samples[i, ] <- x
  }
  
  return(samples)
}

Next, generate samples using the langevin_sampling function defined above.

# Set parameters
n_samples <- 505000       # total number of Langevin iterations
bsz <- 5000               # burn-in size: number of initial iterations to discard
step_size <- 0.01
initial_value <- c(0, 0)

# Generate samples
samples <- langevin_sampling(n_samples, step_size, initial_value)
samplesburn <- samples[-c(1:bsz), ]       # samples with burn-in removed
thinsz <- seq(100, n_samples - bsz, 100)  # thinning indices: every 100th post-burn-in draw
colMeans(samples)
## [1] -0.04580787 -0.04383146
cov(samples)
##           [,1]      [,2]
## [1,] 0.9814081 0.8814491
## [2,] 0.8814491 0.9863900
samplesDf = data.frame(samples)
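
The burn-in-removed samples samplesburn and the thinning indices thinsz defined above are not applied in the summaries printed so far; here is a sketch of how they could be combined (numerical output not shown):

# Summaries after discarding burn-in and keeping every 100th draw
samplesthin <- samplesburn[thinsz, ]
colMeans(samplesthin)
cov(samplesthin)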

library(ggplot2)
ggplot(samplesDf, aes(x=X1,y=X2))+
  labs(title = "Langevin Sampling for 2 dimensions",
       x = "x1",
       y = "x2") +
  geom_point(color = "deepskyblue2", alpha = 0.5) + 
  theme_bw()