Johnson Lindenstrauss Lemma (Informal)

Problem Statement

For any $0 < \epsilon < 1$ and any integer, let $k$ be a positive integer such that

\[k > \frac{16}{\epsilon^2} \log (\frac{n}{\delta})\]

then for any set $A$ with $n$ points in $\mathbb{R}^d$ there exists a map $f:\mathbb{R}^d \mapsto \mathbb{R}^k$ such that for all $x_i, x_j \in A$,

\[\left( 1-\epsilon \right)\|x_i - x_j\|_2^2 \leq \|f(x_i) - f(x_j)\|_2^2 \leq (1+\epsilon) \|x_i - x_j\|_2^2\]

holds with probability at least $1-\delta$.

Consider random mapping $f(x) = \textbf{Z}x/\sqrt{k}$, $\textbf{Z}$ is a random matrix with each entry formed by standard random variable ($\textbf{Z}_{ij}\sim\mathcal{N}(0,1)$).

Lemma1: Chi distribution concentration bounds

For a random variable $X \sim \chi^2(1)$, we have $\mathbb{E}X = 1$. By the Chernoff bound,

\[\Pr(X \geq 1 + \epsilon) \leq \inf_{t>0} \exp(-t(1+\epsilon)) \, \mathbb{E} e^{tX}.\]

The moment generating function of a chi-squared distribution with $1$ degree of freedom is $\mathbb{E}e^{tX} = (1-2t)^{-1/2}$ for $t < \frac{1}{2}$, hence

\[\Pr(X \geq 1+\epsilon) \leq \inf_{0<t<\frac{1}{2}} \exp\left(-t(1+\epsilon) - \frac{1}{2}\log(1-2t)\right).\]

Taking $t = \frac{\epsilon}{2(1+\epsilon)}$ yields

\[\Pr(X \geq 1+\epsilon) \leq \exp\left(-\frac{1}{2}\left(\epsilon - \log(1+\epsilon)\right)\right).\]

Using the inequality $\epsilon - \log(1+\epsilon) \geq \frac{\epsilon^2}{4}$ for $\epsilon \in (0,1)$, we obtain

\[\Pr(X \geq 1+\epsilon) \leq \exp\left(-\frac{\epsilon^2}{8}\right).\]

Similarly, for i.i.d. random variables $X_1, \dots, X_N \sim \chi^2(1)$, we have

\[\Pr\left(\frac{1}{N}\sum_{i=1}^{N} X_i \geq 1 + \epsilon \right) \leq \inf_{t>0} \exp\left(-tN(1+\epsilon)\right) \mathbb{E}\exp\left(t\sum_{i=1}^N X_i\right).\]

By independence,

\[\mathbb{E}\exp\left(t\sum_{i=1}^N X_i\right) = \prod_{i=1}^N \mathbb{E}e^{tX_i} = (1-2t)^{-\frac{N}{2}},\]

thus

\[\Pr\left(\frac{1}{N}\sum_{i=1}^{N} X_i \geq 1 + \epsilon \right) \leq \inf_{0<t<\frac{1}{2}} \exp\left(-tN(1+\epsilon) - \frac{N}{2}\log(1-2t)\right).\]

Choosing $t = \frac{\epsilon}{2(1+\epsilon)}$ gives

\[\Pr\left(\frac{1}{N}\sum_{i=1}^{N} X_i \geq 1 + \epsilon \right) \leq \exp\left(-\frac{N}{2}\left(\epsilon - \log(1+\epsilon)\right)\right),\]

and applying $\epsilon - \log(1+\epsilon) \geq \frac{\epsilon^2}{4}$ yields

\[\Pr\left(\frac{1}{N}\sum_{i=1}^{N} X_i \geq 1 + \epsilon \right) \leq \exp\left(-\frac{N\epsilon^2}{8}\right).\]

Main Lemma

\[\frac{\|f(x_i)-f(x_j)\|_2^2}{\| x_i - x_j\|_2^2} = \frac{1}{k}\left\|\mathbf{Z}\frac{(x_i-x_j)}{\|x_i -x_j\|_2}\right\|_2^2\]

It suffices to consider a unit vector $a$ under the $l_2$ norm, i.e., $|a|_2 = 1$.

Step 1: Distribution of projected coordinates

\[(\mathbf{Z}a)_m = \mathbf{Z}_m^\top a = \sum_{l=1}^d \mathbf{Z}_{ml} a_l \sim \mathcal{N}(0, \|a\|_2^2) = \mathcal{N}(0,1)\]

Step 2: Chi-square structure

Thus,

\[\|\mathbf{Z}a\|_2^2 = \sum_{m=1}^{k} X_m, \quad X_m \sim \chi^2(1)\]

From Lemma 1, we have the concentration bound

\[\Pr\left( \left|\frac{1}{k}\sum_{m=1}^{k} X_m - 1 \right| \ge \epsilon \right) \le 2 \exp\left(-\frac{k}{8}\epsilon^2\right)\]

Step 3: Union bound over all pairs

Using union bound,

\[\begin{aligned} &\Pr\left( \bigcup_{i \ne j} \left| 1 - \frac{\|f(x_i) - f(x_j)\|_2^2}{\|x_i - x_j\|_2^2} \right| \ge \epsilon \right) \\ &\le \sum_{i \ne j} \Pr\left( \left| 1 - \frac{\|f(x_i) - f(x_j)\|_2^2}{\|x_i - x_j\|_2^2} \right| \ge \epsilon \right) \\ &= \frac{n(n-1)}{2} \cdot 2 \exp\left(-\frac{k}{8}\epsilon^2\right) \\ &\le n^2 \exp\left(-\frac{k}{8}\epsilon^2\right) \end{aligned}\]

Step 4: Dimension requirement

To ensure the RHS is at most $\delta$, it suffices that

\[n^2 \exp\left(-\frac{k}{8}\epsilon^2\right) \le \delta\]

Taking logarithm, we obtain

\[k \ge \frac{8}{\epsilon^2} \left( 2\log n + \log \frac{1}{\delta} \right)\]

In particular, taking $\delta = 1/n$, we have

\[k = O\left(\frac{\log n}{\epsilon^2}\right)\]

References

[1]https://cs.stanford.edu/people/mmahoney/cs369m/Lectures/lecture1.pdf

[2] https://djalil.chafai.net/blog/2025/02/14/back-to-basics-johnson-lindenstrauss-lemma/

[3]https://www.stats.ox.ac.uk/~rebeschi/teaching/AFoL/22/material/problem3.pdf




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Google Gemini updates: Flash 1.5, Gemma 2 and Project Astra
  • Displaying External Posts on Your al-folio Blog
  • a post with plotly.js
  • a post with image galleries
  • a post with tabs