Johnson Lindenstrauss Lemma (Informal)
Problem Statement
For any $0 < \epsilon < 1$ and any integer, let $k$ be a positive integer such that
\[k > \frac{16}{\epsilon^2} \log (\frac{n}{\delta})\]then for any set $A$ with $n$ points in $\mathbb{R}^d$ there exists a map $f:\mathbb{R}^d \mapsto \mathbb{R}^k$ such that for all $x_i, x_j \in A$,
\[\left( 1-\epsilon \right)\|x_i - x_j\|_2^2 \leq \|f(x_i) - f(x_j)\|_2^2 \leq (1+\epsilon) \|x_i - x_j\|_2^2\]holds with probability at least $1-\delta$.
Consider random mapping $f(x) = \textbf{Z}x/\sqrt{k}$, $\textbf{Z}$ is a random matrix with each entry formed by standard random variable ($\textbf{Z}_{ij}\sim\mathcal{N}(0,1)$).
Lemma1: Chi distribution concentration bounds
For a random variable $X \sim \chi^2(1)$, we have $\mathbb{E}X = 1$. By the Chernoff bound,
\[\Pr(X \geq 1 + \epsilon) \leq \inf_{t>0} \exp(-t(1+\epsilon)) \, \mathbb{E} e^{tX}.\]The moment generating function of a chi-squared distribution with $1$ degree of freedom is $\mathbb{E}e^{tX} = (1-2t)^{-1/2}$ for $t < \frac{1}{2}$, hence
\[\Pr(X \geq 1+\epsilon) \leq \inf_{0<t<\frac{1}{2}} \exp\left(-t(1+\epsilon) - \frac{1}{2}\log(1-2t)\right).\]Taking $t = \frac{\epsilon}{2(1+\epsilon)}$ yields
\[\Pr(X \geq 1+\epsilon) \leq \exp\left(-\frac{1}{2}\left(\epsilon - \log(1+\epsilon)\right)\right).\]Using the inequality $\epsilon - \log(1+\epsilon) \geq \frac{\epsilon^2}{4}$ for $\epsilon \in (0,1)$, we obtain
\[\Pr(X \geq 1+\epsilon) \leq \exp\left(-\frac{\epsilon^2}{8}\right).\]Similarly, for i.i.d. random variables $X_1, \dots, X_N \sim \chi^2(1)$, we have
\[\Pr\left(\frac{1}{N}\sum_{i=1}^{N} X_i \geq 1 + \epsilon \right) \leq \inf_{t>0} \exp\left(-tN(1+\epsilon)\right) \mathbb{E}\exp\left(t\sum_{i=1}^N X_i\right).\]By independence,
\[\mathbb{E}\exp\left(t\sum_{i=1}^N X_i\right) = \prod_{i=1}^N \mathbb{E}e^{tX_i} = (1-2t)^{-\frac{N}{2}},\]thus
\[\Pr\left(\frac{1}{N}\sum_{i=1}^{N} X_i \geq 1 + \epsilon \right) \leq \inf_{0<t<\frac{1}{2}} \exp\left(-tN(1+\epsilon) - \frac{N}{2}\log(1-2t)\right).\]Choosing $t = \frac{\epsilon}{2(1+\epsilon)}$ gives
\[\Pr\left(\frac{1}{N}\sum_{i=1}^{N} X_i \geq 1 + \epsilon \right) \leq \exp\left(-\frac{N}{2}\left(\epsilon - \log(1+\epsilon)\right)\right),\]and applying $\epsilon - \log(1+\epsilon) \geq \frac{\epsilon^2}{4}$ yields
\[\Pr\left(\frac{1}{N}\sum_{i=1}^{N} X_i \geq 1 + \epsilon \right) \leq \exp\left(-\frac{N\epsilon^2}{8}\right).\]Main Lemma
\[\frac{\|f(x_i)-f(x_j)\|_2^2}{\| x_i - x_j\|_2^2} = \frac{1}{k}\left\|\mathbf{Z}\frac{(x_i-x_j)}{\|x_i -x_j\|_2}\right\|_2^2\]It suffices to consider a unit vector $a$ under the $l_2$ norm, i.e., $|a|_2 = 1$.
Step 1: Distribution of projected coordinates
\[(\mathbf{Z}a)_m = \mathbf{Z}_m^\top a = \sum_{l=1}^d \mathbf{Z}_{ml} a_l \sim \mathcal{N}(0, \|a\|_2^2) = \mathcal{N}(0,1)\]Step 2: Chi-square structure
Thus,
\[\|\mathbf{Z}a\|_2^2 = \sum_{m=1}^{k} X_m, \quad X_m \sim \chi^2(1)\]From Lemma 1, we have the concentration bound
\[\Pr\left( \left|\frac{1}{k}\sum_{m=1}^{k} X_m - 1 \right| \ge \epsilon \right) \le 2 \exp\left(-\frac{k}{8}\epsilon^2\right)\]Step 3: Union bound over all pairs
Using union bound,
\[\begin{aligned} &\Pr\left( \bigcup_{i \ne j} \left| 1 - \frac{\|f(x_i) - f(x_j)\|_2^2}{\|x_i - x_j\|_2^2} \right| \ge \epsilon \right) \\ &\le \sum_{i \ne j} \Pr\left( \left| 1 - \frac{\|f(x_i) - f(x_j)\|_2^2}{\|x_i - x_j\|_2^2} \right| \ge \epsilon \right) \\ &= \frac{n(n-1)}{2} \cdot 2 \exp\left(-\frac{k}{8}\epsilon^2\right) \\ &\le n^2 \exp\left(-\frac{k}{8}\epsilon^2\right) \end{aligned}\]Step 4: Dimension requirement
To ensure the RHS is at most $\delta$, it suffices that
\[n^2 \exp\left(-\frac{k}{8}\epsilon^2\right) \le \delta\]Taking logarithm, we obtain
\[k \ge \frac{8}{\epsilon^2} \left( 2\log n + \log \frac{1}{\delta} \right)\]In particular, taking $\delta = 1/n$, we have
\[k = O\left(\frac{\log n}{\epsilon^2}\right)\]References
[1]https://cs.stanford.edu/people/mmahoney/cs369m/Lectures/lecture1.pdf
[2] https://djalil.chafai.net/blog/2025/02/14/back-to-basics-johnson-lindenstrauss-lemma/
[3]https://www.stats.ox.ac.uk/~rebeschi/teaching/AFoL/22/material/problem3.pdf
Enjoy Reading This Article?
Here are some more articles you might like to read next: