07.25.2025
In my studies of quantum information and learning theory, I find that a particular computational tool has become the center of rigorous mathematical research: sampling. This post walks through the rich theory of MCMC sampling and the results that justify relying on Monte Carlo simulation as a numerical workhorse in so many fields.
In quantum information, Path-Integral Monte Carlo (PIMC) is a primary method for simulating thermal states of stoquastic quantum systems. PIMC transforms a quantum partition function into a classical Gibbs distribution by introducing an “imaginary-time” dimension, then applies Markov Chain Monte Carlo to sample from that distribution. Recently, these Markov chains were proven to mix rapidly, using the canonical path method. This rigorously justifies the use of PIMC in partition function estimation.
Suppose we hope to estimate the partition function of a \(1\)D stoquastic Hamiltonian, say the transverse-field Ising chain
\[
H = \sum_{i=1}^{n-1} Z_{i} Z_{i+1} - \sum_{i=1}^{n} h_i X_i, \qquad h_i \geq 0,
\]
which is stoquastic because all of its off-diagonal matrix elements in the computational basis are non-positive.
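To fix ideas, here is a tiny exact-diagonalization reference (a sketch of my own, with made-up field values and limited to a handful of qubits) showing the quantity PIMC is after, \(Z = \operatorname{Tr}\, e^{-\beta H}\), for a small instance of the chain above.

```python
import numpy as np
from functools import reduce

def tfim_hamiltonian(n, h):
    """H = sum_i Z_i Z_{i+1} - sum_i h_i X_i on n qubits (open chain)."""
    I = np.eye(2)
    X = np.array([[0.0, 1.0], [1.0, 0.0]])
    Z = np.array([[1.0, 0.0], [0.0, -1.0]])

    def op_at(op, i):
        # Kronecker product placing `op` on site i and the identity elsewhere
        return reduce(np.kron, [op if k == i else I for k in range(n)])

    H = np.zeros((2**n, 2**n))
    for i in range(n - 1):
        H += op_at(Z, i) @ op_at(Z, i + 1)
    for i in range(n):
        H -= h[i] * op_at(X, i)
    return H

# Exact partition function Z = Tr exp(-beta H) as a ground-truth reference
# for what a PIMC estimator would target (illustrative values only).
n, beta = 6, 1.0
h = np.full(n, 0.8)
energies = np.linalg.eigvalsh(tfim_hamiltonian(n, h))
Z = np.sum(np.exp(-beta * energies))
print(f"exact partition function Z = {Z:.4f}")
```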
With a Suzuki-Trotter decomposition, this quantum system becomes a classical one on an \(n \times L\) space-time lattice, where \(L\) is the number of imaginary-time slices, with a Gibbs distribution over configurations \( x \in \{0, 1 \}^{nL}\). We can sample from this distribution with MCMC, for example single-site Glauber dynamics. However, our estimates can be wildly inaccurate unless we can prove that the mixing time is small.
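To make the MCMC step concrete, here is a minimal sketch of single-site Glauber (heat-bath) dynamics on the \(n \times L\) lattice, written with \(\pm 1\) spins for convenience. The couplings `J_space` and `J_time` are placeholders of mine: in an actual PIMC implementation they are fixed by the Trotter step \(\beta / L\) and the fields \(h_i\), which I do not reproduce here.

```python
import numpy as np

def glauber_step(spins, J_space, J_time, rng):
    """One single-site Glauber (heat-bath) update on the n x L space-time lattice.

    spins: (n, L) array of +/-1 spins, periodic in imaginary time.
    J_space, J_time: effective classical couplings (placeholders; in real PIMC
    they come from the Trotter step beta/L and the fields h_i).
    """
    n, L = spins.shape
    i, t = rng.integers(n), rng.integers(L)
    # local field from temporal neighbours (periodic) and spatial neighbours (open chain)
    field = J_time * (spins[i, (t - 1) % L] + spins[i, (t + 1) % L])
    if i > 0:
        field += J_space * spins[i - 1, t]
    if i < n - 1:
        field += J_space * spins[i + 1, t]
    # heat-bath rule: resample the spin from its conditional distribution
    p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
    spins[i, t] = 1 if rng.random() < p_up else -1

rng = np.random.default_rng(1)
spins = rng.choice([-1, 1], size=(8, 16))   # n = 8 sites, L = 16 imaginary-time slices
for _ in range(10_000):
    glauber_step(spins, J_space=-0.4, J_time=1.1, rng=rng)
```

Each call touches one site, so a full sweep costs \(nL\) updates; the whole question of this post is how many sweeps are enough.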
In learning theory, specifically in probabilistic inference on graphical models, sampling-based algorithms (e.g. Gibbs sampling, importance sampling) are essential for approximating joint distributions when exact inference is intractable. Consider a Markov random field over binary variables \(X_1, \ldots, X_n\) with pairwise interactions:
\[
P(x) \propto \exp \left( \sum_{(i, j) \in E} \theta_{ij} x_i x_j + \sum_{i}\theta_i x_i \right)
\]
and we want to compute marginal probabilities \(P(X_i = 1)\). Exact inference is \(\#\mathsf{P}\)-hard, so we use Gibbs sampling. But how long does the sampler take to converge to the true distribution? Once again, this depends on how fast the Markov chain mixes. Both of these examples illustrate that sampling is not just a numerical tool but a computational lifeline in need of theoretical justification: do these chains converge fast enough for meaningful inference?
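To make this concrete, here is a minimal Gibbs-sampler sketch for the model above (the helper name, edge weights, and the 3-node chain at the end are invented for illustration). With \(x_i \in \{0, 1\}\), the conditional of \(X_i\) given its neighbours is logistic in \(\theta_i + \sum_j \theta_{ij} x_j\), so each update resamples one variable from that conditional.

```python
import numpy as np

def gibbs_marginals(theta_pair, theta_node, n_sweeps=5000, burn_in=1000, seed=0):
    """Estimate P(X_i = 1) for a pairwise binary MRF by Gibbs sampling.

    theta_pair: dict mapping edge (i, j) -> theta_ij
    theta_node: length-n array of theta_i
    Variables live in {0, 1}; the conditional of X_i given its neighbours
    is a logistic function of theta_i + sum_j theta_ij x_j.
    """
    rng = np.random.default_rng(seed)
    n = len(theta_node)
    # adjacency list with symmetric weights
    nbrs = [[] for _ in range(n)]
    for (i, j), w in theta_pair.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    x = rng.integers(0, 2, size=n)
    counts = np.zeros(n)
    for sweep in range(n_sweeps):
        for i in range(n):
            logit = theta_node[i] + sum(w * x[j] for j, w in nbrs[i])
            x[i] = rng.random() < 1.0 / (1.0 + np.exp(-logit))   # resample X_i
        if sweep >= burn_in:
            counts += x
    return counts / (n_sweeps - burn_in)

# tiny 3-node chain X1 - X2 - X3 with made-up weights
marginals = gibbs_marginals({(0, 1): 1.0, (1, 2): -0.5}, np.array([0.2, 0.0, -0.3]))
print(marginals)
```

The burn-in and sweep counts here are arbitrary; whether they suffice is exactly the mixing-time question.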
Canonical paths are a powerful combinatorial framework for proving rapid mixing in high-dimensional Markov chains. For each pair of distinct states \(x, y\) in the state space \( \Omega \), fix a path \( \gamma_{x, y} \) from \(x\) to \(y\) along transitions of the chain, and write \( \Gamma = \{ \gamma_{x, y} \}_{x, y \in \Omega} \). The edge congestion is defined as
\[
\rho = \max_{e \in E} \frac{1}{Q(e)} \sum_{(x, y) : e \in \gamma_{x, y}} \pi(x)\pi(y)
\]
where \( Q(e) = \pi(u) P(u, v)\) is the ergodic flow through the directed edge \(e = (u, v)\). In 1989, Jerrum and Sinclair showed that the conductance \(\Phi\) satisfies
\[
\Phi \geq \frac{1}{2\rho},
\]
which, via the Cheeger-type inequality \(\lambda \geq \Phi^2/2\) for the spectral gap \(\lambda\) of a reversible chain, gives \(\lambda \geq \frac{1}{8\rho^2}\),
yielding a mixing-time bound (in total variation) of
\[
\tau(\epsilon) \leq \frac{2}{\Phi^2} \left(\ln \frac{1}{\pi_{\min}} + \ln \frac{1}{\epsilon} \right).
\]
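To see the machinery in action on a small, fully computable example, here is a self-contained toy of my own (not tied to any of the chains above): bit-fixing canonical paths for the lazy random walk on the \(d\)-dimensional hypercube, the resulting congestion \(\rho\), and the bound \(1/(8\rho^2)\) next to the exact spectral gap.

```python
import numpy as np
from itertools import product

d = 3
states = list(product([0, 1], repeat=d))
idx = {s: k for k, s in enumerate(states)}
N = len(states)
pi = np.full(N, 1.0 / N)                      # uniform stationary distribution

# lazy random walk on the hypercube: stay with prob 1/2, else flip a uniform bit
P = np.zeros((N, N))
for s in states:
    P[idx[s], idx[s]] = 0.5
    for b in range(d):
        t = list(s); t[b] ^= 1
        P[idx[s], idx[tuple(t)]] = 0.5 / d

def canonical_path(x, y):
    """Bit-fixing path from x to y: fix differing bits left to right."""
    path, cur = [], list(x)
    for b in range(d):
        if cur[b] != y[b]:
            nxt = cur.copy(); nxt[b] = y[b]
            path.append((tuple(cur), tuple(nxt)))
            cur = nxt
    return path

# edge congestion: rho = max_e (1/Q(e)) * sum of pi(x) pi(y) over paths through e
load = {}
for x in states:
    for y in states:
        if x != y:
            for e in canonical_path(x, y):
                load[e] = load.get(e, 0.0) + pi[idx[x]] * pi[idx[y]]
rho = max(w / (pi[idx[u]] * P[idx[u], idx[v]]) for (u, v), w in load.items())

gap = 1.0 - sorted(np.linalg.eigvalsh(P))[-2]  # exact spectral gap (P is symmetric)
print(f"rho = {rho:.2f}, bound 1/(8 rho^2) = {1 / (8 * rho**2):.4f}, true gap = {gap:.4f}")
```

For \(d = 3\) this gives \(\rho = d = 3\): the certified gap \(1/72\) is far below the true gap \(1/3\), but it is polynomial in the dimension, which is exactly what a rapid-mixing proof needs.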
This congestion bound was the justification for Jerrum and Sinclair's FPRAS for approximating the permanent (of dense \(0\)-\(1\) matrices), and it has since been generalized. However, canonical paths assign a single path to each pair of states, which can force heavy congestion onto a few edges. So we introduce fractional flows, which spread each pair's demand over a distribution of paths. Let \( f_{x, y}(\gamma)\) be the amount of flow routed along path \(\gamma\) from \(x\) to \(y\), subject to the constraint:
\[
\sum_{\gamma : x \to y} f_{x, y}(\gamma) = \pi(x) \pi(y).
\]
The total load on an edge \(e\) and fractional congestion are defined as
\[
F(e) = \sum_{x, y}\sum_{\gamma \ni e} f_{x, y}(\gamma),\quad \rho_f = \max_{e \in E} \frac{F(e)}{Q(e)}.
\]
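A toy example of my own makes the gain from splitting flow concrete. Take the simple random walk on a \(4\)-cycle with uniform \(\pi = 1/4\), so \(Q(e) = \tfrac{1}{4} \cdot \tfrac{1}{2} = \tfrac{1}{8}\) for every directed edge. Each antipodal pair has two length-\(2\) paths; routing every such pair along, say, its clockwise path gives each directed edge a load of \(\tfrac{1}{16} + \tfrac{1}{16} + \tfrac{1}{16} = \tfrac{3}{16}\) and hence \(\rho = 3/2\), while splitting each antipodal pair's flow evenly between its two paths drops the load to \(\tfrac{1}{8}\), so \(\rho_f = 1\).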
A vital insight in modern canonical path theory is that rapid mixing is still ensured when \(\rho_f\) is small, and fractional flows often yield smaller congestion than deterministic path assignments. Path length also matters. A bound in the style of Diaconis and Stroock, extended to flows by Sinclair, says that with maximum path length \(\ell_{\max} = \max_{x, y} \left| \gamma_{x, y} \right|\), the spectral gap satisfies
\[
\lambda \geq \frac{1}{\rho_f\, \ell_{\max}}.
\]
This bound tightens mixing-time estimates by accounting for both congestion and how long it takes information to propagate through the graph structure. Because it is linear rather than quadratic in the congestion, it can be much stronger than the \(\rho^2\) bound whenever the paths carrying flow are short relative to \(\rho\).
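To ground this, take the lazy hypercube walk with bit-fixing paths from the snippet above: a standard computation gives \(\rho = d\) and \(\ell_{\max} = d\) (a path flips at most \(d\) bits), so
\[
\lambda \geq \frac{1}{\rho\, \ell_{\max}} = \frac{1}{d^2}, \qquad \text{compared with} \qquad \frac{1}{8\rho^2} = \frac{1}{8d^2},
\]
while the true gap is \(1/d\). Here the two bounds differ only by the constant, since \(\ell_{\max} = \rho\); the linear bound pulls far ahead precisely when flow-carrying paths are much shorter than the congestion.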
In 2025, Chen et al. substantially improved on the Jerrum-Sinclair bound with a refined analysis of the perfect matching chain. For a graph with \(n\) vertices, \(m\) edges, and maximum degree \(\Delta\), they showed that the mixing time satisfies
\[
\tau(\epsilon) = \widetilde{\mathcal{O}} \left(\Delta^2 m + \Delta^2 \log\left(1/\epsilon\right) \right).
\]
In summary, the journey from Jerrum & Sinclair's classical framework to the modern refinement by Chen et al. illustrates the mathematical richness of sampling theory. For problems in quantum simulation and learning algorithms, where sampling is vital, this theory provides the mathematical assurance that Monte Carlo isn't just plausible, it's provably fast.