Approximating Tensor Contractions with Annealed Importance Sampling

Tensors are ubiquitous in modern information processing. In classical information processing, tensors represent high-dimensional data such as images, videos, and text. In quantum computing, tensors encode quantum states and operations, enabling efficient simulation of quantum circuits.

For example, consider a quantum circuit represented by a set of unitaries \( \mathcal{U} = \{U_1, U_2, \ldots, U_m\} \). The probability of measuring a particular bitstring \( x = (x_1, x_2, \ldots, x_n) \) is given by \[ \mathrm{Pr}(x) = \left| \langle x | U_m \cdots U_2 U_1 | 0^{\otimes n} \rangle \right|^2, \] where \( |0^{\otimes n} \rangle \) is the initial state of the circuit. This amplitude corresponds to a tensor network contraction: each gate \( U_i \in \mathcal{U}\) is represented as a low-rank tensor, and the circuit structure defines the connectivity. Writing \( \mathcal{V} \) for the resulting set of tensors, the contraction sums over all intermediate qubit states at each layer: \[ \langle x | U_m \cdots U_2 U_1 | 0^{\otimes n} \rangle = \sum_{\text{internal indices}}\prod_{v \in \mathcal{V}} T_v[\text{indices}]. \] Evaluating this quantity is \(\mathsf{\#P}\)-hard in general, as it requires summing over an exponential number of intermediate states. However, many practical applications only require approximate estimates of these contractions, which can be obtained with Monte Carlo methods.
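To make the correspondence concrete, here is a minimal sketch (illustrative only; the two-qubit circuit, the index conventions, and the `amplitude` helper are our own choices, not the project's code) that contracts a Bell-state circuit exactly with NumPy's `einsum`, summing over the internal wire indices between the two gate layers:

```python
import numpy as np

# Gates as tensors: H is a 2x2 tensor; CNOT is a (2,2,2,2) tensor with
# index order (out_control, out_target, in_control, in_target).
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
CNOT = np.zeros((2, 2, 2, 2))
for c in (0, 1):
    for t in (0, 1):
        CNOT[c, c ^ t, c, t] = 1.0  # target flips when control is 1

zero = np.array([1.0, 0.0])  # |0> on each wire

def amplitude(x0, x1):
    """<x0 x1| CNOT (H tensor I) |00>: sum over the internal indices i, j."""
    return np.einsum("abij,i,j->ab", CNOT, H @ zero, zero)[x0, x1]

probs = {(x0, x1): abs(amplitude(x0, x1)) ** 2 for x0 in (0, 1) for x1 in (0, 1)}
print(probs)  # Bell state: Pr(00) = Pr(11) = 0.5, others 0
```

For circuits of this size the sum over internal indices is cheap; the point of the methods below is the regime where it is not.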

Annealed importance sampling (AIS), introduced by Radford Neal in 2001, is a Monte Carlo method that combines importance sampling with Markov chain Monte Carlo (MCMC) to estimate partition functions or marginal likelihoods of high-dimensional, multimodal distributions. Suppose we want to estimate the intractable normalization constant \(Z\) of a target distribution \[ \pi_K (x) = \frac{\psi(x)}{Z}. \] AIS interpolates between a simple proposal distribution \(\pi_0(x)\) and the target \(\pi_K(x)\) through a sequence of intermediate distributions \[ \pi_k(x) \propto \psi(x)^{\beta_k},\quad \beta_0 = 0 < \beta_1 < \cdots < \beta_K = 1. \] By gradually "annealing" through these distributions using MCMC moves that leave each \(\pi_k\) invariant, AIS constructs a sequence of states \(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(K)}\) and corresponding weights; the resulting estimator is unbiased even for a finite number of annealing steps. We run \(N\) independent chains, each initialized from \(\pi_0\) and annealed through the ladder. After \(K\) steps, we average the product of weights across chains to obtain an estimate of the normalization constant \[ \hat Z = Z_0 \cdot \frac{1}{N} \sum_{i=1}^N \prod_{k=1}^K \psi\left(\mathbf{x}_i^{(k-1)}\right)^{\beta_k - \beta_{k-1}}, \] which works because each ratio of successive normalization constants is an expectation under the preceding intermediate distribution: \[ \frac{Z_{\beta_k}}{Z_{\beta_{k-1}}} = \mathbb{E}_{\mathbf{x} \sim \pi_{\beta_{k-1}}} \left[ \psi \left(\mathbf{x}\right)^{\beta_k - \beta_{k-1}}\right]. \]

In our implementation, the Markov chain at each annealing step is evolved using Glauber dynamics, which performs single-site updates conditioned on the rest of the configuration. For a randomly chosen edge index \( e \) in the tensor network graph, we resample its value from the conditional distribution \[ x_e \sim \pi_{\beta_k}(x_e \mid x_{\setminus e}) \;\propto\; \prod_{\substack{v \in \mathcal{V} \\ e \text{ incident to } v}} T_v[\ldots, x_e, \ldots]^{\beta_k}. \] This local update rule preserves \(\pi_{\beta_k}\) as the stationary distribution, allowing the chain to adapt gradually to sharper distributions as \(\beta_k \to 1\).
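A minimal sketch of the full procedure follows, under assumptions of our own rather than the project's actual code: the network is given as a list of strictly positive tensors together with the edge indices each tensor touches, every edge has dimension `d`, \(\pi_0\) is uniform (so \(Z_0 = d^{\#\text{edges}}\)), and `n_mix` Glauber updates suffice to mix at each rung. All function and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(tensors, edge_lists, x):
    """Unnormalized target: product of tensor entries selected by x (one value per edge)."""
    val = 1.0
    for T, edges in zip(tensors, edge_lists):
        val *= T[tuple(x[e] for e in edges)]
    return val

def glauber_step(tensors, edge_lists, x, beta, d):
    """Resample one uniformly chosen edge from its conditional under pi_beta."""
    e = rng.integers(len(x))
    w = np.ones(d)
    for T, edges in zip(tensors, edge_lists):
        if e in edges:  # only tensors incident to edge e enter the conditional
            for v in range(d):
                idx = tuple(v if ee == e else x[ee] for ee in edges)
                w[v] *= T[idx] ** beta
    x[e] = rng.choice(d, p=w / w.sum())

def ais_estimate(tensors, edge_lists, n_edges, d, betas, n_chains=100, n_mix=20):
    """AIS estimate of Z = sum_x psi(x); assumes strictly positive tensors."""
    log_w = np.zeros(n_chains)
    for i in range(n_chains):
        x = rng.integers(d, size=n_edges)  # exact sample from the uniform pi_0
        for k in range(1, len(betas)):
            # Weight uses the state *before* mixing at the new temperature.
            log_w[i] += (betas[k] - betas[k - 1]) * np.log(psi(tensors, edge_lists, x))
            for _ in range(n_mix):  # MCMC moves leaving pi_{beta_k} invariant
                glauber_step(tensors, edge_lists, x, betas[k], d)
    return d ** n_edges * np.exp(log_w).mean()  # Z_0 = d**n_edges for uniform pi_0
```

On the small lattice instances considered below, the returned estimate can be checked against \(Z\) computed by brute-force enumeration of all edge configurations.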

Our test networks include \( 2 \times 2 \) and \( 3 \times 3 \) lattice networks. Each tensor has \(n\) indices, each of dimension \(d\), and is defined to be diagonally dominant: \[ T_v = \left(1 - \epsilon \right)D + \left( \frac{\epsilon}{d^n} \right)N, \] where \[ D(i_1 i_2 \ldots i_n) = \begin{cases} 1 & \text{if } i_1 = i_2 = \cdots = i_n, \\ 0 & \text{otherwise} \end{cases} \quad\text{and}\quad N(i_1 i_2 \ldots i_n) = 1, \] and \( \epsilon \in [0, 1]\) is a noise parameter. This structure closely mimics components of quantum circuits (e.g., \(T\), CZ, and phase gates) and creates a challenging, highly concentrated probability landscape for MCMC.
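These test tensors are straightforward to build; a short sketch (the function name is ours) translating the definition above into NumPy:

```python
import numpy as np

def test_tensor(n, d, eps):
    """(1 - eps) * diagonal tensor + (eps / d**n) * all-ones, per the definition above."""
    D = np.zeros((d,) * n)
    D[(np.arange(d),) * n] = 1.0  # D[i, i, ..., i] = 1
    N = np.ones((d,) * n)
    return (1.0 - eps) * D + (eps / d ** n) * N
```

Note that for \(\epsilon > 0\) every entry is strictly positive, which keeps the Glauber conditionals and the AIS log-weights well defined.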

We benchmark AIS against naive uniform sampling by tracking the geometric mean relative error (GMRE) of each ratio estimate \(Z_{\beta_k} / Z_{\beta_{k-1}}\) across annealing steps and trials. Results show that while both methods perform comparably when \(\beta_k \approx 0\) (nearly uniform distribution), AIS significantly outperforms uniform sampling as the target distribution sharpens. This improvement arises from AIS's ability to mix locally toward high-probability configurations, maintaining accuracy even in concentrated regimes.
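For reference, one way to compute the metric (the precise definition used in the experiments is not spelled out above; this sketch takes the geometric mean, over steps and trials, of the relative error of each ratio estimate):

```python
import numpy as np

def gmre(estimates, truths):
    """Geometric mean over steps/trials of the relative error |est/true - 1|."""
    rel = np.abs(np.asarray(estimates) / np.asarray(truths) - 1.0)
    return float(np.exp(np.log(rel).mean()))
```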

Overall, this project demonstrates that AIS with Glauber dynamics provides a scalable and accurate tool for approximating tensor network contractions in structured, high-dimensional settings relevant to quantum simulation.

        \begin{algorithm}
        \caption{Annealed Importance Sampling}
        \begin{algorithmic}
        \Require \(N\) chains, annealing ladder \(\beta_0 = 0 < \beta_1 < \cdots < \beta_K = 1\)
        \State Sample \(x^{(0,i)}\sim \pi_0\) for \(i=1,\dots,N\)
        \For{\(k=1\) to \(K\)}
        \For{\(i=1\) to \(N\)}
            \State \(w_k^{(i)} \gets \psi(x^{(k-1,i)})^{\beta_k-\beta_{k-1}}\)
            \State Obtain \(x^{(k,i)}\) from \(x^{(k-1,i)}\) via MCMC moves leaving \(\pi_{\beta_k}\) invariant
        \EndFor
        \EndFor
        \State \(\hat Z \gets Z_0 \cdot \frac{1}{N} \sum_{i=1}^N \prod_{k=1}^K w_k^{(i)}\)
        \end{algorithmic}
        \end{algorithm}