The problems to be Solved
- Problem 1: to generate samples $\{\textbf x^{(r)}\}_{r=1}^R$ from a given probability distribution $P(\textbf x)$.
- Problem 2: to estimate expectations of functions under this distribution, for example,
$$\boldsymbol{\Phi} = \int \phi(\textbf x)P(\textbf x)\mathrm{d}^N \textbf x$$
Uniform Sampling
Drawing random samples $\{\textbf x^{(r)}\}_{r=1}^R$ uniformly from the state space and evaluating $\mathbb P^*(\textbf x)$ at those points. Defined $Z_R$ as,
$$Z_R = \sum_{r=1}^R P^*(\textbf x)$$
And estimate $\boldsymbol{\Phi} = \int \phi(\textbf x)P(\textbf x)\mathrm{d}^N \textbf x$ by,
$$\hat{\boldsymbol{\Phi}} = \sum_{r=1}^R\phi(\textbf x^{(r)}\frac{P^*(\textbf x^{(r)})}{Z_R}.$$
Typical Set: A high-dimensional distribution is often concentrated in a small region of the state space knwon as its typical set $T$, whose volume is given by $|T|\simeq 2^{H(\textbf X)}$, where $H(\textbf X)$ is the Shannon-Gibbs entropy,
$$H(\textbf X) = \sum_{\textbf X}P(\textbf x)\log_2\frac{1}{P(\textbf x)}$$
若以只有两个状态点的Ising Model举例的话,一般来说我们只需要抽样$R_{\min}$个点就可以很好的估计了,
$$R_{\min}\simeq 2^{N-H} = 2^{N – N/2} = 2^{N/2}$$
$N$为维数。
但由于Sampling次数还是太大,我们还是进行计算,而且在高维的情况下,如果$\mathbb{P}(\textbf x)$不是Uniform,那么也很难利用这个方法来Sampling。因此提出了其他方法来解决,例如Importance Sampling, Rejection Sampling, Metropolis Method, Gibbs Sampling etc.
Importance Sampling
Importance Sampling 主要解决第二个问题。
Find a simpler density $Q(x)$, and its $Q^*(x)$ is easy to generate.
Introduce the ‘weights’,
$$w_r = \frac{P^*(x^{(r)})}{Q^*(x^{(r)})}$$
which we use to adjust the “importance” of each point in our estimator thus:
$$\hat{\boldsymbol{\Phi}} = \frac{\sum_rw_r\phi(x^{(r)})}{\sum_rw_r}.$$
If $Q(x)$ is non-zero for all $x$ where $P(x)$ is non-zero, it can be proved that $\hat{\boldsymbol{\Phi}}$ converges to $\boldsymbol{\Phi}$ as $R$ increases.
Disadvantages: Fisrt, we clearly need to obtain samples in the typical set of $P$. Second, even if we obtain samples in the typical set, the weights associated with those samples are likely to vary by large factors, because the probabilities of points in a typical set, although similar to each other, still differ by factors of order $\exp(\sqrt{N})$.
Rejection Sampling
- Draw $x\sim Q$,$Q$ is the proposal distribution.
- Compute $$\alpha = \frac{p^*(x)}{cq^*(x)}$$
- Draw $U\sim\text{Unif}(0,1)$
- Accept $x$ if $u\leq\alpha$
Disadvantages: Not efficiency.