Generative Modeling by Estimating Gradients of the Data Distribution
Paper link: https://arxiv.org/abs/1907.05600
background
- likelihood-based methods
- approach: uses log-likelihood (or a suitable surrogate) as the training objective.
- intrinsic limitations: either have to use specialized architectures to build a normalized probability model (e.g., autoregressive models, flow models), or use surrogate losses (e.g., the evidence lower bound used in variational auto-encoders, contrastive divergence in energy-based models) for training.
- generative adversarial networks
- approach: uses adversarial training to minimize f-divergences or integral probability metrics between model and data distributions.
- intrinsic limitations: their training can be unstable due to the adversarial training procedure. In addition, the GAN objective is not suitable for evaluating and comparing different GAN models.
two main challenges with the new approach
- if the data distribution is supported on a low dimensional manifold—as it is often assumed for many real world datasets—the score will be undefined in the ambient space, and score matching will fail to provide a consistent score estimator.
- the scarcity of training data in low data density regions, e.g., far from the manifold, hinders the accuracy of score estimation and slows down the mixing of Langevin dynamics sampling.
Our sampling strategy is inspired by simulated annealing [30, 37], which heuristically improves optimization for multimodal landscapes.
score-matching
The optimization objective is

$$\frac{1}{2}\,\mathbb{E}_{p_{\text{data}}(x)}\left[\left\|s_\theta(x)-\nabla_x\log p_{\text{data}}(x)\right\|_2^2\right]$$

Let

$$s(x)\triangleq\nabla_x\log p_{\text{data}}(x)$$

then the objective can be written as

$$\frac{1}{2}\,\mathbb{E}_{p_{\text{data}}(x)}\left[\left\|s_\theta(x)-s(x)\right\|_2^2\right]$$

The parameterized $s_\theta$ is the score of the estimated distribution; the unparameterized $s$ is the score of the true distribution.
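As a toy sanity check (my own illustration, not from the paper), this objective can be probed directly on 1-D standard Gaussian data, where the true score is simply $-x$. A minimal numpy sketch, assuming a linear model family $s_\theta(x)=\theta x$:

```python
import numpy as np

# Toy check of the score-matching objective on 1-D standard Gaussian data,
# where the true score is known: d/dx log N(x; 0, 1) = -x.
# Model family (an assumption for illustration): s_theta(x) = theta * x.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # samples from p_data = N(0, 1)
true_score = -x                    # d/dx log p_data(x)

thetas = np.linspace(-2.0, 0.0, 201)
# J(theta) = 1/2 E[ (s_theta(x) - true_score)^2 ]
objective = [0.5 * np.mean((t * x - true_score) ** 2) for t in thetas]
best = thetas[int(np.argmin(objective))]
print(best)  # -1.0: the minimizer recovers the true score s(x) = -x
```

The grid search stands in for gradient descent only because the model has a single parameter; the point is that minimizing the objective recovers the true score.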
Explicit Score Matching

$$J_{\mathrm{ESM}}(\theta)=\frac{1}{2}\,\mathbb{E}_{p_{\text{data}}(x)}\left[\left\|s_\theta(x)-\nabla_x\log p_{\text{data}}(x)\right\|_2^2\right]$$

Implicit Score Matching

$$J_{\mathrm{ISM}}(\theta)=\mathbb{E}_{p_{\text{data}}(x)}\left[\operatorname{tr}\!\left(\nabla_x s_\theta(x)\right)+\frac{1}{2}\left\|s_\theta(x)\right\|_2^2\right]$$
Theorem 1

$$J_{\mathrm{ESM}}(\theta)=J_{\mathrm{ISM}}(\theta)+C$$

where $C$ is a constant independent of $\theta$.

Proof

Expanding the square in $J_{\mathrm{ESM}}$, the term $\mathbb{E}\left[\frac{1}{2}\|\nabla_x\log p(x)\|^2\right]$ is constant in $\theta$, so it suffices to show

$$\mathbb{E}_{p(x)}\left[\langle s_\theta(x),\nabla_x\log p(x)\rangle\right]=-\mathbb{E}_{p(x)}\left[\operatorname{tr}\!\left(\nabla_x s_\theta(x)\right)\right]$$

Consider the $i$-th dimension; it suffices to show

$$\int p(x)\,s_{\theta,i}(x)\,\frac{\partial\log p(x)}{\partial x_i}\,\mathrm{d}x=-\int p(x)\,\frac{\partial s_{\theta,i}(x)}{\partial x_i}\,\mathrm{d}x$$

And indeed, using $p(x)\,\frac{\partial\log p(x)}{\partial x_i}=\frac{\partial p(x)}{\partial x_i}$ and integrating by parts (the boundary terms vanish under the usual regularity conditions):

$$\int s_{\theta,i}(x)\,\frac{\partial p(x)}{\partial x_i}\,\mathrm{d}x=-\int p(x)\,\frac{\partial s_{\theta,i}(x)}{\partial x_i}\,\mathrm{d}x$$

QED

This theorem turns the unreachable true score $\nabla_x\log p_{\text{data}}(x)$ in the ESM objective into a tractable objective (ISM) that depends only on the model $s_\theta$ and samples from the data.
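The integration-by-parts step can be checked numerically. A small sketch (my own illustration, assuming $p=\mathcal{N}(0,1)$, so $\frac{\mathrm{d}}{\mathrm{d}x}\log p(x)=-x$, and a smooth test score $s(x)=\sin x$ whose boundary terms vanish):

```python
import numpy as np

# Numerical check of the integration-by-parts identity in Theorem 1:
#   E_p[ s(x) * d/dx log p(x) ] = -E_p[ s'(x) ]
# for p = N(0, 1) and the test score s(x) = sin(x), s'(x) = cos(x).
x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # density of N(0, 1)

lhs = np.sum(p * np.sin(x) * (-x)) * dx      # E_p[s(x) * (log p)'(x)]
rhs = -np.sum(p * np.cos(x)) * dx            # -E_p[s'(x)]
print(lhs, rhs)  # both ≈ -0.6065 (= -e^{-1/2})
```

The two sides agree to numerical precision, as the theorem predicts; the exact value $-e^{-1/2}$ follows from the Gaussian characteristic function.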
Denoising Score Matching
By introducing a conditional distribution, the originally intractable objective becomes tractable.
Theorem 2

Perturb the data with $q_\sigma(\tilde{x}\mid x)=\mathcal{N}(\tilde{x};x,\sigma^2 I)$, let $q_\sigma(\tilde{x})=\int q_\sigma(\tilde{x}\mid x)\,p_{\text{data}}(x)\,\mathrm{d}x$, and define

$$J_{\mathrm{DSM}}(\theta)=\frac{1}{2}\,\mathbb{E}_{q_\sigma(\tilde{x}\mid x)\,p_{\text{data}}(x)}\left[\left\|s_\theta(\tilde{x})-\nabla_{\tilde{x}}\log q_\sigma(\tilde{x}\mid x)\right\|_2^2\right]$$

Then we have

$$J_{\mathrm{DSM}}(\theta)=J_{\mathrm{ESM}}(\theta;q_\sigma)+C$$

with $C$ independent of $\theta$, meaning the two are the same optimization objective; in other words, the minimizer satisfies

$$s_{\theta^*}(\tilde{x})=\nabla_{\tilde{x}}\log q_\sigma(\tilde{x})$$

From this theorem, DSM amounts to performing the ESM optimization in the space of noise-perturbed images. Only when the noise magnitude is sufficiently small can the resulting gradient be regarded as the gradient on the original sample space.
Proof

Substituting $q_\sigma(\tilde{x})=\int q_\sigma(\tilde{x}\mid x)\,p_{\text{data}}(x)\,\mathrm{d}x$ and expanding both squared norms, the quadratic terms $\frac{1}{2}\|s_\theta(\tilde{x})\|^2$ coincide, and the cross terms agree because

$$\mathbb{E}_{q_\sigma(\tilde{x})}\left[\langle s_\theta(\tilde{x}),\nabla_{\tilde{x}}\log q_\sigma(\tilde{x})\rangle\right]=\mathbb{E}_{q_\sigma(\tilde{x}\mid x)\,p_{\text{data}}(x)}\left[\langle s_\theta(\tilde{x}),\nabla_{\tilde{x}}\log q_\sigma(\tilde{x}\mid x)\rangle\right]$$

so the two objectives differ only by a constant independent of $\theta$. QED
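Theorem 2 can be sanity-checked on a fully tractable case. A Monte Carlo sketch (the 1-D Gaussian data and linear score model are my own assumptions, not the paper's setup):

```python
import numpy as np

# Check Theorem 2 on a tractable case: data ~ N(0, 1), Gaussian noise with
# std sigma, so q_sigma = N(0, 1 + sigma^2) and the perturbed score is
#   d/dx log q_sigma(x~) = -x~ / (1 + sigma^2).
# Restricting s_theta(x~) = a * x~ (linear, for illustration), the
# DSM-optimal slope a* should match the perturbed-score slope.
rng = np.random.default_rng(0)
sigma = 0.5
x = rng.standard_normal(1_000_000)
x_tilde = x + sigma * rng.standard_normal(x.shape)

# DSM regression target: grad of log q_sigma(x~ | x) = -(x~ - x) / sigma^2
target = -(x_tilde - x) / sigma**2
# Least-squares slope minimizing E[(a * x~ - target)^2]
a_star = np.sum(x_tilde * target) / np.sum(x_tilde**2)
print(a_star, -1 / (1 + sigma**2))  # both ≈ -0.8
```

The fitted slope matches $-1/(1+\sigma^2)$: minimizing the denoising objective recovers the score of the perturbed marginal $q_\sigma$, exactly as the theorem states.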
Sampling Algorithm
Benefits of adding noise:
- since the support of our Gaussian noise distribution is the whole space, the perturbed data will not be confined to a low dimensional manifold, which obviates difficulties from the manifold hypothesis and makes score estimation well-defined.
- large Gaussian noise has the effect of filling low density regions in the original unperturbed data distribution; therefore score matching may get more training signal to improve score estimation.
- by using multiple noise levels we can obtain a sequence of noise-perturbed distributions that converge to the true data distribution. We can improve the mixing rate of Langevin dynamics on multimodal distributions by leveraging these intermediate distributions in the spirit of simulated annealing and annealed importance sampling.
Loss Function
For a single noise level $\sigma$, using the DSM objective with $\nabla_{\tilde{x}}\log q_\sigma(\tilde{x}\mid x)=-(\tilde{x}-x)/\sigma^2$:

$$\ell(\theta;\sigma)=\frac{1}{2}\,\mathbb{E}_{p_{\text{data}}(x)}\,\mathbb{E}_{\tilde{x}\sim\mathcal{N}(x,\sigma^2 I)}\left[\left\|s_\theta(\tilde{x},\sigma)+\frac{\tilde{x}-x}{\sigma^2}\right\|_2^2\right]$$

Choose $\lambda(\sigma)=\sigma^2$, since empirically $\|s_\theta(x,\sigma)\|_2\propto 1/\sigma$, which keeps the weighted terms at the same order of magnitude across noise levels. So let

$$\mathcal{L}\left(\theta;\{\sigma_i\}_{i=1}^{L}\right)=\frac{1}{L}\sum_{i=1}^{L}\lambda(\sigma_i)\,\ell(\theta;\sigma_i)$$
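A minimal numpy sketch of this combined loss (the `ncsn_loss`/`score_fn` names and the oracle score are my own illustration, not the paper's code):

```python
import numpy as np

def ncsn_loss(score_fn, data, sigmas, rng):
    """Combined loss (1/L) * sum_i lambda(sigma_i) * l(theta; sigma_i)
    with lambda(sigma) = sigma^2, as in the paper."""
    total = 0.0
    for sigma in sigmas:
        noise = sigma * rng.standard_normal(data.shape)
        x_tilde = data + noise
        # l(theta; sigma) = 1/2 E|| s(x~, sigma) + (x~ - x)/sigma^2 ||^2
        per_sample = 0.5 * np.sum(
            (score_fn(x_tilde, sigma) + noise / sigma**2) ** 2, axis=-1)
        total += sigma**2 * np.mean(per_sample)
    return total / len(sigmas)

# Usage on a toy case: for N(0, I) data the perturbed score is
# -x / (1 + sigma^2); with lambda = sigma^2 each weighted term stays O(1).
rng = np.random.default_rng(0)
data = rng.standard_normal((4096, 2))
oracle = lambda x, sigma: -x / (1 + sigma**2)
loss = ncsn_loss(oracle, data, sigmas=[1.0, 0.5, 0.25], rng=rng)
print(loss)
```

Note the loss is bounded away from zero even for the oracle score, because the denoising target $-(\tilde{x}-x)/\sigma^2$ has irreducible variance; what matters for training is that the $\sigma^2$ weighting keeps every level's contribution comparable.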
Inference Algorithm: Annealed Langevin Dynamics

At each noise level $\sigma_i$ (with $\sigma_1>\cdots>\sigma_L$), run $T$ Langevin steps

$$\tilde{x}_t=\tilde{x}_{t-1}+\frac{\alpha_i}{2}\,s_\theta(\tilde{x}_{t-1},\sigma_i)+\sqrt{\alpha_i}\,z_t,\qquad z_t\sim\mathcal{N}(0,I)$$

then use the final $\tilde{x}$ to initialize the next level. The step size $\alpha_i=\epsilon\,\sigma_i^2/\sigma_L^2$ is chosen to keep the signal-to-noise ratio of the update roughly constant across noise levels.
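A minimal sketch of annealed Langevin dynamics, assuming an oracle score for perturbed 1-D Gaussian data (the function name and toy setup are illustrative; $\epsilon=2\times10^{-5}$ and $T=100$ follow the paper's defaults):

```python
import numpy as np

def annealed_langevin(score_fn, sigmas, n_samples, T=100, eps=2e-5, seed=0):
    """Anneal through sigma_1 > ... > sigma_L, running T Langevin steps
    per level with step size alpha_i = eps * sigma_i^2 / sigma_L^2."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n_samples)   # arbitrary initialization
    for sigma in sigmas:
        alpha = eps * sigma**2 / sigmas[-1]**2   # keeps SNR ~ constant
        for _ in range(T):
            z = rng.standard_normal(n_samples)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x

# Toy target: data ~ N(0, 1), so the sigma-perturbed score is
# -x / (1 + sigma^2) in closed form (standing in for a trained s_theta).
sigmas = np.geomspace(1.0, 0.01, 10)
oracle = lambda x, sigma: -x / (1 + sigma**2)
samples = annealed_langevin(oracle, sigmas, n_samples=20_000)
print(samples.mean(), samples.std())  # ≈ 0 and ≈ 1: matches N(0, 1)
```

Starting from a broad, heavily perturbed distribution and shrinking $\sigma$ lets the chain cross low-density regions early, then refine toward the data distribution, which is the point of the annealing schedule.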