Variational Diffusion Models

0 Authors

Diederik P. Kingma∗, Tim Salimans∗, Ben Poole, Jonathan Ho

Google Research

1 Introduction

Diffusion models have shown impressive image generation capabilities, but earlier diffusion models did not reach state-of-the-art likelihoods on standard image density benchmarks. This paper introduces a family of diffusion models that obtains SOTA likelihoods on these benchmarks [1].

2 Main contributions

  • A new family of diffusion models that achieves state-of-the-art log-likelihoods on standard image density estimation benchmarks (CIFAR-10 and ImageNet).
  • An improved understanding of the variational lower bound (VLB): a simple diffusion loss is derived in terms of the signal-to-noise ratio, which reveals an invariance of the continuous-time diffusion loss and shows that various diffusion models in the literature are equivalent up to a trivial time-dependent rescaling of the data.


[1]: https://arxiv.org/pdf/2107.00630.pdf “Variational diffusion models”

3 Related work

  • 2015: diffusion probabilistic models (DPMs)

  • 2020: DDPM

  • 2020: SMLD (score matching with Langevin dynamics)

  • This work: from the VAE perspective → likelihood under a continuous-time diffusion model; uses the signal-to-noise ratio → an intuitive, simple loss and an invariance result

(Figure: comparison with related diffusion models.)

4 Model

The most basic case of generative modeling:

x: an observation from the dataset

p(x): the task is to estimate the marginal distribution p(x)

the latent-variable model consists of a diffusion process

a simple variational lower bound (VLB) is used to optimize the parameters

(Figure: overview of the variational diffusion model.)

4.1 Forward time diffusion process

data: x

latent variables $z_t$: increasingly noisy versions of x

t: from t=0 (least noisy) to t=1 (most noisy)

$\alpha_t,\sigma_t^2$ are functions of t

for $t\in[0,1]$: $q(z_t|x)=N(z_t;\alpha_t x,\sigma_t^2I)$

signal-to-noise ratio (SNR): $\mathrm{SNR}(t)=\alpha_t^2/\sigma_t^2$

monotonically decreasing in time: SNR(t) < SNR(s) for t > s

i.e., $z_t$ gets noisier as t increases

In prior work, DDPM uses a variance-preserving diffusion process, $\alpha_t=\sqrt{1-\sigma_t^2}$,

while score-based generative models use a variance-exploding diffusion process, $\alpha_t=1$.

This paper uses the variance-preserving formulation, as sketched below.
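
To make the definitions concrete, here is a minimal NumPy sketch of sampling $z_t\sim q(z_t|x)$ in the variance-preserving case. The linear `sigma2_linear` schedule is a made-up stand-in for illustration, not the schedule learned in the paper.

```python
import numpy as np

def sample_z_t(x, t, sigma2_of_t, rng):
    """Sample z_t ~ q(z_t | x) = N(alpha_t * x, sigma_t^2 * I), variance preserving."""
    sigma2 = sigma2_of_t(t)                 # sigma_t^2 in (0, 1)
    alpha = np.sqrt(1.0 - sigma2)           # alpha_t = sqrt(1 - sigma_t^2)
    eps = rng.standard_normal(x.shape)      # eps ~ N(0, I)
    return alpha * x + np.sqrt(sigma2) * eps

def snr(t, sigma2_of_t):
    """SNR(t) = alpha_t^2 / sigma_t^2."""
    sigma2 = sigma2_of_t(t)
    return (1.0 - sigma2) / sigma2

# Hypothetical hand-picked schedule, only for illustration (the paper learns it).
sigma2_linear = lambda t: 1e-4 + (1.0 - 2e-4) * t

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 32, 32))        # a dummy "image"
z_half = sample_z_t(x, t=0.5, sigma2_of_t=sigma2_linear, rng=rng)
print(snr(0.1, sigma2_linear) > snr(0.9, sigma2_linear))   # SNR decreases in t -> True
```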

4.2 Noise schedule

In prior work, the noise schedule is fixed in advance.

Here, the schedule has learnable parameters:

$\sigma_t^2=\mathrm{sigmoid}(\gamma_{\eta}(t))$, where $\gamma_{\eta}(t)$ is a small monotonic network (3 linear layers)

Using $\alpha_t=\sqrt{1-\sigma_{t}^2}$ (variance preserving), this gives

$\alpha_t^2=\mathrm{sigmoid}(-\gamma_{\eta}(t))$

$\mathrm{SNR}(t)=\exp(-\gamma_{\eta}(t))$
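
A small Python sketch of these identities; `gamma_linear` is an arbitrary stand-in for the learned $\gamma_{\eta}(t)$, with made-up endpoint values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def schedule_from_gamma(gamma):
    """Map gamma_eta(t) to (sigma_t^2, alpha_t^2, SNR(t)) for the VP process:
    sigma_t^2 = sigmoid(gamma), alpha_t^2 = sigmoid(-gamma), SNR(t) = exp(-gamma)."""
    return sigmoid(gamma), sigmoid(-gamma), np.exp(-gamma)

# gamma_eta(t) must be increasing in t so that SNR(t) is decreasing.
gamma_linear = lambda t, g0=-10.0, g1=10.0: g0 + (g1 - g0) * t   # made-up endpoints

t = np.linspace(0.0, 1.0, 5)
sigma2, alpha2, snr = schedule_from_gamma(gamma_linear(t))
assert np.allclose(sigma2 + alpha2, 1.0)   # variance preserving
assert np.all(np.diff(snr) < 0)            # SNR strictly decreasing in t
```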

4.3 Reverse time generative model

Inverting the diffusion process yields a hierarchical generative model that samples a sequence of latents $z_t$ (a sampling sketch follows the two conditions below);

it is derived from the VAE perspective.

With a finite number of timesteps T,

$s(i)=(i-1)/T$ and $t(i)=i/T$,

$p(x)=\int_z p(z_1)\,p(x|z_0)\prod_{i=1}^T p(z_{s(i)}|z_{t(i)})$

  1. sufficiently small SNR(1) $\rightarrow$ $q(z_1|x)\approx N(z_1;0,I)$ $\rightarrow$ $p(z_1)=N(z_1;0,I)$

  2. sufficiently large SNR(0) $\rightarrow$ $p(x|z_0)\propto q(z_0|x)$
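
Below is a sketch of ancestral sampling for this hierarchical model. It assumes the generative transition $p(z_s|z_t)$ is taken to be the Gaussian posterior $q(z_s|z_t, x=\hat{x}_\theta(z_t;t))$ of the forward process, as in DDPM-style samplers; `x_hat_model`, `alpha_of_t`, and `sigma2_of_t` are stand-ins for the trained denoising model and the noise schedule.

```python
import numpy as np

def ancestral_step(z_t, s, t, alpha_of_t, sigma2_of_t, x_hat_model, rng):
    """One reverse step z_t -> z_s (s < t), with p(z_s|z_t) = q(z_s | z_t, x = x_hat)."""
    a_s, a_t = alpha_of_t(s), alpha_of_t(t)
    s2_s, s2_t = sigma2_of_t(s), sigma2_of_t(t)
    a_ts = a_t / a_s                      # alpha_{t|s}
    s2_ts = s2_t - a_ts ** 2 * s2_s       # sigma^2_{t|s}

    x_hat = x_hat_model(z_t, t)           # denoised estimate of x
    mean = (a_ts * s2_s / s2_t) * z_t + (a_s * s2_ts / s2_t) * x_hat
    var = s2_ts * s2_s / s2_t             # variance of q(z_s | z_t, x)
    return mean + np.sqrt(var) * rng.standard_normal(z_t.shape)

def sample(x_hat_model, shape, T, alpha_of_t, sigma2_of_t, seed=0):
    """z_1 ~ N(0, I), then T reverse steps with s(i) = (i-1)/T and t(i) = i/T."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(shape)        # p(z_1) = N(0, I), valid when SNR(1) is small
    for i in range(T, 0, -1):
        z = ancestral_step(z, (i - 1) / T, i / T,
                           alpha_of_t, sigma2_of_t, x_hat_model, rng)
    return z                              # z_0; x is then decoded via p(x | z_0)
```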

4.4 Noise prediction model

The denoising model $\hat{x}_{\theta}(z_t,t)$ is parameterized in terms of a noise prediction model $\hat{\epsilon}_{\theta}(z_t,t)$:

$\hat{x}_{\theta}(z_t,t)=(z_t-\sigma_t\hat{\epsilon}_{\theta}(z_t,t))/\alpha_t$
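
In code, this reparameterization is a one-liner (a sketch; `eps_hat` stands for the output of the noise prediction network):

```python
def x_hat_from_eps_hat(z_t, eps_hat, alpha_t, sigma_t):
    """x_hat_theta(z_t, t) = (z_t - sigma_t * eps_hat_theta(z_t, t)) / alpha_t."""
    return (z_t - sigma_t * eps_hat) / alpha_t
```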

4.4.1 Fourier features

Prior work emphasizes coarse-scale patterns and the global consistency of generated images.

This work optimizes for likelihood, which is sensitive to fine-scale details and the exact values of pixels.

Therefore a set of Fourier features is added to the input of the noise prediction model, as sketched below.
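
A sketch of this input augmentation; the frequencies $2^n\pi$ with $n\in\{7,8\}$ and the channel-wise concatenation are illustrative assumptions and may differ from the paper's exact choice.

```python
import numpy as np

def fourier_features(z, exponents=(7, 8)):
    """Append sin(2^n * pi * z) and cos(2^n * pi * z) channels to the network input,
    giving the model direct access to fine-scale variations in the input values."""
    feats = [z]
    for n in exponents:
        feats.append(np.sin((2.0 ** n) * np.pi * z))
        feats.append(np.cos((2.0 ** n) * np.pi * z))
    return np.concatenate(feats, axis=0)   # assumes channel-first layout (C, H, W)

z_t = np.random.default_rng(0).standard_normal((3, 32, 32))
print(fourier_features(z_t).shape)         # (15, 32, 32): 3 + 2 exponents * 2 funcs * 3 channels
```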

4.5 Loss

The model is trained by maximizing the variational lower bound (VLB):

(Figure: the VLB.)
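
Since the screenshot is missing, here is the decomposition as I recall it from the paper: a prior term, a reconstruction term, and the diffusion loss $\mathcal{L}_T(x)$, of which only the last depends on the schedule between its endpoints.

$$-\log p(x) \le -\mathrm{VLB}(x) = \underbrace{D_{KL}\big(q(z_1|x)\,\|\,p(z_1)\big)}_{\text{prior loss}} + \underbrace{\mathbb{E}_{q(z_0|x)}\big[-\log p(x|z_0)\big]}_{\text{reconstruction loss}} + \underbrace{\mathcal{L}_T(x)}_{\text{diffusion loss}}$$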

5 Discrete-time model

With finite T, using $s(i)=(i-1)/T$ and $t(i)=i/T$, the diffusion loss is:

(Figures: the discrete-time diffusion loss and its successive simplifications.)
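
For reference, the final simplified form as I recall it from the paper (notation may differ slightly from the missing screenshots), with $s=s(i)$, $t=t(i)$, and $z_t=\alpha_t x+\sigma_t\epsilon$:

$$\mathcal{L}_T(x)=\frac{T}{2}\,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I),\,i\sim U\{1,\dots,T\}}\Big[\big(\mathrm{SNR}(s)-\mathrm{SNR}(t)\big)\,\big\|x-\hat{x}_\theta(z_t;t)\big\|_2^2\Big]$$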

5.1 More steps lead to a lower loss

Compare T timesteps with 2T timesteps.

Keeping the SNR function fixed, if $\hat{x}_\theta$ is sufficiently good, then

$L_{2T}(x)<L_{T}(x)$,

i.e., a finer discretization gives a lower diffusion loss.

(Figure: more steps lead to a lower diffusion loss.)

6 Continuous-time model: $T\rightarrow \infty$

As $T\rightarrow\infty$, the diffusion loss becomes:

(Figures: the continuous-time diffusion loss and its simplified form.)
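
As I recall from the paper, the continuous-time diffusion loss is (the second form uses the noise parameterization and $\mathrm{SNR}(t)=\exp(-\gamma_{\eta}(t))$):

$$\mathcal{L}_\infty(x)=-\frac{1}{2}\,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\int_0^1 \mathrm{SNR}'(t)\,\big\|x-\hat{x}_\theta(z_t;t)\big\|_2^2\,dt=\frac{1}{2}\,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I),\,t\sim U(0,1)}\Big[\gamma'_{\eta}(t)\,\big\|\epsilon-\hat{\epsilon}_\theta(z_t;t)\big\|_2^2\Big]$$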

6.1 Equivalence of diffusion models in continuous time

SNR(t) is invertible thanks to the monotonicity assumption,

so we can apply a change of variables $v=\mathrm{SNR}(t)$.

The diffusion loss becomes:

(Figure: the diffusion loss after the change of variables $v=\mathrm{SNR}(t)$.)
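
As I recall from the paper, writing $\tilde{x}_\theta(z,v)\equiv\hat{x}_\theta(z,\mathrm{SNR}^{-1}(v))$ and $z_v=\alpha_v x+\sigma_v\epsilon$, the loss becomes

$$\mathcal{L}_\infty(x)=\frac{1}{2}\,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\int_{\mathrm{SNR}_{\min}}^{\mathrm{SNR}_{\max}}\big\|x-\tilde{x}_\theta(z_v,v)\big\|_2^2\,dv$$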

What this equation shows is that the only effect the functions $\alpha(t)$ and $\sigma(t)$ have on the diffusion loss is through the values of SNR(t) at the endpoints t=0 and t=1.

The VLB is thus only affected by the noise schedule through its endpoints $\mathrm{SNR}_{\min}$ and $\mathrm{SNR}_{\max}$.

Furthermore, any two diffusion models with the same SNR endpoints can be seen as equivalent in continuous time, differing only by a trivial time-dependent rescaling of the data.

7 Experiments

CIFAR-10 and ImageNet

The model establishes a new state of the art in test-set likelihood on all considered benchmarks.

(Figure: likelihood results on the benchmarks.)

7.1 Ablations

