
A Detailed Explanation of DDPM

The forward process gradually adds noise to data $x^{(0)} \sim p_\text{data}$, producing $x^{(1)}, \ldots, x^{(T)}$, so that $x^{(T)}$ is approximately a standard Gaussian. Below we give the definitions, a closed-form derivation of the single-step transition and the multi-step marginal $q(x^{(t)} \mid x^{(0)})$, and the reparameterized form.


  • The forward process is a Markov chain: $x^{(0)} \rightarrow x^{(1)} \rightarrow \cdots \rightarrow x^{(T)}$.
  • Fix a variance schedule $\beta_1, \ldots, \beta_T \in (0,1)$ and define $\alpha_t = 1 - \beta_t$, $\bar\alpha_t = \prod_{s=1}^{t} \alpha_s$ (with the convention $\bar\alpha_0 = 1$).
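As a concrete illustration, the schedule constants can be computed as below. This is a minimal sketch: the linear schedule and its endpoint values ($\beta_1 = 10^{-4}$, $\beta_T = 0.02$, $T = 1000$) are common choices from the DDPM paper, not requirements of the derivation.

```python
import numpy as np

# Forward-process constants under an assumed linear beta schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # beta_1, ..., beta_T
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # bar(alpha)_t = prod_{s<=t} alpha_s

# alpha_bars decreases monotonically and is nearly 0 at t = T,
# so x^{(T)} is close to a standard Gaussian.
```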

The single-step forward transition is a Gaussian that shrinks the mean and has fixed variance:

$$ q(x^{(t)} \mid x^{(t-1)}) = \mathcal{N}\big(x^{(t)};\ \sqrt{1-\beta_t}\, x^{(t-1)},\ \beta_t \mathbf{I}\big) = \mathcal{N}\big(x^{(t)};\ \sqrt{\alpha_t}\, x^{(t-1)},\ \beta_t \mathbf{I}\big). $$

Equivalently, it can be written in reparameterized form (convenient for sampling and derivation):

$$ x^{(t)} = \sqrt{\alpha_t}\, x^{(t-1)} + \sqrt{\beta_t}\, \varepsilon_{t}, \qquad \varepsilon_t \sim \mathcal{N}(0, \mathbf{I})\ \text{i.i.d.} $$

We now integrate out the intermediate steps to obtain the distribution $q(x^{(t)} \mid x^{(0)})$ that takes $x^{(0)}$ to $x^{(t)}$ in one step, and show that it is still a single Gaussian with a closed form.

Repeatedly substituting the single-step form:

$$ \begin{aligned} x^{(1)} &= \sqrt{\alpha_1}\, x^{(0)} + \sqrt{\beta_1}\, \varepsilon_1, \\ x^{(2)} &= \sqrt{\alpha_2}\, x^{(1)} + \sqrt{\beta_2}\, \varepsilon_2 = \sqrt{\alpha_2\alpha_1}\, x^{(0)} + \sqrt{\alpha_2\beta_1}\, \varepsilon_1 + \sqrt{\beta_2}\, \varepsilon_2, \\ &\ \,\vdots \end{aligned} $$
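The two-step expansion can be checked numerically: composing two single noising steps with fixed noise draws equals the expanded linear combination. The $\beta$ values here are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
b1, b2 = 0.1, 0.2                 # illustrative beta_1, beta_2
a1, a2 = 1.0 - b1, 1.0 - b2       # alpha_t = 1 - beta_t
x0 = rng.standard_normal(4)
e1, e2 = rng.standard_normal(4), rng.standard_normal(4)

# Sequential: apply the single-step update twice.
x1 = np.sqrt(a1) * x0 + np.sqrt(b1) * e1
x2_seq = np.sqrt(a2) * x1 + np.sqrt(b2) * e2

# Expanded: the linear combination derived above.
x2_expanded = np.sqrt(a2 * a1) * x0 + np.sqrt(a2 * b1) * e1 + np.sqrt(b2) * e2
```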

In general, $x^{(t)}$ is a linear combination of $x^{(0)}$ and $\varepsilon_1,\ldots,\varepsilon_t$. Since the $\varepsilon_s$ are mutually independent and independent of $x^{(0)}$, the combination is still Gaussian, so we only need its mean and variance. The product $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$ arises naturally in the derivation.

Let $c_t$ denote the coefficient of $x^{(0)}$ in $x^{(t)}$. By the recursion:

  • $x^{(1)} = \sqrt{\alpha_1}\, x^{(0)} + \cdots$, so $c_1 = \sqrt{\alpha_1}$;
  • $x^{(t)} = \sqrt{\alpha_t}\, x^{(t-1)} + \sqrt{\beta_t}\, \varepsilon_t$, so if the coefficient of $x^{(0)}$ in $x^{(t-1)}$ is $c_{t-1}$, then the coefficient of $x^{(0)}$ in $x^{(t)}$ is $c_t = \sqrt{\alpha_t}\, c_{t-1}$.

Therefore

$$ c_t = \sqrt{\alpha_t}\, c_{t-1} = \sqrt{\alpha_t\,\alpha_{t-1}}\, c_{t-2} = \cdots = \sqrt{\alpha_t \cdots \alpha_1} = \sqrt{\prod_{s=1}^{t}\alpha_s} = \sqrt{\bar\alpha_t}. $$

Since $\mathbb{E}[\varepsilon_s]=0$,

$$ \mathbb{E}[x^{(t)} \mid x^{(0)}] = \sqrt{\bar\alpha_t}\, x^{(0)}. $$

Let $v_t = \mathrm{Var}(x^{(t)} \mid x^{(0)})$ (a scalar variance; the dimensions are independent and identical). From $x^{(t)} = \sqrt{\alpha_t}\, x^{(t-1)} + \sqrt{\beta_t}\, \varepsilon_t$, with $x^{(t-1)}$ and $\varepsilon_t$ independent given $x^{(0)}$,

$$ v_t = \alpha_t\, v_{t-1} + \beta_t. $$

Substituting $\beta_t = 1 - \alpha_t$ gives

$$ v_t = \alpha_t\, v_{t-1} + (1 - \alpha_t). $$

Initial value: $x^{(0)}$ is given and carries no randomness, so $v_0 = 0$. One checks $v_1 = \beta_1 = 1 - \alpha_1 = 1 - \bar\alpha_1$.

Induction step: assume $v_{t-1} = 1 - \bar\alpha_{t-1}$; then

$$ v_t = \alpha_t (1 - \bar\alpha_{t-1}) + (1 - \alpha_t) = \alpha_t - \alpha_t\bar\alpha_{t-1} + 1 - \alpha_t = 1 - \alpha_t\bar\alpha_{t-1} = 1 - \bar\alpha_t. $$

Therefore

$$ \mathrm{Var}(x^{(t)} \mid x^{(0)}) = (1 - \bar\alpha_t)\, \mathbf{I}. $$
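The induction above admits a quick numerical check, assuming an arbitrary illustrative $\beta$ schedule: iterating $v_t = \alpha_t v_{t-1} + \beta_t$ from $v_0 = 0$ reproduces the closed form $1 - \bar\alpha_t$ at every step.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 100)   # illustrative schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Iterate the variance recursion v_t = alpha_t * v_{t-1} + beta_t.
v = 0.0                                # v_0 = 0: x^{(0)} is given
vs = []
for a in alphas:
    v = a * v + (1.0 - a)              # beta_t = 1 - alpha_t
    vs.append(v)
```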

Hence:

$$ \boxed{ q(x^{(t)} \mid x^{(0)}) = \mathcal{N}\big(x^{(t)};\ \sqrt{\bar\alpha_t}\, x^{(0)},\ (1-\bar\alpha_t)\,\mathbf{I}\big). } $$

That is: given $x^{(0)}$, $x^{(t)}$ is a single Gaussian with mean $\sqrt{\bar\alpha_t}\, x^{(0)}$ and variance $(1-\bar\alpha_t)\mathbf{I}$, independent of the intermediate steps; it has a closed form, can be sampled from, and its density can be evaluated.


x(t)x^{(t)} 写成仅依赖 x(0)x^{(0)} 与一个标准高斯噪声 ϵ\epsilon 的形式,便于实现采样与后续对 ϵ\epsilon 的回归:

$$ x^{(t)} = \sqrt{\bar\alpha_t}\, x^{(0)} + \sqrt{1-\bar\alpha_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \mathbf{I}). $$

Equivalence: the right-hand side has mean $\sqrt{\bar\alpha_t}\, x^{(0)}$ and variance $(1-\bar\alpha_t)\mathbf{I}$, matching $q(x^{(t)} \mid x^{(0)})$; step-by-step noising and one-shot noising are thus equivalent in distribution (given $x^{(0)}$). During training one can therefore sample $(x^{(0)}, t)$ at random, generate $x^{(t)}$ by the formula above, and have the network predict the corresponding $\epsilon$ (i.e. $\epsilon_\theta(x^{(t)}, t)$).
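A sketch of one-shot noising, plus a deterministic consistency check: with all noise draws set to zero, $t$ sequential steps scale $x^{(0)}$ by exactly $\sqrt{\bar\alpha_t}$, the mean coefficient of the closed-form marginal. The schedule values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
betas = np.linspace(1e-4, 0.02, T)   # illustrative schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """One-shot sample of x^{(t)}: sqrt(ab_t) x0 + sqrt(1 - ab_t) eps (t is 1-indexed)."""
    ab = alpha_bars[t - 1]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

# Sequential noising with epsilon_t = 0 isolates the x^{(0)} coefficient.
x0 = rng.standard_normal(4)
x = x0.copy()
for t in range(1, T + 1):
    x = np.sqrt(alphas[t - 1]) * x
```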


Summary of the forward-process formulas:

  • Single-step transition: $q(x^{(t)} \mid x^{(t-1)}) = \mathcal{N}(\sqrt{\alpha_t}\, x^{(t-1)},\ \beta_t \mathbf{I})$
  • Multi-step marginal: $q(x^{(t)} \mid x^{(0)}) = \mathcal{N}(\sqrt{\bar\alpha_t}\, x^{(0)},\ (1-\bar\alpha_t)\mathbf{I})$
  • Reparameterization: $x^{(t)} = \sqrt{\bar\alpha_t}\, x^{(0)} + \sqrt{1-\bar\alpha_t}\, \epsilon,\ \epsilon\sim\mathcal{N}(0,\mathbf{I})$
  • $\bar\alpha_t$ decreases as $t$ grows, so $\sqrt{\bar\alpha_t}$ shrinks and $\sqrt{1-\bar\alpha_t}$ grows: the noise fraction in $x^{(t)}$ increases. At $t=T$, with $\bar\alpha_T \approx 0$, $x^{(T)}$ is approximately $\mathcal{N}(0,\mathbf{I})$.
  • The forward process has no learnable parameters; only the reverse process uses a neural network, which fits $p_\theta(x^{(t-1)} \mid x^{(t)})$ as an approximation to $q(x^{(t-1)} \mid x^{(t)}, x^{(0)})$.

The reverse process starts from $x^{(T)} \sim \mathcal{N}(0, \mathbf{I})$ and samples $x^{(T-1)}, \ldots, x^{(0)}$ step by step to produce a generated sample. The goal is to fit the reverse transition $p_\theta(x^{(t-1)} \mid x^{(t)})$ with a neural network. Since the true reverse $q(x^{(t-1)} \mid x^{(t)})$ has no analytic form without conditioning on $x^{(0)}$, we use the analytically tractable posterior $q(x^{(t-1)} \mid x^{(t)}, x^{(0)})$ for the derivation and training, then parameterize the mean via $\epsilon_\theta$ to obtain the final reverse sampling formula.

Notation matches the forward process: $\alpha_t = 1 - \beta_t$, $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$.


Given $x^{(t)}$ and $x^{(0)}$, Bayes' rule gives (writing forward transitions as $q$, and using the Markov property $q(x^{(t)} \mid x^{(t-1)}, x^{(0)}) = q(x^{(t)} \mid x^{(t-1)})$):

$$ q(x^{(t-1)} \mid x^{(t)}, x^{(0)}) = \frac{q(x^{(t)} \mid x^{(t-1)})\, q(x^{(t-1)} \mid x^{(0)})}{q(x^{(t)} \mid x^{(0)})}. $$

All three factors are forward-process Gaussians with closed forms:

  • $q(x^{(t)} \mid x^{(t-1)}) = \mathcal{N}(x^{(t)}; \sqrt{\alpha_t}\, x^{(t-1)}, \beta_t \mathbf{I})$
  • $q(x^{(t-1)} \mid x^{(0)}) = \mathcal{N}(x^{(t-1)}; \sqrt{\bar\alpha_{t-1}}\, x^{(0)}, (1-\bar\alpha_{t-1})\mathbf{I})$
  • $q(x^{(t)} \mid x^{(0)}) = \mathcal{N}(x^{(t)}; \sqrt{\bar\alpha_t}\, x^{(0)}, (1-\bar\alpha_t)\mathbf{I})$

The right-hand side is therefore computable, and the posterior is again Gaussian (conditioning Gaussians yields a Gaussian). We now derive its mean $\tilde\mu_t$ and variance $\tilde\beta_t$.


$$ q(x^{(t-1)} \mid x^{(t)}, x^{(0)}) = \mathcal{N}(x^{(t-1)}; \tilde\mu_t(x^{(t)}, x^{(0)}), \tilde\beta_t \mathbf{I}). $$

Taking the log of the Gaussian densities and keeping only the terms involving $x^{(t-1)}$ (absorbing the rest into a constant):

$$ \log q(x^{(t-1)} \mid x^{(t)}, x^{(0)}) = -\frac{1}{2\beta_t}\big\| x^{(t)} - \sqrt{\alpha_t}\, x^{(t-1)} \big\|^2 - \frac{1}{2(1-\bar\alpha_{t-1})}\big\| x^{(t-1)} - \sqrt{\bar\alpha_{t-1}}\, x^{(0)} \big\|^2 + \text{const}. $$

This is a quadratic form in $x^{(t-1)}$, so the posterior is Gaussian. Expanding and collecting the quadratic and linear terms in $x^{(t-1)}$ yields $\tilde\beta_t$ and $\tilde\mu_t$.

The coefficient of the quadratic term $\|x^{(t-1)}\|^2$ is

$$ \frac{\alpha_t}{2\beta_t} + \frac{1}{2(1-\bar\alpha_{t-1})} = \frac{\alpha_t(1-\bar\alpha_{t-1}) + \beta_t}{2\beta_t(1-\bar\alpha_{t-1})}. $$

The posterior variance satisfies $1/\tilde\beta_t = \alpha_t/\beta_t + 1/(1-\bar\alpha_{t-1})$, so

$$ \tilde\beta_t = \frac{\beta_t(1-\bar\alpha_{t-1})}{\alpha_t(1-\bar\alpha_{t-1}) + \beta_t}. $$

Using $\alpha_t = 1 - \beta_t$, the denominator becomes

$$ \alpha_t(1-\bar\alpha_{t-1}) + \beta_t = (1-\beta_t)(1-\bar\alpha_{t-1}) + \beta_t = (1-\bar\alpha_{t-1}) - \beta_t(1-\bar\alpha_{t-1}) + \beta_t = 1 - \bar\alpha_t. $$

Therefore

$$ \boxed{\tilde\beta_t = \frac{\beta_t(1-\bar\alpha_{t-1})}{1 - \bar\alpha_t}.} $$
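The denominator simplification can be verified numerically: $\alpha_t(1-\bar\alpha_{t-1}) + \beta_t = 1 - \bar\alpha_t$ holds at every step, so the two expressions for $\tilde\beta_t$ agree. The schedule values are illustrative.

```python
import numpy as np

betas = np.linspace(1e-4, 0.02, 100)   # illustrative schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

# Use t = 2..T so that bar(alpha)_{t-1} comes from the array.
a_t, b_t = alphas[1:], betas[1:]
ab_tm1, ab_t = alpha_bars[:-1], alpha_bars[1:]

denom = a_t * (1.0 - ab_tm1) + b_t               # claimed to equal 1 - ab_t
beta_tilde = b_t * (1.0 - ab_tm1) / (1.0 - ab_t) # simplified posterior variance
```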

Completing the square in the quadratic form (or writing the Gaussian conditional mean directly) gives

$$ \tilde\mu_t(x^{(t)}, x^{(0)}) = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\, x^{(0)} + \frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\, x^{(t)}. $$

$$ \boxed{\tilde\mu_t = \frac{1}{1-\bar\alpha_t}\Big( \sqrt{\bar\alpha_{t-1}}\,\beta_t\, x^{(0)} + \sqrt{\alpha_t}(1-\bar\alpha_{t-1})\, x^{(t)} \Big).} $$

The forward reparameterization gives $x^{(t)} = \sqrt{\bar\alpha_t}\, x^{(0)} + \sqrt{1-\bar\alpha_t}\,\epsilon$, hence

$$ x^{(0)} = \frac{x^{(t)} - \sqrt{1-\bar\alpha_t}\,\epsilon}{\sqrt{\bar\alpha_t}}. $$

Substituting this into the expression for $\tilde\mu_t$ to replace $x^{(0)}$ with $x^{(t)}$ and $\epsilon$ simplifies it to a form involving only $x^{(t)}$ and $\epsilon$ (derivation below):

$$ \tilde\mu_t = \frac{1}{\sqrt{\alpha_t}}\left( x^{(t)} - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon \right). $$

Simplification: substitute $x^{(0)} = (x^{(t)} - \sqrt{1-\bar\alpha_t}\,\epsilon)/\sqrt{\bar\alpha_t}$ into

$$ \tilde\mu_t = \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\, x^{(0)} + \frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\, x^{(t)}. $$

The first term becomes

$$ \frac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{(1-\bar\alpha_t)\sqrt{\bar\alpha_t}}\big( x^{(t)} - \sqrt{1-\bar\alpha_t}\,\epsilon \big). $$

Using $\bar\alpha_t = \alpha_t \bar\alpha_{t-1}$, we have $\sqrt{\bar\alpha_{t-1}}/\sqrt{\bar\alpha_t} = 1/\sqrt{\alpha_t}$, so the first term equals

$$ \frac{\beta_t}{\sqrt{\alpha_t}(1-\bar\alpha_t)}\, x^{(t)} - \frac{\beta_t}{\sqrt{\alpha_t}\sqrt{1-\bar\alpha_t}}\,\epsilon. $$

The second term is $\sqrt{\alpha_t}(1-\bar\alpha_{t-1})/(1-\bar\alpha_t)\, x^{(t)}$. Adding the two, the coefficient of $x^{(t)}$ is

$$ \frac{\beta_t + \alpha_t(1-\bar\alpha_{t-1})}{\sqrt{\alpha_t}(1-\bar\alpha_t)} = \frac{1-\bar\alpha_t}{\sqrt{\alpha_t}(1-\bar\alpha_t)} = \frac{1}{\sqrt{\alpha_t}}, $$

Therefore

$$ \tilde\mu_t = \frac{1}{\sqrt{\alpha_t}}\, x^{(t)} - \frac{\beta_t}{\sqrt{\alpha_t}\sqrt{1-\bar\alpha_t}}\,\epsilon = \frac{1}{\sqrt{\alpha_t}}\left( x^{(t)} - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon \right). $$
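This simplification can be sanity-checked numerically: for random $x^{(0)}$ and $\epsilon$ (with $x^{(t)}$ built from them), the posterior mean written in terms of $(x^{(0)}, x^{(t)})$ equals the form written in terms of $(x^{(t)}, \epsilon)$. The schedule and step index are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
betas = np.linspace(1e-4, 0.02, 100)   # illustrative schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

t = 50                                  # illustrative 1-indexed step
a_t, b_t = alphas[t - 1], betas[t - 1]
ab_t, ab_tm1 = alpha_bars[t - 1], alpha_bars[t - 2]

x0 = rng.standard_normal(4)
eps = rng.standard_normal(4)
x_t = np.sqrt(ab_t) * x0 + np.sqrt(1.0 - ab_t) * eps  # forward reparameterization

# Posterior mean in terms of (x0, x_t) vs in terms of (x_t, eps).
mu_from_x0 = (np.sqrt(ab_tm1) * b_t * x0 + np.sqrt(a_t) * (1.0 - ab_tm1) * x_t) / (1.0 - ab_t)
mu_from_eps = (x_t - b_t / np.sqrt(1.0 - ab_t) * eps) / np.sqrt(a_t)
```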

Parameterization: at sampling time neither $\epsilon$ nor $x^{(0)}$ is available, so a neural network $\epsilon_\theta(x^{(t)}, t)$ predicts the noise, giving the usable mean

$$ \mu_\theta(x^{(t)}, t) = \frac{1}{\sqrt{\alpha_t}}\left( x^{(t)} - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x^{(t)}, t) \right). $$

The model's reverse transition (in DDPM the variance is fixed to $\tilde\beta_t$, not learned):

$$ p_\theta(x^{(t-1)} \mid x^{(t)}) = \mathcal{N}\big(x^{(t-1)};\ \mu_\theta(x^{(t)}, t),\ \tilde\beta_t \mathbf{I}\big), $$

where

$$ \mu_\theta(x^{(t)}, t) = \frac{1}{\sqrt{\alpha_t}}\left( x^{(t)} - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x^{(t)}, t) \right), \qquad \tilde\beta_t = \frac{\beta_t(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}. $$

Sampling: starting from $x^{(T)} \sim \mathcal{N}(0, \mathbf{I})$, for $t = T, T-1, \ldots, 1$ sample

$$ x^{(t-1)} = \mu_\theta(x^{(t)}, t) + \sqrt{\tilde\beta_t}\,\zeta, \qquad \zeta \sim \mathcal{N}(0, \mathbf{I}). $$

Training objective: given $x^{(0)}$ and $t$, sample $x^{(t)} = \sqrt{\bar\alpha_t}\, x^{(0)} + \sqrt{1-\bar\alpha_t}\,\epsilon$ via the forward process, and train the network $\epsilon_\theta(x^{(t)}, t)$ to predict $\epsilon$, minimizing e.g. $\|\epsilon - \epsilon_\theta(x^{(t)}, t)\|^2$ (or a weighted MSE); this is equivalent to fitting the mean of $q(x^{(t-1)} \mid x^{(t)}, x^{(0)})$.
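The training objective and the sampling loop can be sketched end to end with the formulas above. This is a minimal sketch: `eps_theta` is a hypothetical placeholder for the noise-prediction network (here a dummy that returns zeros), and the schedule values are illustrative assumptions; a real implementation would train a neural network on the same loss.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
betas = np.linspace(1e-4, 0.02, T)   # illustrative schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_theta(x_t, t):
    # Placeholder for the noise-prediction network epsilon_theta(x_t, t).
    return np.zeros_like(x_t)

def training_loss(x0):
    """One training step's loss: sample t and epsilon, noise x0, regress epsilon."""
    t = rng.integers(1, T + 1)                       # uniform t in {1, ..., T}
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t - 1]
    x_t = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps # forward reparameterization
    return np.mean((eps - eps_theta(x_t, t)) ** 2)   # unweighted MSE on the noise

def sample(shape):
    """Ancestral sampling: x_T ~ N(0, I), then step t = T, ..., 1."""
    x = rng.standard_normal(shape)
    for t in range(T, 0, -1):
        a_t, b_t, ab_t = alphas[t - 1], betas[t - 1], alpha_bars[t - 1]
        mu = (x - b_t / np.sqrt(1.0 - ab_t) * eps_theta(x, t)) / np.sqrt(a_t)
        if t > 1:
            ab_tm1 = alpha_bars[t - 2]
            beta_tilde = b_t * (1.0 - ab_tm1) / (1.0 - ab_t)
            x = mu + np.sqrt(beta_tilde) * rng.standard_normal(shape)
        else:
            x = mu                                   # no noise at the final step
    return x
```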


Summary of the reverse-process formulas:

  • Posterior variance: $\tilde\beta_t = \dfrac{\beta_t(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}$
  • Posterior mean (in terms of $x^{(0)}$): $\tilde\mu_t = \dfrac{\sqrt{\bar\alpha_{t-1}}\,\beta_t}{1-\bar\alpha_t}\, x^{(0)} + \dfrac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}\, x^{(t)}$
  • Posterior mean (in terms of $\epsilon$): $\tilde\mu_t = \dfrac{1}{\sqrt{\alpha_t}}\left( x^{(t)} - \dfrac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon \right)$
  • Model mean: $\mu_\theta(x^{(t)}, t) = \dfrac{1}{\sqrt{\alpha_t}}\left( x^{(t)} - \dfrac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x^{(t)}, t) \right)$
  • Reverse sampling: $x^{(t-1)} = \mu_\theta(x^{(t)}, t) + \sqrt{\tilde\beta_t}\,\zeta,\ \zeta\sim\mathcal{N}(0,\mathbf{I})$

Derivation chain: Bayes' rule → closed-form Gaussian posterior $\tilde\mu_t,\, \tilde\beta_t$ → express $\tilde\mu_t$ in terms of $x^{(t)},\epsilon$ → replace $\epsilon$ with $\epsilon_\theta$ → obtain $p_\theta(x^{(t-1)}\mid x^{(t)})$ and the sampling formula.


$\tilde\mu_t$ (and $\mu_\theta$) is only the mean of the reverse conditional distribution, not the final $x^{(t-1)}$ itself. Actual sampling draws a sample from the Gaussian, i.e. mean + standard deviation × standard normal:

$$ x^{(t-1)} = \mu_\theta(x^{(t)}, t) + \sqrt{\tilde\beta_t}\,\zeta, \qquad \zeta \sim \mathcal{N}(0, \mathbf{I}). $$

The $+\sqrt{\tilde\beta_t}\,\zeta$ term is the "noise added afterward". So: the formula $\tilde\mu_t = \frac{1}{\sqrt{\alpha_t}}\big( x^{(t)} - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon \big)$ gives the mean of the distribution; the actual update first computes the mean $\mu_\theta$, then adds $\sqrt{\tilde\beta_t}\,\zeta$ to obtain $x^{(t-1)}$. The sampling formula above and the summary both already include the $\sqrt{\tilde\beta_t}\,\zeta$ term; the mean formula and the sampling formula go together: the former defines the mean, the latter adds noise on top of it to complete one sampling step.
