<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Smalldiffusion Series - Series - mywebsite</title><link>https://steven-yl.github.io/mywebsite/series/smalldiffusion%E7%B3%BB%E5%88%97/</link><description>Smalldiffusion Series - Series - mywebsite</description><generator>Hugo -- gohugo.io</generator><language>zh-CN</language><managingEditor>steven@gmail.com (Steven)</managingEditor><webMaster>steven@gmail.com (Steven)</webMaster><copyright>This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.</copyright><lastBuildDate>Fri, 27 Mar 2026 10:00:00 +0800</lastBuildDate><atom:link href="https://steven-yl.github.io/mywebsite/series/smalldiffusion%E7%B3%BB%E5%88%97/" rel="self" type="application/rss+xml"/><item><title>smalldiffusion Technical Documentation Index</title><link>https://steven-yl.github.io/mywebsite/00_index/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0800</pubDate><author><name>Steven</name><uri>https://github.com/steven-yl</uri></author><guid>https://steven-yl.github.io/mywebsite/00_index/</guid><description><![CDATA[<blockquote>
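  <p>smalldiffusion is a lightweight diffusion model library that implements diffusion model training and sampling in fewer than 100 lines of core code.
This documentation gives a full technical walkthrough of the project, from the overall architecture down to the implementation details of each function.</p>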

</blockquote><h2 id="文档结构" class="headerLink">
    <a href="#%e6%96%87%e6%a1%a3%e7%bb%93%e6%9e%84" class="header-mark"></a>Document structure</h2><table>
  <thead>
      <tr>
          <th>File</th>
          <th>Contents</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><a href="01_overview.md" rel="">01_overview.md</a></td>
          <td>Project overview: architecture design, module relationships, mathematical foundations of diffusion models</td>
      </tr>
      <tr>
          <td><a href="02_diffusion.md" rel="">02_diffusion.md</a></td>
          <td>Core module: noise schedules, training loop, sampling algorithm (<code>diffusion.py</code>)</td>
      </tr>
      <tr>
          <td><a href="03_data.md" rel="">03_data.md</a></td>
          <td>Data module: dataset utilities, toy datasets (<code>data.py</code>)</td>
      </tr>
      <tr>
          <td><a href="04_model_base.md" rel="">04_model_base.md</a></td>
          <td>Model foundations: ModelMixin, prediction-mode modifiers, attention, embedding layers (<code>model.py</code>)</td>
      </tr>
      <tr>
          <td><a href="05_model_dit.md" rel="">05_model_dit.md</a></td>
          <td>Diffusion Transformer model (<code>model_dit.py</code>)</td>
      </tr>
      <tr>
          <td><a href="06_model_unet.md" rel="">06_model_unet.md</a></td>
          <td>U-Net model (<code>model_unet.py</code>)</td>
      </tr>
      <tr>
          <td><a href="07_examples.md" rel="">07_examples.md</a></td>
          <td>Hands-on examples: from toy models to Stable Diffusion</td>
      </tr>
  </tbody>
</table>
<h2 id="模块依赖关系" class="headerLink">
    <a href="#%e6%a8%a1%e5%9d%97%e4%be%9d%e8%b5%96%e5%85%b3%e7%b3%bb" class="header-mark"></a>Module dependencies</h2>]]></description></item><item><title>smalldiffusion Project Overview</title><link>https://steven-yl.github.io/mywebsite/01_overview/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0800</pubDate><author><name>Steven</name><uri>https://github.com/steven-yl</uri></author><guid>https://steven-yl.github.io/mywebsite/01_overview/</guid><description><![CDATA[<h2 id="11-项目定位" class="headerLink">
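    <a href="#11-%e9%a1%b9%e7%9b%ae%e5%ae%9a%e4%bd%8d" class="header-mark"></a>1.1 Project positioning</h2><p>smalldiffusion is a diffusion model library geared toward teaching and experimentation; its core training and sampling code is under 100 lines. Its design goals are:</p>
<ul>
<li>Provide a readable, understandable implementation of diffusion models</li>
<li>Support everything from 2D toy data to Stable Diffusion-scale pretrained models</li>
<li>Make it easy for researchers to quickly experiment with new sampling algorithms and model architectures</li>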
</ul>
<p>Paper reference: <a href="https://arxiv.org/abs/2306.04848" target="_blank" rel="noopener noreferrer">Permenter and Yuan, arXiv:2306.04848</a></p>]]></description></item><item><title>smalldiffusion Core Module: diffusion.py</title><link>https://steven-yl.github.io/mywebsite/02_diffusion/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0800</pubDate><author><name>Steven</name><uri>https://github.com/steven-yl</uri></author><guid>https://steven-yl.github.io/mywebsite/02_diffusion/</guid><description><![CDATA[<blockquote>
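  <p>This file is the core of smalldiffusion. It contains the noise schedules (Schedule), the training loop (training_loop), and the sampling algorithm (samples), in fewer than 100 lines of code in total.</p>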

</blockquote><h2 id="21-模块结构" class="headerLink">
    <a href="#21-%e6%a8%a1%e5%9d%97%e7%bb%93%e6%9e%84" class="header-mark"></a>2.1 Module structure</h2>]]></description></item><item><title>smalldiffusion Data Module: data.py</title><link>https://steven-yl.github.io/mywebsite/03_data/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0800</pubDate><author><name>Steven</name><uri>https://github.com/steven-yl</uri></author><guid>https://steven-yl.github.io/mywebsite/03_data/</guid><description><![CDATA[<blockquote>
  <p>This file provides dataset utility functions and three 2D toy datasets for quickly checking that a diffusion model works correctly.</p>

</blockquote><h2 id="31-模块结构" class="headerLink">
    <a href="#31-%e6%a8%a1%e5%9d%97%e7%bb%93%e6%9e%84" class="header-mark"></a>3.1 Module structure</h2>]]></description></item><item><title>smalldiffusion Model Foundations: model.py</title><link>https://steven-yl.github.io/mywebsite/04_model_base/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0800</pubDate><author><name>Steven</name><uri>https://github.com/steven-yl</uri></author><guid>https://steven-yl.github.io/mywebsite/04_model_base/</guid><description><![CDATA[<blockquote>
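  <p>This file defines the base class shared by all models, the prediction-mode modifiers, common components (attention, embeddings), toy models, and the ideal denoiser.</p>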

</blockquote><h2 id="41-模块结构" class="headerLink">
    <a href="#41-%e6%a8%a1%e5%9d%97%e7%bb%93%e6%9e%84" class="header-mark"></a>4.1 Module structure</h2>]]></description></item><item><title>smalldiffusion Model: model_dit.py</title><link>https://steven-yl.github.io/mywebsite/05_model_dit/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0800</pubDate><author><name>Steven</name><uri>https://github.com/steven-yl</uri></author><guid>https://steven-yl.github.io/mywebsite/05_model_dit/</guid><description><![CDATA[<blockquote>
  <p>This file implements the <a href="https://arxiv.org/abs/2212.09748" target="_blank" rel="noopener noreferrer">DiT (Peebles &amp; Xie, 2022)</a> architecture, a Transformer-based diffusion model.</p>

</blockquote><h2 id="51-模块结构" class="headerLink">
    <a href="#51-%e6%a8%a1%e5%9d%97%e7%bb%93%e6%9e%84" class="header-mark"></a>5.1 Module structure</h2>]]></description></item><item><title>smalldiffusion Model: model_unet.py</title><link>https://steven-yl.github.io/mywebsite/06_model_unet/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0800</pubDate><author><name>Steven</name><uri>https://github.com/steven-yl</uri></author><guid>https://steven-yl.github.io/mywebsite/06_model_unet/</guid><description><![CDATA[<blockquote>
  <p>This file implements the classic U-Net diffusion model architecture, adapted from the <a href="https://github.com/luping-liu/PNDM" target="_blank" rel="noopener noreferrer">PNDM</a> and <a href="https://github.com/ermongroup/ddim" target="_blank" rel="noopener noreferrer">DDIM</a> implementations.</p>

</blockquote><h2 id="61-模块结构" class="headerLink">
    <a href="#61-%e6%a8%a1%e5%9d%97%e7%bb%93%e6%9e%84" class="header-mark"></a>6.1 Module structure</h2>]]></description></item><item><title>smalldiffusion Hands-on Examples</title><link>https://steven-yl.github.io/mywebsite/07_examples/</link><pubDate>Fri, 27 Mar 2026 10:00:00 +0800</pubDate><author><name>Steven</name><uri>https://github.com/steven-yl</uri></author><guid>https://steven-yl.github.io/mywebsite/07_examples/</guid><description><![CDATA[<blockquote>
  <p>This chapter walks through all the examples shipped with the project, from 2D toy models to Stable Diffusion-scale pretrained models.</p>

</blockquote><h2 id="71-示例总览" class="headerLink">
    <a href="#71-%e7%a4%ba%e4%be%8b%e6%80%bb%e8%a7%88" class="header-mark"></a>7.1 Examples overview</h2><table>
  <thead>
      <tr>
          <th>Example</th>
          <th>Data</th>
          <th>Model</th>
          <th>Schedule</th>
          <th>Conditioning</th>
          <th>How to run</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>toyexample.ipynb</td>
          <td>Swissroll 2D</td>
          <td>TimeInputMLP</td>
          <td>LogLinear</td>
          <td>None</td>
          <td>Jupyter</td>
      </tr>
      <tr>
          <td>cond_tree_model.ipynb</td>
          <td>TreeDataset 2D</td>
          <td>ConditionalMLP</td>
          <td>LogLinear</td>
          <td>Class label + CFG</td>
          <td>Jupyter</td>
      </tr>
      <tr>
          <td>fashion_mnist_dit.py</td>
          <td>FashionMNIST 28×28</td>
          <td>DiT</td>
          <td>DDPM</td>
          <td>None</td>
          <td>accelerate launch</td>
      </tr>
      <tr>
          <td>fashion_mnist_dit_cond.py</td>
          <td>FashionMNIST 28×28</td>
          <td>DiT + CondEmbedder</td>
          <td>DDPM</td>
          <td>Class label + CFG</td>
          <td>accelerate launch</td>
      </tr>
      <tr>
          <td>fashion_mnist_unet.py</td>
          <td>FashionMNIST 28×28</td>
          <td>Scaled(Unet)</td>
          <td>LogLinear</td>
          <td>None</td>
          <td>accelerate launch</td>
      </tr>
      <tr>
          <td>cifar_unet.py</td>
          <td>CIFAR-10 32×32</td>
          <td>Scaled(Unet)</td>
          <td>Sigmoid (training) / LogLinear (sampling)</td>
          <td>None</td>
          <td>accelerate launch</td>
      </tr>
      <tr>
          <td>diffusers_wrapper.py</td>
          <td>-</td>
          <td>ModelLatentDiffusion</td>
          <td>LDM</td>
          <td>Text</td>
          <td>Python module</td>
      </tr>
      <tr>
          <td>stablediffusion.py</td>
          <td>-</td>
          <td>ModelLatentDiffusion</td>
          <td>LDM</td>
          <td>Text</td>
          <td>python</td>
      </tr>
  </tbody>
</table>
<hr>
<h2 id="72-玩具模型示例-toyexampleipynb" class="headerLink">
    <a href="#72-%e7%8e%a9%e5%85%b7%e6%a8%a1%e5%9e%8b%e7%a4%ba%e4%be%8b-toyexampleipynb" class="header-mark"></a>7.2 Toy model example (toyexample.ipynb)</h2><h3 id="最小可运行代码" class="headerLink">
    <a href="#%e6%9c%80%e5%b0%8f%e5%8f%af%e8%bf%90%e8%a1%8c%e4%bb%a3%e7%a0%81" class="header-mark"></a>Minimal runnable code</h3>]]></description></item></channel></rss>