
The geometry of diffusion guidance

Modified 3 April 2024
Guidance is a powerful method that can be used to enhance diffusion model sampling. As I’ve discussed in an earlier blog post, it’s almost like a cheat code: it can improve sample quality so much that it’s as if the model had ten times the number of parameters. That is an order of magnitude improvement, basically for free! This follow-up post provides a geometric interpretation and visualisation of the diffusion sampling procedure, which I’ve found particularly useful for explaining how guidance works.
1. A word of warning about high-dimensional spaces
Sampling algorithms for diffusion models typically start by initialising a canvas with random noise, and then repeatedly update this canvas based on model predictions, until a sample from the model distribution eventually emerges.
We will represent this canvas by a vector \(\mathbf{x}_t\), where \(t\) represents the current time step in the sampling procedure. By convention, the diffusion process which gradually corrupts inputs into random noise moves forward in time from \(t = 0\) to \(t = T\), so the sampling procedure goes backward in time, from \(t = T\) to \(t = 0\). Therefore \(t = T\) corresponds to random noise, and \(t = 0\) corresponds to a sample from the data distribution.
\(\mathbf{x}_t\) is a high-dimensional vector: for example, if a diffusion model produces images of size 64x64, there are 64 × 64 × 3 = 12,288 different scalar intensity values (3 colour channels per pixel). The sampling procedure then traces a path through a 12,288-dimensional Euclidean space.
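To make the bookkeeping concrete, here is a minimal sketch of the setup just described: the canvas as a flat vector, and the direction in which sampling walks through time. It only covers initialisation and the time-step ordering, not a real sampler. The names `init_canvas` and `sampling_time_steps`, the value `T = 10`, and the choice of standard Gaussian noise are all assumptions made for illustration.

```python
import random

# Image shape from the text: 64x64 pixels, 3 colour channels per pixel.
HEIGHT, WIDTH, CHANNELS = 64, 64, 3
DIM = HEIGHT * WIDTH * CHANNELS  # 64 * 64 * 3 = 12,288 scalar values

T = 10  # hypothetical number of time steps, chosen only for illustration


def init_canvas(dim, rng):
    """Return x_T: a flat vector of `dim` noise values.

    Assumption: standard Gaussian noise; the text only says "random noise".
    """
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def sampling_time_steps(num_steps):
    """Time steps visited by sampling, going backward from T to 0."""
    return list(range(num_steps, -1, -1))


rng = random.Random(0)
x_T = init_canvas(DIM, rng)      # the canvas at t = T (pure noise)
steps = sampling_time_steps(T)   # [T, T-1, ..., 1, 0]
print(DIM, len(x_T), steps[0], steps[-1])  # prints: 12288 12288 10 0
```

A real sampler would update the canvas at each of these time steps using model predictions; that part is deliberately left out here.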
It’s pretty difficult for the human brain to comprehend what that actually looks like in practice. Because our intuition is firmly rooted in our 3D surroundings, it tends to fail us in surprising ways in high-dimensional spaces. A while back, I wrote a blog post about some of the implications for high-dimensional probability distributions in particular. This note about why high-dimensional spheres are “spikey” is also worth a read, if you want to quickly get a feel for how weird things can get. A more thorough treatment of high-dimensional geometry can be found in chapter 2 of ‘Foundations of Data Science’ (1) by Blum, Hopcroft and Kannan, which is available to download in PDF format.
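As one small, hedged illustration of how 2D intuition can mislead, the sketch below compares standard Gaussian noise vectors in 2 dimensions and in 12,288 dimensions. It uses two well-known facts: in high dimensions the norm of such a vector concentrates tightly around \(\sqrt{d}\), and two independent draws are nearly orthogonal. The helper names (`gaussian_vector`, `summarise`, and so on) and the sample count are my own choices for this example, and this is just one of many ways to probe the effect.

```python
import math
import random


def gaussian_vector(dim, rng):
    """Draw one vector of `dim` independent standard Gaussian values."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))


def abs_cosine(u, v):
    """Absolute cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return abs(dot) / (norm(u) * norm(v))


def summarise(dim, n_samples=50, seed=0):
    """Return (min, max) of norm / sqrt(dim) and the largest |cosine|
    between consecutive, non-overlapping pairs of samples."""
    rng = random.Random(seed)
    vectors = [gaussian_vector(dim, rng) for _ in range(n_samples)]
    scaled_norms = [norm(v) / math.sqrt(dim) for v in vectors]
    cosines = [
        abs_cosine(vectors[i], vectors[i + 1])
        for i in range(0, n_samples - 1, 2)
    ]
    return min(scaled_norms), max(scaled_norms), max(cosines)


if __name__ == "__main__":
    for dim in (2, 12288):
        lo, hi, cos = summarise(dim)
        print(f"dim={dim}: norm/sqrt(dim) in [{lo:.3f}, {hi:.3f}], "
              f"max |cos| = {cos:.3f}")
```

In 2D the scaled norms spread out widely and some pairs of samples are almost parallel. In 12,288 dimensions every sample sits at almost exactly the same distance from the origin, and every pair is close to perpendicular, which is hard to picture from a 2D diagram.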
Nevertheless, in this blog post, I will use diagrams that represent \(\mathbf{x}_t\) in two dimensions, because unfortunately that’s all the spatial dimensions available on your screen. This is dangerous: following our intuition in 2D might lead us to the wrong conclusions. But I’m going to do it anyway, because in spite of this, I’ve found these diagrams quite helpful to explain how manipulations such as guidance affect diffusion sampling in practice.
Here’s some advice from Geoff Hinton on dealing with high-dimensional spaces that may or may not help:
… anyway, you’ve been warned!