2. Diffusion-LM: Noise-to-Meaning with Classifier Guidance
Static-to-speech diffusion. Add Gaussian noise to token embeddings, then train a model to predict that noise so it can be removed step by step. Because the embeddings are continuous, you can steer generation with gradients from a classifier (sentiment, style, etc.), like turning a radio dial until the song is crisp. (DDIM-8, classifier guidance)
It’s like painting with watercolors. You start with a wet, blurry sketch, and over 8 quick steps it dries and sharpens. While it dries, you can tweak the colors, making it brighter or adding more blue, and the painting adapts as it sets.
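To make the mechanics concrete, here is a minimal sketch of the loop described above, for a single token position: noise an embedding, denoise it in 8 deterministic DDIM-style steps, and nudge each step with a classifier gradient. Everything in it is an illustrative assumption rather than the Diffusion-LM implementation: the function names, the three-word embedding table, the linear noise schedule, and the linear "classifier" are all made up, and the trained denoising network is replaced by a closed-form stand-in (the exact posterior mean when the clean embedding is uniform over the table). The guidance rule used here shifts the predicted noise by the classifier gradient, which is one common way to do classifier guidance; the published method may differ in details.

```python
import math
import random

def make_schedule(num_steps=8, abar_max=0.999, abar_min=0.01):
    """Signal level abar_t per step: t=0 is almost clean, the last t is almost pure noise."""
    step = (abar_min - abar_max) / (num_steps - 1)
    return [abar_max + i * step for i in range(num_steps)]

def add_noise(x0, abar_t, rng):
    """Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return [math.sqrt(abar_t) * a + math.sqrt(1 - abar_t) * rng.gauss(0.0, 1.0)
            for a in x0]

def predict_noise(x_t, abar_t, emb):
    """Stand-in for the trained network (assumption): posterior-mean denoiser
    for a clean embedding drawn uniformly from the rows of emb."""
    s = math.sqrt(abar_t)
    logits = [-sum((x - s * e) ** 2 for x, e in zip(x_t, row)) / (2 * (1 - abar_t))
              for row in emb]
    top = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    x0_hat = [sum(wt * row[k] for wt, row in zip(weights, emb)) / total
              for k in range(len(x_t))]
    return [(x - s * h) / math.sqrt(1 - abar_t) for x, h in zip(x_t, x0_hat)]

def classifier_grad(x, w):
    """Gradient of log sigmoid(w . x) with respect to x, i.e. sigmoid(-w . x) * w."""
    z = sum(a * b for a, b in zip(x, w))
    coeff = 0.5 * (1.0 - math.tanh(z / 2.0))  # stable form of sigmoid(-z)
    return [coeff * b for b in w]

def ddim_step(x_t, t, abars, emb, w, scale):
    """One deterministic DDIM step with the noise shifted by the classifier gradient."""
    abar_t = abars[t]
    abar_prev = abars[t - 1] if t > 0 else 1.0
    eps = predict_noise(x_t, abar_t, emb)
    grad = classifier_grad(x_t, w)
    eps = [e - scale * math.sqrt(1 - abar_t) * g for e, g in zip(eps, grad)]
    x0_hat = [(x - math.sqrt(1 - abar_t) * e) / math.sqrt(abar_t)
              for x, e in zip(x_t, eps)]
    return [math.sqrt(abar_prev) * h + math.sqrt(1 - abar_prev) * e
            for h, e in zip(x0_hat, eps)]

def sample(emb, w, scale, rng, num_steps=8):
    """Start from pure Gaussian noise and denoise in num_steps steps."""
    abars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in emb[0]]
    for t in range(num_steps - 1, -1, -1):
        x = ddim_step(x, t, abars, emb, w, scale)
    return x

def nearest_token(x, emb):
    """Round a continuous vector back to the closest row of the embedding table."""
    return min(range(len(emb)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, emb[i])))

if __name__ == "__main__":
    rng = random.Random(0)
    emb = [[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical 3-word vocabulary
    w = [1.0, 0.0]                               # hypothetical "style" direction
    for scale in (0.0, 1.0):
        x = sample(emb, w, scale, rng)
        print("scale", scale, "-> token", nearest_token(x, emb))
```

Setting `scale` to 0 turns steering off, so the sampler simply settles toward one of the embeddings. Raising it pushes every step further along `w`, which is the "dial" in the radio analogy. A real classifier would be trained on noisy embeddings at each noise level; the fixed linear one here is only a placeholder.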