Diffusion LLM Part 2: Discrete Diffusion -- How to Add Noise to Text
D3PM, Transition Matrices, Absorbing States, MDLM -- how to bring diffusion from continuous space to discrete tokens.

In Part 1, we explored the principles of Diffusion operating in continuous space. Adding Gaussian noise to image pixels is natural, but text tokens are discrete. What would it even mean to add 0.3 worth of noise to the token "hello"?
In this post, we cover how to bring Diffusion into discrete space, starting from D3PM's Transition Matrix and arriving at MDLM's Masked Diffusion -- the direct ancestors of LLaDA.
D3PM: Diffusion in Discrete Space
Austin et al. (2021) posed a fundamental question in D3PM (Discrete Denoising Diffusion Probabilistic Models): how do you define a forward process for discrete data, where Gaussian noise cannot be added? Their answer replaces the Gaussian kernel with a transition matrix Q_t over the vocabulary: each token is corrupted by sampling its next state from q(x_t | x_{t-1}) = Cat(x_t; p = x_{t-1} Q_t), where x_{t-1} is a one-hot row vector. One particularly important choice of Q_t uses an absorbing [MASK] state: once a token becomes [MASK], it stays [MASK] for the rest of the forward process.
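As a concrete illustration, here is a minimal sketch (not the D3PM reference code) of a single forward step with an absorbing-state transition matrix. The function names, the toy vocabulary size, and the noise rate `beta_t` are all illustrative assumptions:

```python
import numpy as np

def absorbing_transition_matrix(K: int, beta_t: float) -> np.ndarray:
    """Q_t[i, j] = probability of moving from token i to token j in one step.

    Vocabulary of K regular tokens plus one [MASK] absorbing state at index K.
    A regular token stays put with probability 1 - beta_t and jumps to [MASK]
    with probability beta_t; [MASK] always maps to itself.
    """
    Q = np.eye(K + 1) * (1.0 - beta_t)  # stay in place with prob 1 - beta_t
    Q[:, K] += beta_t                   # jump to [MASK] with prob beta_t
    Q[K, :] = 0.0
    Q[K, K] = 1.0                       # [MASK] is absorbing
    return Q

def forward_step(x: np.ndarray, Q: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ Cat(Q[x_{t-1}]) independently for each token position."""
    probs = Q[x]                        # (seq_len, K+1): one distribution per token
    return np.array([rng.choice(len(p), p=p) for p in probs])

rng = np.random.default_rng(0)
K = 5                                   # toy vocabulary size; [MASK] gets id K = 5
Q = absorbing_transition_matrix(K, beta_t=0.3)
x0 = np.array([1, 4, 2, 0])             # clean token ids
x1 = forward_step(x0, Q, rng)           # each token either survives or becomes 5
```

Every row of `Q` sums to 1, so each row is a valid categorical distribution, and repeated application of `forward_step` converges to an all-[MASK] sequence -- the discrete analogue of pure Gaussian noise.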