AI ResearchKR

Spectrum: 3-5x Diffusion Speedup Without Any Training -- The Power of Chebyshev Polynomials

CVPR 2026 paper from Stanford/ByteDance. Chebyshev polynomial feature forecasting achieves 4.79x speedup on FLUX.1, 4.56x on HunyuanVideo. Training-free, instantly applicable to any model.


Diffusion models produce stunning images and videos, but they're slow. A 50-step sampling process runs a full forward pass through a network with billions of parameters at every single step. Methods like DDIM and DPM-Solver reduce the number of steps, but each remaining step still demands a full network forward pass.

Spectrum, from Stanford and ByteDance (CVPR 2026), takes an entirely different approach. Instead of reducing steps, it skips the network computation at certain steps entirely -- without any additional training. The key insight: model the feature evolution along the diffusion trajectory using Chebyshev polynomials, then forecast features at skipped steps.

The results: 3.47-4.79x speedup on FLUX.1, 3.36-4.56x on HunyuanVideo -- with minimal quality degradation.

Background: Two Directions for Diffusion Acceleration

Making diffusion models faster falls into two categories:

1. Step Reduction

Methods like DDIM, DPM-Solver, and DPM-Solver++ use better ODE/SDE solvers to reduce 50 steps to 20-25. But each step still requires a full network forward pass.

2. Feature Caching/Reuse

Methods like DeepCache (CVPR 2024) reuse features computed at previous steps, allowing some steps to skip the network computation entirely. This reduces the per-step cost rather than the step count.

Spectrum is the latest evolution of the second category. Rather than naively copying previous features, it makes mathematically rigorous predictions.

The Problem with Taylor Expansion

Before Spectrum, TaylorSeer (ICCV 2025) attempted feature prediction using Taylor expansion. The fundamental problem: Taylor expansion is a local approximation. It's accurate near the cached points but errors grow rapidly with distance. When you skip multiple steps, errors compound and image quality degrades significantly.

Think of it this way: Taylor expansion predicts the future by looking at "what just happened recently." It's like predicting stock prices will keep rising because they rose yesterday -- reasonable short-term, but unreliable for longer horizons.
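The effect is easy to see numerically. The toy sketch below (my own illustration, not an experiment from the paper) extrapolates a smooth function with a first-order Taylor prediction from a single point; the error grows rapidly as the prediction horizon widens, which is exactly the failure mode that hurts multi-step feature skipping.

```python
# Toy illustration (not from the paper): first-order Taylor
# extrapolation of f(t) = sin(4t) from t0 = 0.5. The prediction
# error grows quickly with the extrapolation distance dt.
import math

def f(t):
    return math.sin(4 * t)

t0 = 0.5
f0 = f(t0)
df0 = 4 * math.cos(4 * t0)          # exact derivative of f at t0

for dt in (0.05, 0.1, 0.2, 0.4):
    pred = f0 + df0 * dt             # Taylor prediction at t0 + dt
    err = abs(pred - f(t0 + dt))
    print(f"dt={dt:.2f}  |error|={err:.4f}")
```

The printed error grows roughly quadratically with dt, as expected for a first-order local approximation.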

Spectrum's Core Idea: Global Spectral Approximation

Spectrum's key insight is elegant:

View each feature channel's evolution along the diffusion sampling trajectory as a function over time, and approximate it with a linear combination of Chebyshev polynomials.

Chebyshev polynomials form an orthogonal basis known to provide near-optimal (minimax) polynomial approximation of smooth functions. The critical advantages:

  1. Global approximation: Captures the pattern across the entire time interval
  2. Non-compounding errors: Approximation error is independent of step size (Theorem 3.3)
  3. Stable fitting: Ridge regression prevents overfitting

If Taylor is a "local weather forecast," Spectrum is "climate pattern modeling." By capturing the overall trend, it can accurately predict further into the future.

Algorithm Details

Step 1: Timestep Mapping

Map diffusion timesteps (normalized so that t ∈ [0, 1]) to the Chebyshev domain [-1, 1]:

tau = g(t) = 2t - 1

Step 2: Chebyshev Polynomial Approximation

Approximate each feature channel h_i(t) as a linear combination of M Chebyshev polynomials:

h_i(t) ≈ c_0 * T_0(tau) + c_1 * T_1(tau) + ... + c_M * T_M(tau)

The Chebyshev polynomials of the first kind:

  • T_0(x) = 1
  • T_1(x) = x
  • T_2(x) = 2x² - 1
  • T_3(x) = 4x³ - 3x
  • T_4(x) = 8x⁴ - 8x² + 1

The default setting uses M=4 (4th degree polynomial).
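Steps 1 and 2 can be sketched in a few lines. This is a minimal reimplementation of the mapping and basis construction (not the official code), assuming timesteps already normalized to [0, 1]; it uses the standard recurrence T_{k+1}(x) = 2x·T_k(x) - T_{k-1}(x).

```python
# Minimal sketch of Steps 1-2: map t -> tau = 2t - 1, then evaluate
# the Chebyshev basis T_0..T_M via the three-term recurrence.
import numpy as np

def cheb_basis(t, M=4):
    """Return Phi with Phi[j, k] = T_k(g(t_j)), shape (len(t), M + 1)."""
    tau = 2.0 * np.asarray(t, dtype=np.float64) - 1.0   # Step 1: g(t) = 2t - 1
    Phi = np.empty((tau.size, M + 1))
    Phi[:, 0] = 1.0                                      # T_0(x) = 1
    if M >= 1:
        Phi[:, 1] = tau                                  # T_1(x) = x
    for k in range(2, M + 1):
        Phi[:, k] = 2.0 * tau * Phi[:, k - 1] - Phi[:, k - 2]
    return Phi

Phi = cheb_basis([0.0, 0.25, 0.5, 0.75, 1.0], M=4)
print(Phi[:, 2])   # T_2(tau) = 2*tau^2 - 1 at tau = -1, -0.5, 0, 0.5, 1
```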

Step 3: Ridge Regression Coefficient Fitting

Using feature values from steps where actual forward passes were computed, fit the coefficient vector C:

C = (Φ^T·Φ + λ·I)^{-1} · Φ^T · H

Where:

  • Φ is the Chebyshev basis evaluation matrix at computed steps
  • H contains the actual feature values at those steps
  • λ=0.1 is the regularization strength (prevents overfitting)
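A minimal sketch of this fit (my own reimplementation using NumPy's built-in Chebyshev helpers, not the official code). `chebvander` builds the basis matrix Φ with columns T_0..T_M; the toy features here are hand-made polynomials, standing in for real channel trajectories.

```python
# Minimal sketch of Step 3: ridge-regression fit of the coefficient
# matrix C from features observed at the computed steps.
import numpy as np
from numpy.polynomial import chebyshev as cheb

def fit_coeffs(t_obs, H, M=4, lam=0.1):
    """C = (Phi^T Phi + lam*I)^{-1} Phi^T H, shape (M + 1, n_channels)."""
    tau = 2.0 * np.asarray(t_obs, dtype=np.float64) - 1.0  # Step 1 mapping
    Phi = cheb.chebvander(tau, M)                          # (n_obs, M + 1)
    A = Phi.T @ Phi + lam * np.eye(M + 1)
    return np.linalg.solve(A, Phi.T @ H)

# Toy features: 3 channels observed at 6 computed steps.
t_obs = np.linspace(0.0, 0.5, 6)
H = np.stack([t_obs, t_obs**2, 1.0 + t_obs], axis=1)
C = fit_coeffs(t_obs, H)
print(C.shape)   # (5, 3): one coefficient per basis function per channel
```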

Step 4: Feature Forecasting

At steps in the forecast set V (skipped steps), predict features using the fitted coefficients:

h(t_j) = φ(g(t_j)) · C

A simple matrix-vector product replaces the entire network forward pass.
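Concretely, the forecast at a skipped step looks like this (an illustrative sketch, not the official implementation). The coefficient matrix `C` here is a hand-built toy; in practice it comes from the ridge-regression fit of Step 3.

```python
# Minimal sketch of Step 4: forecasting a skipped step is one small
# matrix-vector product instead of a full network forward pass.
import numpy as np
from numpy.polynomial import chebyshev as cheb

def forecast(t_j, C):
    """h_hat(t_j) = phi(g(t_j)) @ C for one skipped step t_j in [0, 1]."""
    tau = 2.0 * t_j - 1.0                                   # Step 1 mapping
    phi = cheb.chebvander(np.array([tau]), C.shape[0] - 1)  # (1, M + 1)
    return (phi @ C)[0]                                     # (n_channels,)

# Toy C for M = 4 and 2 channels: h(t) = [1 + 0.5*T_1(tau), -1].
C = np.zeros((5, 2))
C[0] = [1.0, -1.0]      # T_0 coefficients for both channels
C[1, 0] = 0.5           # T_1 coefficient, channel 0
print(forecast(0.75, C))   # -> [1.25, -1.0], since T_1(g(0.75)) = 0.5
```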

Step 5: Adaptive Scheduling

Steps are divided into two sets:

  • U (actual set): Steps with full network forward passes where coefficients are updated
  • V (forecast set): Steps where features are predicted via Chebyshev approximation

The flex_window parameter (α) controls adaptive window scaling. As more data points are collected, the forecast horizon grows, allowing more computation to be skipped in later steps.
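The paper's exact scheduling policy isn't reproduced here, so the sketch below is only a plausible illustration of the U/V split: a fixed window of real forward passes to refresh the fit, followed by a forecast horizon that widens by `flex_window` after each refit. The parameter names follow the hyperparameter guide; the growth rule itself is my assumption.

```python
# Illustrative U/V scheduling sketch (NOT the official policy): refit
# with `window_size` real forward passes, forecast a growing number of
# steps, and widen the forecast horizon by `flex_window` each round.
def build_schedule(num_steps=50, window_size=2, flex_window=0.75):
    """Return (U, V): steps with real forward passes vs forecast steps."""
    U, V = [], []
    step = 0
    horizon = 1.0
    while step < num_steps:
        for _ in range(window_size):        # actual set U: refresh the fit
            if step < num_steps:
                U.append(step); step += 1
        for _ in range(int(horizon)):       # forecast set V: skip the network
            if step < num_steps:
                V.append(step); step += 1
        horizon += flex_window              # adaptive widening of the window
    return U, V

U, V = build_schedule()
print(len(U), len(V))   # real vs skipped forward passes out of 50
```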

Error Bounds: Why This Beats Taylor

The theoretical core is Theorem 3.3:

ε_M = ||f - p_M||_∞ ≤ (2B / (ρ - 1)) · ρ^{-M}

This bound is independent of step size. Increasing M (polynomial degree) reduces error exponentially. In contrast, Taylor expansion errors compound with the skip horizon.

Empirically confirmed: Feature RMSE at step 50 is Spectrum 0.1674 vs Taylor 0.2510 (33% lower).
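The exponential decay in M is easy to check on a toy example (my own generic smooth function, not the paper's feature trajectories): least-squares Chebyshev fits of exp(τ) shrink the maximum error geometrically as the degree grows.

```python
# Toy check of the exponential error decay: max approximation error of
# degree-M Chebyshev least-squares fits of the smooth function exp(tau).
import numpy as np
from numpy.polynomial import chebyshev as cheb

tau = np.linspace(-1.0, 1.0, 201)
f = np.exp(tau)
for M in (1, 2, 3, 4):
    coeffs = cheb.chebfit(tau, f, M)       # least-squares fit of degree M
    err = np.max(np.abs(cheb.chebval(tau, coeffs) - f))
    print(f"M={M}  max error = {err:.1e}")
```

Each unit increase in M cuts the maximum error by roughly an order of magnitude, consistent with the ρ^{-M} bound.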

Results

Text-to-Image

FLUX.1-dev (50-step reference):

| Method | NFE | Speedup | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|
| Spectrum (α=0.75) | 14 | 3.47x | 24.32 | 0.854 | 0.217 |
| Spectrum (α=3.0) | 10 | 4.79x | 22.21 | 0.788 | 0.261 |
| TaylorSeer (N=4) | ~16 | 3.13x | 22.31 | 0.841 | 0.215 |
| TaylorSeer (N=6) | ~12 | 3.99x | 17.41 | 0.708 | 0.389 |

Stable Diffusion 3.5-Large:

| Method | NFE | Speedup | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|
| Spectrum (α=0.75) | 14 | 3.21x | 17.83 | 0.743 | 0.305 |
| Spectrum (α=3.0) | 10 | 4.32x | 15.68 | 0.620 | 0.430 |

Text-to-Video

HunyuanVideo:

| Method | NFE | Speedup | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|
| Spectrum (α=0.75) | 14 | 3.36x | 27.77 | 0.842 | 0.209 |
| Spectrum (α=3.0) | 10 | 4.56x | 25.39 | 0.779 | 0.273 |

Wan2.1-14B:

| Method | NFE | Speedup | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|
| Spectrum (α=0.75) | 14 | 3.40x | 22.78 | 0.749 | 0.222 |
| Spectrum (α=3.0) | 10 | 4.67x | 21.24 | 0.694 | 0.265 |
| TaylorSeer (N=6) | ~12 | 3.94x | 17.24 | 0.585 | 0.367 |

The gap is especially pronounced in video generation, where each step's cost is higher due to the larger number of frames, making feature prediction accuracy critical.

Supported Models

Spectrum works across both U-Net and Transformer/DiT architectures:

| Model | Architecture | Task |
|---|---|---|
| FLUX.1-dev | DiT (Transformer) | Text-to-Image |
| SD 3.5-Large | MMDiT | Text-to-Image |
| SDXL | U-Net | Text-to-Image |
| HunyuanVideo | DiT | Text-to-Video |
| Wan2.1-14B | DiT | Text-to-Video |

Architecture-agnostic operation is a major advantage. By applying feature caching only to the last block, Spectrum minimizes dependency on model internals.

Hyperparameter Guide

| Parameter | Default | Role |
|---|---|---|
| w | 0.5-1.0 | Blending factor (1.0 = pure Chebyshev) |
| λ (lam) | 0.1 | Ridge regression regularization |
| M (m) | 4 | Number of Chebyshev basis functions |
| N (window_size) | 2 | Initial fitting window size |
| α (flex_window) | 0.75 | Adaptive window scaling |

Practical tips:

  • α=0.75 prioritizes quality, α=3.0 prioritizes speed
  • λ too small (0.001) causes overfitting, too large (10) causes underfitting
  • M=4 is the sweet spot between accuracy and computational cost

Comparison with Other Methods

| Category | Representative | Principle | Relation to Spectrum |
|---|---|---|---|
| Step reduction | DDIM, DPM-Solver | Better ODE solvers | Complementary -- can combine |
| Naive caching | DeepCache | Copy previous features | Spectrum strictly superior |
| Local prediction | TaylorSeer | Taylor expansion | Spectrum wins via non-compounding error |
| Spectral prediction | Spectrum | Chebyshev polynomial fitting | -- |

Key point: Spectrum is orthogonal to step-reduction methods. You can apply both simultaneously -- reduce step count AND reduce per-step cost for compounding acceleration.

Hands-On: Accelerating SDXL with Spectrum

A practice notebook using the official code (github.com/hanjq17/Spectrum) is available separately, covering:

  1. Loading SDXL and baseline generation
  2. Applying Spectrum and comparing speed/quality
  3. Visualizing results across hyperparameter settings
  4. Analyzing Chebyshev approximation error

Conclusion

Spectrum introduces a new paradigm for diffusion model acceleration:

  1. Training-free: Instantly applicable to any pretrained model
  2. Theoretically grounded: Non-compounding error bound from Chebyshev approximation
  3. Universal: Supports both U-Net and DiT architectures, both image and video
  4. Practical: Up to 4.79x speedup, bringing real-time generation closer to reality

Combined with step-reduction methods, even greater acceleration is achievable. A ComfyUI plugin is already available for immediate integration into production workflows.
