[discrete diffusion] Add DiffusionGemma pipeline and schedulers (#13986)
* Add discrete DDIM and entropy-bound schedulers, plus a uniform mode for block refinement
* Add DiffusionGemma block-diffusion pipeline
* Add DiffusionGemma pipeline tests and docs
* Put DiffusionGemma docs under the Text pipelines section
* Add a static cache and a fullgraph-compiled decoder path to the DiffusionGemma pipeline
* Compile the decoder externally for the static-cache path instead of through a pipeline flag
* Prefill the encoder once into a reusable cache and sync default denoising steps
* Support image prompts by forwarding pixel_values to the encoder prefill
* Restyle docstrings to satisfy doc-builder
* Sort the new scheduler and pipeline exports
* Let any of the three schedulers drive the pipeline
* Document the schedulers and updated defaults in the pipeline docs
* Sort the scheduler dummy objects
* Set scheduler sampling knobs on the scheduler config, not the pipeline call
* Accept raw prompt/image/messages instead of pre-tokenized model inputs
* Add leave-one-out predictor-corrector to DiscreteDDIM scheduler
Adds optional Gibbs corrector sweeps after each predictor step for
uniform diffusion. The leave-one-out (LOO) denoiser is recovered in
closed form from the existing denoiser, so this works on the released
checkpoint with no retraining.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Forward PEFT adapter API on the DiffusionGemma pipeline
The denoiser is a Transformers model, so adapters (LoRA, DoRA, ...) load
through its native PEFT integration rather than the diffusers LoRA loader.
Also dispatch the predictor-corrector by scheduler capability instead of class.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Fix callback kwargs gathering on Python < 3.12
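Build callback_kwargs with a loop instead of a dict comprehension. Before
Python 3.12 a comprehension runs in its own scope, so locals() inside it
cannot see the pipeline's local variables and raises KeyError: 'canvas'
(PEP 709 inlined comprehensions only in 3.12).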
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Speed up DiffusionGemma sampling to match transformers
Add adaptive stopping (stop a block once its prediction is stable and
confident) and make the compiled decoder CUDA-graph-safe by calling
cudagraph_mark_step_begin and cloning the logits. Throughput improves
from ~175 to ~372 tok/s. Also align the decoder mask with the new
layout from transformers#46654.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Enable adaptive stopping by default
Default confidence_threshold to 0.005 to match the released checkpoint
and transformers, so the speedup is enabled out of the box.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Fold corrector sweeps into the step budget
Run fewer predictor steps and spend the freed forward passes on the
corrector, so predictor-corrector sampling costs the same total number
of forwards as plain ancestral sampling (~2x faster), matching the paper.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Commit the converged prediction on adaptive stop
Ancestral schedulers like DiscreteDDIM only clean the canvas on the final
step, so stopping early left noise tokens in it. Commit the denoiser's
argmax instead: it is the converged answer, and for commit schedulers it
already matches the canvas.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Anneal the sampling temperature and fix the static-cache decoder mask
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
* Update src/diffusers/pipelines/diffusion_gemma/pipeline_diffusion_gemma.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update src/diffusers/schedulers/scheduling_entropy_bound.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Update src/diffusers/pipelines/diffusion_gemma/pipeline_diffusion_gemma.py
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
* Address dg845's review comments
* Fix temperature scaling in the entropy-bound scheduler
* Update the decoder mask for the new transformers layout
* Self-condition on the temperature-shaped logits
* Move temperature annealing into EntropyBoundScheduler
* Self-condition on the entropy scheduler's shaped logits
* Show torch.compile with the static cache in the usage example
* Remove a mistakenly added commit
* Update src/diffusers/pipelines/diffusion_gemma/pipeline_diffusion_gemma.py
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
* Address review comments
* Expose pred_logits on all schedulers
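
A minimal sketch of how a leave-one-out (LOO) denoiser can be recovered in closed form for uniform diffusion, as the "Add leave-one-out predictor-corrector" entry describes. It assumes per-token uniform noising (a token stays clean with probability `alpha`, otherwise it is replaced by one of `k` tokens uniformly at random) and a denoiser that outputs p(x0_i | x_t). Because each token is noised independently, dividing out the likelihood of the observed token leaves p(x0_i | x_t without position i); a Gibbs corrector sweep then resamples each position from that LOO predictive in turn. `loo_probs` and `gibbs_resample` are hypothetical helpers, not the DiscreteDDIM API, and the scheduler's actual implementation may differ.

```python
import random


def loo_probs(probs, observed, alpha):
    """Recover p(x0_i | x_t^{-i}) from the denoiser's p(x0_i | x_t).

    Hypothetical helper; assumes uniform noising with keep-probability `alpha`.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1) so every likelihood is nonzero")
    k = len(probs)
    noise = (1.0 - alpha) / k
    # q(x_t^i = observed | x0_i = v): kept clean, or resampled uniformly.
    lik = [alpha * (v == observed) + noise for v in range(k)]
    # Bayes: p(v | x_t) is proportional to q(observed | v) * p(v | x_t^{-i}),
    # so dividing by the likelihood leaves the LOO posterior up to a constant.
    unnorm = [p / l for p, l in zip(probs, lik)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def gibbs_resample(probs, observed, alpha, rng):
    """One Gibbs update of a single noisy token (hypothetical helper)."""
    loo = loo_probs(probs, observed, alpha)
    k = len(probs)
    noise = (1.0 - alpha) / k
    # p(x_t^i = w | x_t^{-i}) = sum_v q(w | v) p_loo(v) = alpha * p_loo(w) + noise
    weights = [alpha * loo[w] + noise for w in range(k)]
    return rng.choices(range(k), weights=weights)[0]


probs = [0.7, 0.2, 0.1]  # denoiser output at one position
loo = loo_probs(probs, observed=0, alpha=0.5)
new_token = gibbs_resample(probs, observed=0, alpha=0.5, rng=random.Random(0))
```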
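
The "Forward PEFT adapter API" entry also switches predictor-corrector dispatch from checking the scheduler's class to checking what the scheduler can do. A toy sketch of that duck-typing pattern; the classes and the `corrector_step` method name are hypothetical stand-ins, not the diffusers scheduler API. Dispatching on capability lets any of the schedulers drive the pipeline without the pipeline importing or naming a specific class.

```python
class AncestralScheduler:
    """Toy scheduler with only a predictor step (hypothetical)."""

    def step(self, x):
        return f"predict({x})"


class PredictorCorrectorScheduler(AncestralScheduler):
    """Toy scheduler that also exposes a corrector (hypothetical)."""

    def corrector_step(self, x):
        return f"correct({x})"


def denoise_step(scheduler, x, num_corrector_sweeps=1):
    x = scheduler.step(x)
    # Dispatch on capability: any scheduler exposing a callable
    # `corrector_step` gets corrector sweeps, whatever its class is.
    corrector = getattr(scheduler, "corrector_step", None)
    if callable(corrector):
        for _ in range(num_corrector_sweeps):
            x = corrector(x)
    return x
```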
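
A small sketch of the pattern behind "Fix callback kwargs gathering on Python < 3.12". Before 3.12 a dict comprehension executes in its own function scope, so `locals()` inside it sees only the comprehension's variables; PEP 709 inlined comprehensions in 3.12. Building the dict in an ordinary loop calls `locals()` in the enclosing function on every version. The function and its local variables are hypothetical; `callback_on_step_end_tensor_inputs` follows the usual diffusers naming for the list of tensor names passed to step-end callbacks.

```python
def denoise_step(callback_on_step_end_tensor_inputs):
    """Hypothetical pipeline step that hands selected locals to a callback."""
    canvas = [3, 1, 4]  # stand-in for the token canvas
    logits = [[0.1, 0.9]]  # stand-in for the denoiser output

    # Broken on Python < 3.12 because the comprehension has its own scope:
    #   {k: locals()[k] for k in callback_on_step_end_tensor_inputs}
    # raises KeyError: 'canvas'.
    callback_kwargs = {}
    for k in callback_on_step_end_tensor_inputs:
        # locals() here is the function's own namespace on all versions.
        callback_kwargs[k] = locals()[k]
    return callback_kwargs
```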
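
A sketch of the kind of stopping rule described in "Speed up diffusion gemma sampling" and "Enable adaptive stopping by default": a block stops once its argmax prediction is unchanged from the previous step and the model is confident. The exact confidence measure the pipeline uses is not stated in this log; this sketch assumes the mean of `1 - max_prob` over the block's positions, compared against `confidence_threshold` (default 0.005). `should_stop` is a hypothetical helper.

```python
def should_stop(prev_argmax, probs, confidence_threshold=0.005):
    """Decide whether a block can stop early (hypothetical helper).

    `probs` holds one probability row per position in the block.
    Returns (stop, argmax) so the caller can pass argmax to the next step.
    """
    argmax = [max(range(len(p)), key=p.__getitem__) for p in probs]
    # Stable: the predicted tokens did not change since the last step.
    stable = prev_argmax is not None and argmax == prev_argmax
    # Confident (assumed metric): mean probability mass off the argmax.
    uncertainty = sum(1.0 - max(p) for p in probs) / len(probs)
    return stable and uncertainty < confidence_threshold, argmax
```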
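
The arithmetic behind "Fold corrector sweeps into the step budget": with a fixed budget of forward passes, each predictor step plus its corrector sweeps costs `1 + sweeps` forwards, so the number of predictor steps shrinks to keep the total at the ancestral sampler's cost. For example, 64 forwards with one sweep per step give 32 predictor steps and 32 corrector forwards, the same 64 forwards as 64 ancestral steps. `split_step_budget` is a hypothetical helper assuming one forward per predictor step and one per corrector sweep; any remainder from integer division is left unused here, and the pipeline may allocate it differently.

```python
def split_step_budget(num_forwards, corrector_sweeps_per_step):
    """Split a forward-pass budget between predictor and corrector (hypothetical)."""
    if corrector_sweeps_per_step < 0:
        raise ValueError("corrector_sweeps_per_step must be non-negative")
    cost_per_step = 1 + corrector_sweeps_per_step
    if num_forwards < cost_per_step:
        raise ValueError("budget too small for one predictor step plus its sweeps")
    predictor_steps = num_forwards // cost_per_step
    corrector_forwards = predictor_steps * corrector_sweeps_per_step
    # predictor_steps + corrector_forwards never exceeds num_forwards.
    return predictor_steps, corrector_forwards
```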
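
"Commit the converged prediction on adaptive stop" in miniature: an ancestral scheduler only writes clean tokens on its final step, so a block that stops early may still hold noise tokens in its canvas, while the denoiser's argmax is already the converged answer. `finalize_block` is a hypothetical helper operating on plain lists, assuming `logits` holds one row of vocabulary scores per position.

```python
def finalize_block(canvas, logits, stopped_early):
    """Return the tokens to commit for one block (hypothetical helper)."""
    if not stopped_early:
        # The scheduler ran to its final step, so the canvas is already clean.
        return list(canvas)
    # Early stop: commit the denoiser's argmax instead of the noisy canvas.
    return [max(range(len(row)), key=row.__getitem__) for row in logits]
```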
---------
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>