diffusers
3e83f434 - Add structured prompt upsampling to Ideogram4 (#13860)

Commit
2 days ago
Add structured prompt upsampling to Ideogram4 (#13860)

* Add structured prompt upsampling to Ideogram4

  Rewrite prompts into Ideogram4's native structured JSON caption before encoding, opt-in via `prompt_upsampling=True` in `__call__` or the standalone `upsample_prompt`. Upsampling is driven by a generative `text_encoder` (`Qwen3VLForConditionalGeneration`, which carries the LM head); the head-less `Qwen3VLModel` is still supported for plain conditioning, and `upsample_prompt` raises an instructive error when the encoder cannot generate. Captions are schema-constrained via `outlines` when installed, and the modular pipeline gains a matching prompt-enhancer block.

* Remove LM-head grafting; modular block uses a generative text_encoder

  Drop `graft_lm_head` and drive the modular `Ideogram4PromptUpsampleStep` off a generative `text_encoder` (`Qwen3VLForConditionalGeneration`), matching the standard pipeline: guard with `can_generate()` and an instructive error, and build the outlines logits processor lazily. Updates the copied `_get_text_encoder_hidden_states` to resolve the decoder for both encoder classes.

* Style docstrings with doc-builder; mark prompt strings docstyle-ignore

  Reflow the Ideogram4 prompt-enhancer docstrings to the 119-col doc-builder style, and add `# docstyle-ignore` to the functional `CAPTION_SYSTEM_MESSAGE` and `CAPTION_USER_TEMPLATE` strings so the styler doesn't rewrap them (matching Flux2's `system_messages.py`).

* Use an Ideogram4PromptEnhancerHead component for prompt upsampling

  Add `Ideogram4PromptEnhancerHead`, a small `ModelMixin` holding the Qwen3-VL LM head, as an optional `prompt_enhancer_head` pipeline component. Upsampling loads the head via a normal `from_pretrained` (its own repo, or bundled later) instead of an in-pipeline download, and grafts it onto the shared head-less `text_encoder` body so no second 8B body is loaded. Both the standard and modular pipelines build the generative model from `text_encoder` + the head; `upsample_prompt` raises an instructive error when the head component is absent.

* Update src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py

  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Update src/diffusers/pipelines/ideogram4/pipeline_ideogram4.py

  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Apply suggestion from @apolinario

* Apply suggestion from @dg845

  Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>

* Apply suggestion from @apolinario

* Fix trailing whitespace

* docs: add prompt-upsampling examples (remote API + local head) for Ideogram4

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
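The control flow described in the final design (an optional head component, a shared head-less body, and an instructive error when the head is absent) can be sketched without any model weights. This is a minimal illustration under stated assumptions, not the diffusers implementation: `SketchPipeline`, `fake_body`, and `fake_head` are hypothetical stand-ins, and only the names `prompt_upsampling`, `upsample_prompt`, `text_encoder`, and `prompt_enhancer_head` come from the commit message.

```python
import json
from typing import Callable, Optional


class SketchPipeline:
    """Hypothetical stand-in for the pipeline; not the real diffusers class."""

    def __init__(
        self,
        text_encoder: Callable[[str], str],
        prompt_enhancer_head: Optional[Callable[[str], str]] = None,
    ):
        # The head-less body is always present; the head is an optional component.
        self.text_encoder = text_encoder
        self.prompt_enhancer_head = prompt_enhancer_head

    def upsample_prompt(self, prompt: str) -> str:
        # Guard first, so a missing head fails with an actionable message.
        if self.prompt_enhancer_head is None:
            raise ValueError(
                "Prompt upsampling needs a `prompt_enhancer_head` component; "
                "load one and pass it to the pipeline, or leave "
                "`prompt_upsampling` disabled."
            )
        # Reuse the shared body, then apply the head: no second body is built.
        return self.prompt_enhancer_head(self.text_encoder(prompt))

    def __call__(self, prompt: str, prompt_upsampling: bool = False) -> str:
        # Upsampling is opt-in; by default the prompt is passed through as-is.
        if prompt_upsampling:
            prompt = self.upsample_prompt(prompt)
        return prompt


def fake_body(prompt: str) -> str:
    # Toy "body": stands in for the text encoder's forward pass.
    return prompt.strip().lower()


def fake_head(hidden: str) -> str:
    # Toy "head": stands in for generating a structured JSON caption.
    return json.dumps({"caption": hidden})


with_head = SketchPipeline(fake_body, fake_head)
without_head = SketchPipeline(fake_body)
```

Calling `with_head("  A Cat ", prompt_upsampling=True)` goes through both toy stages, while `without_head` still works for plain prompts and raises only when upsampling is actually requested.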