Add PRXPixelPipeline: pixel-space PRX text-to-image pipeline (#13928)

Adds a pixel-space variant of PRX that denoises raw RGB directly
(no VAE), conditioned on a Qwen3-VL text encoder:

- PRXTransformer2DModel: new optional config args `bottleneck_size`
  (two-layer img_in projection for large patch dims) and
  `resolution_embeds` (PRXResolutionEmbedder conditions the timestep
  modulation on the latent resolution)
- PRXPipeline: support for subclass-tuned tokenizer max length, light
  text cleaning, x-prediction flow matching (x0 -> velocity conversion),
  and non-unit initial noise scale
- PRXPixelPipeline: thin subclass wiring the above together
  (vae optional/None, vae_scale_factor=1, 1024px default)
- conversion script support for the pixel checkpoint format
- registration in __init__ files + dummy objects, docs autodoc entry,
  fast pipeline tests
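
For reviewers unfamiliar with x-prediction, the x0 -> velocity conversion can be sketched as below. This is a minimal illustration, not the code in this PR: it assumes the common rectified-flow convention `x_t = (1 - sigma) * x0 + sigma * noise`, and the names `x0_to_velocity`, `euler_step`, and `noise_scale` are hypothetical. Under that assumption the velocity is `(x_t - x0) / sigma`, which holds whatever scale the initial noise was drawn at.

```python
def x0_to_velocity(x_t, x0_pred, sigma, eps=1e-6):
    """Convert a clean-sample (x0) prediction into a flow-matching velocity.

    Assumed convention (not confirmed by the PR):
        x_t = (1 - sigma) * x0 + sigma * noise
    so d x_t / d sigma = noise - x0 = (x_t - x0) / sigma.
    `sigma` is a Python float; `eps` guards the division near sigma == 0.
    """
    return (x_t - x0_pred) / max(sigma, eps)


def euler_step(x_t, velocity, sigma, sigma_next):
    """Take one explicit Euler step of the ODE from sigma to sigma_next."""
    return x_t + (sigma_next - sigma) * velocity


# Toy scalar example: one "pixel" x0 and noise drawn at a non-unit scale.
x0 = 0.25
noise_scale = 2.0            # hypothetical non-unit initial noise scale
noise = noise_scale * -0.8   # stands in for noise_scale * randn()
sigma = 0.6

x_t = (1 - sigma) * x0 + sigma * noise
v = x0_to_velocity(x_t, x0, sigma)

# With a perfect x0 prediction, the velocity equals noise - x0 and a single
# Euler step to sigma = 0 lands back on x0.
x_final = euler_step(x_t, v, sigma, 0.0)
```

The helpers use only arithmetic operators on `x_t` and `x0_pred`, so the same sketch should apply elementwise to array or tensor inputs as long as `sigma` stays a scalar float.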

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>