diffusers
79de3064 - [LLADA2] Fix llada2 review #13598 (#13698)

Commit
36 days ago
[LLADA2] Fix llada2 review #13598 (#13698)

* [LLaDA2] address review findings from #13598

Fixes the six in-scope issues raised in the llada2 model/pipeline review:

1. Carry tokenizer `attention_mask` through `_prepare_input_ids` and add an
   `attention_mask` arg to `__call__` for pre-tokenized inputs. The runtime
   mask now reflects prompt padding and zeros out the block-aligned tail past
   `prompt_length + gen_length` instead of treating those positions as valid
   context.
2. Thread the per-call `block_length` into
   `BlockRefinementScheduler.set_timesteps` so the transfer schedule matches
   the requested block size (previously the scheduler only read its
   constructor default).
3. Drop `x0`/`x0_p`/`confidence` from `_callback_tensor_inputs` (never bound
   locals) and bind `sampled_tokens`, `sampled_probs`,
   `editing_transfer_index`, `active_block` so all advertised callback keys
   resolve.
4. Allow EOS exactly at index `prompt_length` (the first generated position)
   to mark a row finished.
5. Freeze rows that have already emitted EOS so subsequent block refinement
   doesn't extend them, and trim per-row at decode (previously gated on
   batch_size==1) so post-EOS positions don't leak into decoded text.
6. Stop calling `self.set_progress_bar_config(...)` from inside `__call__`;
   build a local config dict for the inner block bar so user-supplied flags
   (in particular `disable=True`) survive the call.

Adds regression tests pinning each of the six fixes.

* fix formatting

* undo changes

* set block_length to optional and use scheduler's default

---------

Co-authored-by: dg845 <58458699+dg845@users.noreply.github.com>
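
Items 1, 4 and 5 can be illustrated with a minimal sketch in plain Python. This is not the pipeline's code: the helper names `build_runtime_mask` and `trim_at_eos` are hypothetical, lists stand in for tensors, and dropping the EOS token from the trimmed output is an assumption made here for illustration.

```python
from typing import List


def build_runtime_mask(prompt_mask: List[int], gen_length: int, block_length: int) -> List[int]:
    """Extend a tokenizer attention mask over a block-aligned generation canvas.

    Hypothetical helper: mirrors the behaviour described in item 1 only.
    """
    prompt_length = len(prompt_mask)
    valid_length = prompt_length + gen_length
    # Round the canvas up to a whole number of blocks (ceiling division).
    num_blocks = -(-valid_length // block_length)
    total_length = num_blocks * block_length
    # Prompt padding keeps its zeros, generated positions are valid, and the
    # block-aligned tail past prompt_length + gen_length is zeroed out.
    return list(prompt_mask) + [1] * gen_length + [0] * (total_length - valid_length)


def trim_at_eos(row: List[int], prompt_length: int, eos_token_id: int) -> List[int]:
    """Return one row's generated tokens, cut at the first EOS.

    Hypothetical helper: applied per row regardless of batch size (item 5).
    An EOS at index prompt_length itself counts (item 4), giving an empty result.
    Assumption: the EOS token is excluded from the returned tokens.
    """
    generated = row[prompt_length:]
    if eos_token_id in generated:
        return generated[: generated.index(eos_token_id)]
    return generated


# Left-padded prompt of length 3, 4 generated tokens, block size 4:
# the canvas is rounded up to 8 positions and the last one is masked out.
mask = build_runtime_mask([0, 1, 1], gen_length=4, block_length=4)

# Two rows of one batch, trimmed independently (EOS id 2 is an assumption).
batch = [[5, 6, 2, 9, 9], [5, 6, 7, 8, 2]]
trimmed = [trim_at_eos(row, prompt_length=2, eos_token_id=2) for row in batch]
```

In this sketch the first row emits EOS at the first generated position and decodes to nothing, while the second row keeps the two tokens before its EOS, so no post-EOS positions reach the decoded output.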