transformers
a47a89a4 - [xLSTM] Fix bugs preventing small model training (#43209)

Commit
1 day ago
[xLSTM] Fix bugs preventing small model training (#43209)

* Fix xLSTM bugs preventing small model training

- Fix typo: vecM_k_combine should use .reshape() not ()
- Fix shape mismatch: use dqk // nc for correct head dimension
- Fix return_last_states default to match docstring (bool | None = None)

Fixes #43208

* Predefine dhqk variable and add shape calculation test

Extracts dqk // nc into dhqk variable to clarify it represents the
per-head query/key dimension. Adds test_chunkwise_shape_calculation
to catch shape mismatches in chunkwise processing.

* [xLSTM] Fix chunkwise shape regression test setup

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>