transformers
6630c5b7 - Add xlstm model (#39665)

Commit
149 days ago
Add xlstm model (#39665)

* Add xLSTM cleanly with optimizations.
* Fix style.
* Fix modeling test.
* Make xLSTM package optional.
* Fix: Update torch version check.
* Fix: Bad variable naming in test.
* Fix: Import structure cleaning with Ruff.
* Fix: Update docstrings.
* Fix: Mitigate unused config attr tests by explicit usage.
* Fix: Skip tests, if xlstm library is not installed.
* Feat: Enable longer context window for inference by chunking.
* Fix: Make training test pass by lowering target accuracy.
* Chore: Increase test verbosity for failing generation test.
* Update docs/source/en/model_doc/xlstm.md
  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix: Make xlstm available even without CUDA.
* Chore: Remove unnecessary import.
* Fix: Remove BOS insertion.
* Chore: Improve xLSTMCache documentation.
* Integrate basic xLSTM fallback code.
* Chore: Remove unnecessary import.
* Chore: Remove duplicate LayerNorm.
* chore: update copyright, minor reformatting
* fix: refactor mLSTMStateType due to missing torch import
* fix: add missing import
* Chore: Replace einops.
* fix: apply ruff formatting
* fix: run `make fix-copies` to re-generate dummy_pt_objects.py
* fix: make type hints Python 3.9 compatible
* fix: remove obsolete import
* fix: remove obsolete method from docs
* chore: remove obsolete `force_bos_token_insert` from config
* Chore: Remove duplicated xLSTMCache class.
* Fix: Formatting of modeling_xlstm.py
* Chore: Remove xlstm package requirement from test. Re-add update_rnn_state.
* Fix: Update xLSTMCache docstring.
* Feat: Add proper initialization of xLSTM.
* Chore: Re-format files.
* Chore: Adapt format.
* Fix: xLSTMCache import restructuring.
* Fix: Add __all__ lists to modeling and configuration files.
* Chore: Reformat.
* Fix: Remove unnecessary update_rnn_state function.
* Fix: Undo test accuracy quickfix.
* Fix: Update copyright year, remove config copy.
* Chore: Flatten all internal configs to xLSTMConfig.
* Fix: Unused config variables check.
* Chore: Remove unnecessary imports.
* Fix: Unify xlstm cache argument from batch_size to max_batch_size.
* Chore: Remove bad default arg value for xLSTMCache.
* Chore: Rename core configuration arguments to HF default in xLSTM.
* Chore: Fix formatting.
* Fix: xLSTM Cache config access.
* Fix: Update xlstm tests for config update.
* Feat: Re-add embedding_dim, num_blocks config options for compat with xLSTM-7B.
* Fix: Configuration xLSTM python3.9 syntax.
* Fix: Difference to main in test_utils.py assertion.
* Fix: Bad syntax in xlstm config for python3.9.
* Fix: xLSTMConfig docstring.
* Fix: xLSTMConfig docstring.
* Fix typing issues in xLSTM and BeiT, Paligemma.
* Fix: Exclude xLSTM from test cache utils.
* Chore: Fix style.
* Chore: Fix format.
* Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions.
* Chore: Remove asserts and replace with ValueErrors.
* Chore: Update __init__.py structure of xLSTM.
* Chore: Clean xLSTM initialization of weights.
* Fix index names in modeling_xlstm.py
* Update xlstm model test typing annotations.
* Fix: Remove all asserts.
* Revert changes to the main __init__.py
* Fix: Move xLSTMCache to modeling_xlstm.py
* Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py
* Remove xLSTMCache from dummy_pt_objects.py
* Fix: Remove extended torchdynamo compilation check integrating cuda graph captures.
* Revert test_cache_utils.py xLSTM change.
* Fix: Move xLSTM init functions before init call.
* Remove xLSTMCache from generation utils.
* Fix: Clean xLSTM init functionality for recursive calls.
* Fix: Move xLSTMCache before its first call.
* Fix formatting.
* Add partial docstring for xLSTMModel forward.
* Fix xLSTMCache docstring in xLSTMModel.
* Remove xLSTMCache from public documentation. Update auto_docstring.
* Remove all aggressive shape comments
* style
* Fix names
* simplify
* remove output_hidden_states
* Update modeling_xlstm.py
* Update modeling_xlstm.py
* Update test_modeling_xlstm.py
* Update modeling_xlstm.py
* Update modeling_xlstm.py
* fix
* fix
* style
* style

---------

Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com>
Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com>
Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>
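
Note on "Enable longer context window for inference by chunking": the commit message does not describe how chunking is implemented, so the following is only a toy sketch of the general idea, not the code in modeling_xlstm.py. It assumes a simple linear recurrence as a stand-in for a recurrent cell; the names `step`, `run`, and `run_chunked` are hypothetical. The point it illustrates is that a recurrent model can process a long input in fixed-size chunks and carry its state across chunk boundaries, giving the same result as a single pass.

```python
def step(state, x, decay=0.5):
    """One step of a toy linear recurrence (hypothetical stand-in for a recurrent cell)."""
    return decay * state + x


def run(xs, state=0.0):
    """Process a whole sequence; return per-step outputs and the final state."""
    outputs = []
    for x in xs:
        state = step(state, x)
        outputs.append(state)
    return outputs, state


def run_chunked(xs, chunk_size, state=0.0):
    """Process the sequence chunk by chunk, carrying the state across chunks."""
    outputs = []
    for start in range(0, len(xs), chunk_size):
        chunk_out, state = run(xs[start:start + chunk_size], state)
        outputs.extend(chunk_out)
    return outputs, state


sequence = [1.0, 2.0, 3.0, 4.0, 5.0]
full_out, full_state = run(sequence)
chunk_out, chunk_state = run_chunked(sequence, chunk_size=2)
```

In this sketch only one chunk is handed to `run` at a time and the carried state is the only link between chunks, which is why chunking of this kind can bound per-call work for a recurrent model.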