blt wip (#38579)
* blt wip
* cpu version
* cpu friendly with full entropy model (real time patching)
* adding config file instead of args file
* enable MPS
* refactoring unused code
* single config class in config file
* inherit from PreTrainedModel
* refactor LMTransformer --> BLTPatcher
* add conversion script
* load from new checkpoint with from_pretrained
* fixed demo from_pretrained
* clean up
* clean a few comments
* cleanup folder
* clean up dir
* cleaned up modeling further
* rename classes
* adding transformers Attention class and RotaryEmbedding class
* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc
* separate out patcher config, update modeling and conversion script
* rename vars to be more transformers-like
* rm unused functions
* adding cross attention from transformers
* pass arg
* rename weights
* updated conversion script
* overwritten commit! fixing PR
* apply feedback
* adding BLTRMSNorm like Llama
* add repeat_kv and eager_attention_forward copied from
* BLTMLP identical to MllamaTextMLP
* clean up some args
* more like mllama, but busier inits
* BLTTransformerLayer config
* decoder, encoder, global configs
* wip working on modular file
* cleaning up patch and configs
* clean up patcher helpers
* clean up patcher helpers further
* clean up
* some config renaming
* clean up unused configs
* clean up configs
* clean up configs
* update modular
* clean
* update demo
* config more like mllama, separated subconfigs from subdicts
* read from config instead of self args
* update demo file
* model weights to causal lm weights
* missed file
* added tied weights keys
* BLTForCausalLM
* adding files after add-new-model-like
* update demo
* working on tests
* first running integration tests
* added integration tests
* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff
* tokenizer clean up
* modular file
* fixing rebase
* ruff
* adding correct basemodel output and updating config with checkpoint vals (for testing)
* BLTModelTests git status
* enabling inputs_embeds, although results won't match the input_ids path since the patching logic needs ids
* fix sdpa == causal tests
* fix small model test and some gradient checkpointing
* skip training GC tests
* fix test
* updated modular
* update modular
* ruff
* adding modular + modeling
* modular
* more modern is_causal check
* cleaning up modular
* more modular reduction
* ruff
* modular fix
* fix styling
* return 2
* return 2
* fix some tests
* fix bltcrossattention after modular break
* some fixes / feedback
* try cache generate fix
* try cache generate fix
* fix generate tests
* attn_impl workaround
* refactoring to use recent TransformersKwargs changes
* fix hidden_states shape test
* refactor to new outputs
* simplify outputs a bit
* rm unneeded decoderlayer overwriting
* rename blt
* forgot tokenizer test renamed
* Reorder
* Reorder
* working on modular
* updates from modular
* new modular
* ruff and such
* update pretrainedmodel modular
* using cohere2 apply_rotary_pos_emb
* small changes
* apply feedback r2
* fix cross_attention
* apply more feedback
* update modeling fix
* load submodules from pretrainedmodel
* set initializer_range to subconfigs
* rm cross_attention_states pass when not needed
* add 7b projection layer support
* check repo
* make copies
* lost cohere2 rotate_half
* ruff
* copies?
* don't tie weights for submodules
* tie weights setting
* check docstrings
* apply feedback
* rebase
* rebased modeling
* update docs
* applying feedback
* few more fixes
* fix can_record_outputs
* fast tokenizer
* no more modulelist
* tok auto
* rm tokenizersss
* fix docs
* ruff
* fix after rebase
* fix test, configs are not subscriptable
---------
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>