transformers
b69a62d5 - [BLOOM] Clean modeling code (#18344)

Commit

2 years ago

[BLOOM] Clean modeling code (#18344)

* Cleanup some code
* Improve signatures
* Try to reduce the number of reshape/copies
* I don't think we actually need the layer_num scaling trick
* No need for duplication
* Try to fix beam_search
* Fix beam search
* Removing layer num normalization seems to be breaking
* Not sure self.layer_number normalization actually matters
* Try and be backward compatible
* Try to fix beam_search
* Revert attempt to be backward compatible
* Improve documentation on past_key_values format
* Optimize the device allocation in case of hidden_states in multiple devices
* No need to manually cast the values to a specific device
* Rename with long version of variables
* Improve type hinting
* Add comment that explains that some methods return views
* Actually i think the attention casting only makes sense when we use torch.float16
* We don't actually need layer_number to be passed anymore
* Fix FX test
* Bypass torch.baddbmm
* Apply suggestions from code review
* Add comment about support for torchScript v1.11
* fix ONNX support for bloom (#18456)

Co-authored-by: Niklas Muennighoff <n.muennighoff@gmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>

References

#27720 - Add common processor tests

#32831 - [Docs] Update resources

#29969 - [SigLIP] Add fast tokenizer

#38622 - [AutoModelForMaskGeneration] Remove duplicate code

#33111 - [Backbone] Remove out_features everywhere

#33174 - [Zero-shot image classification pipeline] Remove tokenizer_kwargs

#19449 - [WIP] Fix weights initialization of several vision models

#18344 - [BLOOM] Clean modeling code

Author

thomasw21

Parents

02b176c4

Files (2)

src/transformers/models/bloom
- configuration_bloom.py
- modeling_bloom.py
