Refactor the positional embedding config code (#4920)
The Mixtral PR https://github.com/microsoft/DeepSpeed/pull/4828
introduced the positional embedding config class, which is a required
argument of the `make_attn_layer()` function. This forced users to
override and duplicate the `make_attn_layer()` call for new model
implementations using RoPE (and also broke the Falcon model
implementation). This PR:
- refactors the inference transformer base class to avoid code
duplication by adding a new abstract `positional_embedding_config`
property (a minimal sketch of the pattern follows below)
- fixes the Falcon model implementation to use the positional embedding
config
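
The sketch below illustrates the abstract-property pattern described above; the class, method, and config field names here are simplified placeholders, not the exact DeepSpeed definitions:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class PositionalEmbeddingConfig:
    # Hypothetical fields; the real config class lives in DeepSpeed's inference modules.
    theta_base: float = 10000.0
    rotate_dim: Optional[int] = None


class InferenceTransformerBase(ABC):
    """Simplified stand-in for the inference transformer base class."""

    @property
    @abstractmethod
    def positional_embedding_config(self) -> Optional[PositionalEmbeddingConfig]:
        """Each model implementation returns its own positional embedding config (or None)."""
        ...

    def make_attn_layer(self):
        # The base class reads the property itself, so model subclasses no longer
        # have to override and duplicate this call just to pass the config through.
        config = self.positional_embedding_config
        return f"attention layer built with {config}"


class LlamaV2Model(InferenceTransformerBase):
    # A RoPE-based model only declares its config; make_attn_layer() stays in the base class.
    @property
    def positional_embedding_config(self) -> Optional[PositionalEmbeddingConfig]:
        return PositionalEmbeddingConfig(theta_base=10000.0)
```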
The `llama_v2`, `OPT`, `Mistral 7B`, `Mixtral`, `Falcon`, and `Phi-2`
models were tested with this PR.
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>