Refactor the positional embedding config code (#4920)
The Mixtral PR https://github.com/microsoft/DeepSpeed/pull/4828
introduced the positional embedding config class, which is a required
argument of the `make_attn_layer()` function. This forced users to
override and duplicate the `make_attn_layer()` call for new model
implementations using RoPE (and also broke the Falcon model
implementation). This PR:
- refactors the inference transformer base class to avoid code
duplication by adding a new abstract `positional_embedding_config`
property (a minimal sketch of the pattern follows below)
- fixes the Falcon model implementation to use the positional embedding
config
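
The sketch below illustrates the abstract-property pattern described above; the class, method, and config field names here are simplified placeholders, not the exact DeepSpeed definitions:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class PositionalEmbeddingConfig:
    # Hypothetical fields; the real config class lives in DeepSpeed's inference modules.
    theta_base: float = 10000.0
    rotate_dim: Optional[int] = None


class InferenceTransformerBase(ABC):
    """Simplified stand-in for the inference transformer base class."""

    @property
    @abstractmethod
    def positional_embedding_config(self) -> Optional[PositionalEmbeddingConfig]:
        """Each model implementation returns its own positional embedding config (or None)."""
        ...

    def make_attn_layer(self):
        # The base class reads the property itself, so model subclasses no longer
        # have to override and duplicate this call just to pass the config through.
        config = self.positional_embedding_config
        return f"attention layer built with {config}"


class LlamaV2Model(InferenceTransformerBase):
    # A RoPE-based model only declares its config; make_attn_layer() stays in the base class.
    @property
    def positional_embedding_config(self) -> Optional[PositionalEmbeddingConfig]:
        return PositionalEmbeddingConfig(theta_base=10000.0)
```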
The `llama_v2`, `OPT`, `Mistral 7B`, `Mixtral`, `Falcon`, and `Phi-2`
models were tested with this PR.
---------
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>