Further refactor deepspeed.moe.utils + deepspeed.moe.layer type hints (#5060)
When a `dict` is unpacked into a new `dict` literal, keys written after
the unpacking override the matching keys of the unpacked `dict`, so we
can drop the pattern of skipping certain keys while copying; also use
`defaultdict` to avoid the boilerplate of initializing the elements of
`group_moe`.
Also add more type hints and make small stylistic changes to
`deepspeed.moe.layer`.
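
A minimal sketch of the two idioms described above. This is not the
actual DeepSpeed code: `param_group`, its keys, and the nested shape
given to `group_moe` here are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical parameter group; the keys are illustrative only.
param_group = {"name": "dense", "params": ["w1", "w2"], "lr": 0.1}

# Before: copy every key except the one that needs a new value.
old_style = {}
for key, value in param_group.items():
    if key == "params":
        continue
    old_style[key] = value
old_style["params"] = ["w1"]

# After: a key written after the unpacking overrides the unpacked one.
new_style = {**param_group, "params": ["w1"]}

# defaultdict creates missing entries on first access, so no explicit
# initialization of each nested element is needed (assumed structure).
group_moe = defaultdict(lambda: defaultdict(list))
group_moe["dense"]["ep_group_0"].append("w2")
```

Both forms build an equal `dict` and leave `param_group` unmodified; the
unpacking form just removes the explicit loop and the `continue`.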
---------
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>