Expose GLU activations as arguments (#69)
* feat: expose glu activations as argument
* chore: rename activations -> glu_activations
* refactor: use lookup dict instead of `getattr()`
* refactor: mv lookup dict to `glu_activations.py`
* chore: rm unnecessary default arg
* test: add bf16 test; gelu in `test_training_all()`
* Update megatron/testing_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* refactor: use `require_torch_bf16` decorator
* chore: comment out bf16 test
uncomment once torch supports gelu kernels for bf16
* consistent style
* fix lookup table
* better grouping
* fix: replace hard coded options with `GLU_ACTIVATIONS`
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
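
The lookup-dict dispatch described above can be sketched as follows. This is a minimal illustration, not the PR's exact code: the variant names (`geglu`, `liglu`, `reglu`, `swiglu`) and the `GLU_ACTIVATIONS` dict name follow the commit messages, while the `glu_variant` helper and its internals are assumptions for the sake of the example.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: each GLU variant splits the input in half along
# the last dimension and gates one half with an activation of the other.
def glu_variant(activation_fn):
    def activated_glu(x):
        x, gate = torch.chunk(x, 2, dim=-1)
        return x * activation_fn(gate)
    return activated_glu

liglu = glu_variant(lambda g: g)  # linear (identity) gate
geglu = glu_variant(F.gelu)       # GELU gate
reglu = glu_variant(F.relu)       # ReLU gate
swiglu = glu_variant(F.silu)      # SiLU/Swish gate

# Lookup dict keyed by the activation name passed as an argument,
# replacing getattr()-based dispatch with an explicit whitelist.
GLU_ACTIVATIONS = {
    "geglu": geglu,
    "liglu": liglu,
    "reglu": reglu,
    "swiglu": swiglu,
}
```

An explicit dict both documents the supported options in one place and avoids `getattr()` silently resolving arbitrary attribute names.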