Megatron-DeepSpeed
dc4e0cba - Implement rotary embeddings (#7)

Implement rotary embeddings (#7)

* Integrate EleutherAI's version of rotary embeddings and make some small optimisations
* Add argument parser for position embeddings
* Make max-absolute-position-embeddings optional
* Move enum outside model
* Handle max_seq_len_cached better
* Fix dtype issue in rotary embeddings
* Fix tensor size
* Replace hidden_dim with hidden_size_per_attention_head
* Change all examples to the new format and improve help in the argparser
* Revert changes; add a comparison with the position embedding type when checkpointing, and replace args.max_position_embeddings with an upper bound on the sequence sizes
* Revert changes:
  - Rename max-absolute-embeddings back to max-absolute-embeddings
  - Make absolute position embeddings the default
* Reformat
* Remove run.sh~ and restore run.sh

Co-authored-by: Thomas <ö95242+thomasw21@users.noreply.github.com>
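For readers unfamiliar with the technique, the sketch below shows the core of rotary position embeddings (RoPE) in PyTorch, in the half-rotation style used by EleutherAI's implementation that this commit integrates. It is a minimal illustration rather than the commit's actual code: the names `RotaryEmbedding`, `rotate_half`, and `apply_rotary_pos_emb` follow common convention, while the lazily grown cos/sin cache and the early dtype cast loosely correspond to the `max_seq_len_cached` handling and dtype fix listed above.

```python
import torch


class RotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, base=10000):
        super().__init__()
        # Per-pair rotation frequencies; `dim` is the size of one attention
        # head (hidden_size_per_attention_head) and must be even.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        self.register_buffer("inv_freq", inv_freq)
        # Grow the cos/sin cache lazily, so max_seq_len_cached tracks the
        # longest sequence seen instead of a fixed max-position bound.
        self.max_seq_len_cached = 0
        self.cos_cached = None
        self.sin_cached = None

    def forward(self, x, seq_len):
        if seq_len > self.max_seq_len_cached:
            self.max_seq_len_cached = seq_len
            t = torch.arange(seq_len, device=x.device).type_as(self.inv_freq)
            freqs = torch.einsum("i,j->ij", t, self.inv_freq)
            emb = torch.cat((freqs, freqs), dim=-1)
            # Cast to the activation dtype up front to avoid mixed-precision
            # mismatches under fp16/bf16 training.
            self.cos_cached = emb.cos().to(x.dtype)
            self.sin_cached = emb.sin().to(x.dtype)
        return self.cos_cached[:seq_len], self.sin_cached[:seq_len]


def rotate_half(x):
    # Swap and negate the two halves of the head dim: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rotary_pos_emb(q, k, cos, sin):
    # q, k: [seq_len, ..., head_dim]; cos/sin broadcast over the middle dims.
    cos = cos.view(cos.shape[0], *([1] * (q.dim() - 2)), cos.shape[-1])
    sin = sin.view(sin.shape[0], *([1] * (q.dim() - 2)), sin.shape[-1])
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin


if __name__ == "__main__":
    seq_len, batch, heads, head_dim = 16, 2, 4, 64
    q = torch.randn(seq_len, batch, heads, head_dim)
    k = torch.randn(seq_len, batch, heads, head_dim)
    rope = RotaryEmbedding(head_dim)
    cos, sin = rope(q, seq_len)
    q_rot, k_rot = apply_rotary_pos_emb(q, k, cos, sin)
    print(q_rot.shape, k_rot.shape)  # torch.Size([16, 2, 4, 64]) each
```

Because the rotation is applied to queries and keys at attention time, no learned position table is needed, which is why an upper bound on sequence length can replace a hard `max_position_embeddings` requirement.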
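The commit also threads the position-embedding choice through the argument parser, with absolute embeddings as the default. Below is a hypothetical sketch of that wiring; the flag and enum names (`--position-embedding-type`, `PositionEmbeddingType`) are assumptions in the spirit of the commit, not the verbatim Megatron-DeepSpeed code.

```python
import argparse
import enum


class PositionEmbeddingType(enum.Enum):
    # Kept outside the model code, as the commit message describes.
    absolute = "absolute"
    rotary = "rotary"


def add_position_embedding_args(parser):
    group = parser.add_argument_group("position embeddings")
    group.add_argument(
        "--position-embedding-type",
        type=PositionEmbeddingType,
        choices=list(PositionEmbeddingType),
        default=PositionEmbeddingType.absolute,
        help="Type of position embedding to use (default: absolute).",
    )
    group.add_argument(
        "--max-position-embeddings",
        type=int,
        default=None,
        help="Upper bound on sequence length; optional with rotary "
        "embeddings, required for absolute ones.",
    )
    return parser


if __name__ == "__main__":
    parser = add_position_embedding_args(argparse.ArgumentParser())
    args = parser.parse_args(["--position-embedding-type", "rotary"])
    print(args.position_embedding_type)  # PositionEmbeddingType.rotary
```

Recording the embedding type alongside checkpoints, as the commit does, lets a resumed run verify that it is loading weights trained with the same position-embedding scheme.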