Implement rotary embeddings (#7)
* Integrate EleutherAI's version of rotary embeddings and make some small optimisations (see the sketch after this list)
* Add argument parser for position embeddings (argparser sketch after this list)
* Make max-absolute-position-embeddings optional
* Move enum outside model
* Handle max_seq_len_cached better
* Fix dtype issue in rotary embeddings
* Fix tensor size
* Replace hidden_dim with hidden_size_per_attention_head
* Change all examples to new format and improve help in argparser
* Revert changes, compare the position-embedding type when checkpointing, and replace args.max_position_embeddings with an upper bound on the sequence sizes
* Revert changes:
  - Rename max-absolute-position-embeddings back to max-position-embeddings
- Make absolute position embeddings the default
* Reformat
* Remove run.sh~ and restore run.sh
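
For context, a minimal sketch of the integrated rotary embedding module, assuming the public EleutherAI/GPT-NeoX formulation (PyTorch; names and shapes are illustrative, not necessarily the exact code merged here). The lazy rebuild of the cos/sin cache and the cast to the activation dtype correspond to the max_seq_len_cached and dtype fixes listed above:

    import torch

    class RotaryEmbedding(torch.nn.Module):
        """Rotary position embedding with a lazily rebuilt cos/sin cache."""

        def __init__(self, dim, base=10000):
            super().__init__()
            inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
            self.register_buffer("inv_freq", inv_freq)
            self.max_seq_len_cached = 0
            self.cos_cached = None
            self.sin_cached = None

        def forward(self, x, seq_len):
            # Rebuild the cache only when a longer sequence or a new dtype shows up.
            if (self.cos_cached is None or seq_len > self.max_seq_len_cached
                    or self.cos_cached.dtype != x.dtype):
                self.max_seq_len_cached = seq_len
                t = torch.arange(seq_len, device=x.device, dtype=self.inv_freq.dtype)
                freqs = torch.einsum("i,j->ij", t, self.inv_freq)
                emb = torch.cat((freqs, freqs), dim=-1)
                # Cast the cached tables to the activation dtype (fp16/bf16 safe).
                self.cos_cached = emb.cos().to(x.dtype)[:, None, None, :]
                self.sin_cached = emb.sin().to(x.dtype)[:, None, None, :]
            return self.cos_cached[:seq_len], self.sin_cached[:seq_len]

    def rotate_half(x):
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat((-x2, x1), dim=-1)

    def apply_rotary_pos_emb(q, k, cos, sin):
        # q, k: [seq_len, batch, heads, hidden_size_per_attention_head]
        return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin)

Caching the tables up to the longest sequence seen avoids recomputing them on every forward pass while still handling variable sequence lengths.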
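
And a rough sketch of the argparser side, assuming a --position-embedding-type flag (the flag, group, and helper names are illustrative). Keeping the enum at module level, outside the model, lets the checkpoint code compare against it too; absolute embeddings stay the default, so --max-position-embeddings is only required in that case:

    import argparse
    import enum

    class PositionEmbeddingType(enum.Enum):
        # Module-level so the model, argparser, and checkpoint code all share it.
        absolute = 1
        rotary = 2

    def add_position_embedding_args(parser):
        group = parser.add_argument_group("position embeddings")
        group.add_argument(
            "--position-embedding-type",
            type=lambda name: PositionEmbeddingType[name],
            choices=list(PositionEmbeddingType),
            default=PositionEmbeddingType.absolute,
            help="Kind of position embedding to use; rotary makes "
                 "--max-position-embeddings optional.",
        )
        return parser

Checkpoint loading can then assert that args.position_embedding_type matches the value stored in the checkpoint before resuming.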
Co-authored-by: Thomas <24695242+thomasw21@users.noreply.github.com>