transformers
78ef5832 - 🔴 🚨 Resizing tokens embeddings: initialize from old embeddings' normal distribution. (#33325)

Commit

1 year ago

🔴 🚨 Resizing tokens embeddings: initialize from old embeddings' normal distribution. (#33325)

* Initialize new embeddings from normal distribution
* Fix typo in comments
* Fix typo in comments
* Fix style
* Fix variables naming
* Add tests
* Fix style
* code consistency nit
* Add deepspeed support
* Add deepspeed support
* Convert embeddings weights to float32 before computations
* Add deepspeed tests
* Cover when vocab_size is smaller than embedding_size
* Style fix
* Add tests for vocab_size smaller than hidden_size
* Style fix
* Nits in tests
* Nits in tests
* Check for deepspeed before importing it
* Increase vocab_size for positive definite covariance matrix test
* Add warning
* Add multivariate_resizing flag and implement resizing for lm_heads
* Fix typo
* Fix wrong bias indexing
* Fix bias is zero check
* remove multivariate_resizing flag from tests
* Initialize bias from old bias normal distribution
* Fixup
* Code usability
* Use mean_resizing instead of multivariate_resizing
* Fix up
* Fix comments and docs
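The idea named in the commit title can be illustrated with a small NumPy sketch. It is not the code from this commit: the function name `resize_embeddings`, its parameters, and the use of NumPy are assumptions made for illustration, and the actual implementation may differ in details such as how the covariance is scaled or checked. The sketch computes the mean and covariance of the old embedding rows in float32, samples the added rows from the corresponding multivariate normal, and falls back to the plain mean when the covariance cannot be positive definite (for example, when the old vocabulary is not larger than the hidden size).

```python
import numpy as np


def resize_embeddings(old: np.ndarray, new_num_tokens: int, seed: int = 0) -> np.ndarray:
    """Hypothetical helper: resize an embedding matrix of shape (vocab, hidden)."""
    old_num_tokens, hidden = old.shape
    if new_num_tokens <= old_num_tokens:
        # Shrinking only truncates; there are no new rows to initialize.
        return old[:new_num_tokens].copy()

    # Compute the statistics in float32, as the commit log describes.
    old32 = old.astype(np.float32)
    mean = old32.mean(axis=0)
    centered = old32 - mean
    cov = centered.T @ centered / old_num_tokens

    num_added = new_num_tokens - old_num_tokens
    added = np.tile(mean, (num_added, 1))  # fallback: every new row is the mean

    # With vocab <= hidden the covariance is rank deficient, so skip sampling.
    if old_num_tokens > hidden:
        try:
            # Cholesky succeeds only for a positive definite matrix.
            chol = np.linalg.cholesky(cov.astype(np.float64))
            rng = np.random.default_rng(seed)
            noise = rng.standard_normal((num_added, hidden))
            # mean + noise @ L^T has covariance L @ L^T = cov.
            added = mean + noise @ chol.T
        except np.linalg.LinAlgError:
            pass  # keep the mean-only fallback

    # Old rows are preserved unchanged; new rows are appended.
    return np.concatenate([old, added.astype(old.dtype)], axis=0)
```

Keeping the old rows untouched and drawing the new ones from the old distribution means that newly added tokens start in the same region of embedding space as the existing vocabulary instead of at an unrelated random initialization. The log also indicates that the behaviour ended up behind a `mean_resizing` flag and that the same treatment was applied to lm_head weights and biases.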

References

#33325 - 🔴 🚨 Resizing tokens embeddings: initialize from old embeddings' normal distribution.

Author

abuelnasr0

Parents

b916efcb
