Exclude the load balancing loss of padding tokens in Mixtral-8x7B #28517
fix the function load_balancing_loss_func in Mixtral_Moe to include a…
64c6860a
format code using black and ruff
faa113ba
skip computing mask if attention_mask=None
c28d0580
add tests for load balancing loss Mixtral-Moe
d8d1d3bf
fix assert loss is different in mixtral_test
bf9f8fe3
fix pad_leng
bc042005
use assertNotAlmostEqual and print to debug
d803c891
remove print for debug
147e128b
Bazzimore approved these changes on 2024-01-23
minor updates
3ea1e763
reduce rtol and atol
3deb0ad1
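For readers skimming the commit list, the idea in the PR title can be sketched roughly as follows. This is a simplified, pure-Python illustration, not the code merged in this PR and not the `load_balancing_loss_func` in transformers. The function name `masked_load_balancing_loss`, its argument layout (one flattened list of per-token expert logits plus a 0/1 mask), and the default `top_k=2` are assumptions made for the example. It computes a Switch-style auxiliary loss and skips padding tokens when averaging.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's expert logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_load_balancing_loss(router_logits, attention_mask=None, top_k=2):
    """Hypothetical sketch of a load balancing loss that ignores padding.

    router_logits: list of per-token lists of expert logits, flattened
        over batch and sequence (an assumed layout for this example).
    attention_mask: list of 0/1 flags, one per token, or None to treat
        every token as real.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    if attention_mask is None:
        # No mask given: every token counts, as in the unmasked loss.
        attention_mask = [1] * num_tokens

    num_real = sum(attention_mask)
    if num_real == 0:
        return 0.0

    routed = [0.0] * num_experts    # how many real tokens picked each expert
    prob_sum = [0.0] * num_experts  # summed router probability per expert
    for logits, keep in zip(router_logits, attention_mask):
        if not keep:
            continue  # padding tokens contribute nothing
        probs = softmax(logits)
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in top:
            routed[e] += 1.0
        for e in range(num_experts):
            prob_sum[e] += probs[e]

    # Average over real tokens only, not over the padded length.
    frac_routed = [r / num_real for r in routed]
    mean_prob = [p / num_real for p in prob_sum]
    return num_experts * sum(f * p for f, p in zip(frac_routed, mean_prob))


# Two real tokens followed by one padding token with extreme logits.
real_tokens = [[2.0, 1.0, 0.0, -1.0], [0.0, 2.0, 1.0, -1.0]]
padded_tokens = real_tokens + [[-5.0, -5.0, -5.0, 9.0]]
mask = [1, 1, 0]

loss_masked = masked_load_balancing_loss(padded_tokens, mask)
loss_unmasked = masked_load_balancing_loss(padded_tokens)
loss_real_only = masked_load_balancing_loss(real_tokens)
```

In this sketch, `loss_masked` matches `loss_real_only`, while `loss_unmasked` differs because the padding token is counted. The added tests appear to check the same kind of difference, judging by the `assertNotAlmostEqual` commit message.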