transformers
Exclude the load balancing loss of padding tokens in Mixtral-8x7B
#28517
Merged

khaimt fix the function load_balancing_loss_func in Mixtral_Moe to include a…
64c6860a
khaimt format code using black and ruff
faa113ba
ArthurZucker commented on 2024-01-16
ArthurZucker commented on 2024-01-17
khaimt skip computing mask if attention_mask=None
c28d0580
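The change in commits 64c6860a and c28d0580 is that padding positions should no longer count toward the auxiliary router loss, and that the mask step is skipped when attention_mask=None. The following is a minimal pure-Python sketch of that idea. It assumes a Switch-Transformer-style loss of the form num_experts × Σ_e (fraction of real tokens whose top-k includes expert e) × (mean router probability of e over real tokens). The helper names softmax and masked_load_balancing_loss are hypothetical. This is not the merged load_balancing_loss_func code and not a transformers API.

```python
import math
from typing import List, Optional, Sequence

# Sketch only: assumes a Switch-Transformer-style auxiliary loss.
# Hypothetical helpers for illustration; not the transformers implementation.


def softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable softmax over one token's router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_load_balancing_loss(
    router_logits: Sequence[Sequence[float]],
    top_k: int,
    attention_mask: Optional[Sequence[int]] = None,
) -> float:
    """Load balancing loss averaged only over non-padding tokens.

    router_logits: one list of per-expert logits per token (flattened batch*seq).
    attention_mask: 1 for real tokens, 0 for padding; None means "no padding".
    """
    num_experts = len(router_logits[0])
    if attention_mask is None:
        # No mask given: every token counts, so no masking work is needed.
        attention_mask = [1] * len(router_logits)

    tokens_per_expert = [0.0] * num_experts  # times expert e is in a token's top-k
    prob_per_expert = [0.0] * num_experts    # summed router probability of expert e
    num_real = 0
    for logits, keep in zip(router_logits, attention_mask):
        if not keep:
            continue  # padding tokens contribute to neither statistic
        num_real += 1
        probs = softmax(logits)
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in top:
            tokens_per_expert[e] += 1.0
        for e, p in enumerate(probs):
            prob_per_expert[e] += p

    if num_real == 0:
        return 0.0
    # Divide by the number of real tokens (not batch*seq), then combine.
    return num_experts * sum(
        (tokens_per_expert[e] / num_real) * (prob_per_expert[e] / num_real)
        for e in range(num_experts)
    )


if __name__ == "__main__":
    real = [[2.0, 0.1, 0.0, -1.0], [0.0, 1.5, 0.2, 0.3]]
    pad = [[10.0, 0.0, 0.0, 0.0]]  # a padding position with extreme router logits
    masked = masked_load_balancing_loss(real + pad, top_k=2, attention_mask=[1, 1, 0])
    unmasked = masked_load_balancing_loss(real + pad, top_k=2)
    # The masked value matches the loss on real tokens alone; the unmasked one does not.
    print(masked, masked_load_balancing_loss(real, top_k=2), unmasked)
```

Using the count of real tokens as the denominator keeps padding from skewing both per-expert statistics at once. The actual library code operates on batched tensors rather than Python lists, so treat this only as an illustration of the masking logic.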
ArthurZucker commented on 2024-01-19
HuggingFaceDocBuilderDev
khaimt add tests for load balancing loss Mixtral-Moe
d8d1d3bf
khaimt fix assert loss is different in mixtral_test
bf9f8fe3
khaimt fix pad_leng
bc042005
khaimt
congruency
khaimt
congruency
khaimt use assertNotAlmostEqual and print to debug
d803c891
khaimt remove print for debug
147e128b
khaimt
Bazzimore approved these changes on 2024-01-23
Bazzimore
Bazzimore
ArthurZucker
ArthurZucker commented on 2024-01-23
khaimt minor updates
3ea1e763
khaimt reduce rtol and atol
3deb0ad1
khaimt
ArthurZucker approved these changes on 2024-01-24
ArthurZucker merged c5c69096 into main 1 year ago