[Kernel] W8A16 Int8 inside FusedMoE #7415
mzusman
changed the title from "[Kernel] W8A16 Int8 MoE" to "[Kernel] W8A16 Int8 inside FusedMoE" 1 year ago
Add experts int8 config
6b834a37
Add support in fusedmoe
afddd3b1
Add experts int8 to quantization list
289367ab
Remove logger
084405e1
Add to optimized quantization
0c690fe5
Format
31004906
Add startup test for experts_int8
413400cc
Typo
9e7bc79f
Add test
1ebb5d7e
Change compute capability to 80
44a72d6b
Format
39660caf
Disable for CPU
a097b6e3
Add use_int8 to the moe benchmarks
c12635cb
mzusman
force pushed
to
c12635cb
1 year ago
Use JambaMoE to implement MLP
9436034c
Use MoE to implement MLP
4b712e44
Format
3b6967e4
Fix
5f5b11e2
mgoin
commented
on 2024-08-15
mgoin
commented
on 2024-08-15
Move experts_int8 to quantization subdir and add is quant method
e199b177
Split if else in benchmark moe
9c47ad0f
Rename use_int8 to use_int8_w8a16, use_fp8 to use_fp8_w8a8
97f0585f
Reverse order
00254591
Change dtype in configs filename
a1d75cb9
Single function to get dtype config name
505e3d34
Align experts int8 apply with fp8
80d977c1
Align with upstream
1c403be5
Format
744ecd4b
mgoin
commented
on 2024-08-15
Change fp8 to fp8_w8a8
a5bf0b34
Correct the args
1c7e6899
Remove experts int8 from ignore cpu
e438b84e
Fix typo
c23a2f46
Fix Jamba tests since MLP layer is not aligned with HF
7e619c7d
Merge remote-tracking branch 'github/main' into expert_int8_upstream
70a65983
dsikka
commented
on 2024-08-16
mgoin
approved these changes
on 2024-08-16
Merge remote-tracking branch 'github/main' into expert_int8_upstream
4d6c546e
simon-mo
merged
7fc23be8
into main 1 year ago