transformers
ff2ba441 - [Performance] FP8 Grouped and Batched Matmuls (#44231)

Commit

134 days ago

[Performance] FP8 Grouped and Batched Matmuls (#44231)

* simplify
* finegrained fp8 moe forwards
* optimized fp8 fused, batched and grouped paths
* fix
* wrap triton
* fix calls
* fix
* remove fused quant kernel (little gain and unnecessary) and use torch library wrappers for better torch compilability
* use kernels
* fix
* no need to wrap cutlass
* cleanup
* fix
* added non gated experts support
* remove comments
* style
* fix
* Update src/transformers/quantizers/quantizer_finegrained_fp8.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update finegrained_fp8.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* per tensor scaling support
* use custom fp8 interface
* document

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

References

#44231 - [Performance] FP8 Grouped and Batched Matmuls

Author

IlyasMoutawwakil

Parents

0e7cb4e3
