transformers
ff2ba441 - [Performance] FP8 Grouped and Batched Matmuls (#44231)

Commit
71 days ago
* simplify
* finegrained fp8 moe forwards
* optimized fp8 fused, batched and grouped paths
* fix
* wrap triton
* fix calls
* fix
* remove fused quant kernel (little gain and unnecessary) and use torch library wrappers for better torch compilability
* use kernels
* fix
* no need to wrap cutlass
* cleanup
* fix
* added non gated experts support
* remove comments
* style
* fix
* Update src/transformers/quantizers/quantizer_finegrained_fp8.py
* Update finegrained_fp8.py
* per tensor scaling support
* use custom fp8 interface
* document

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>