text-generation-inference
c29dc89c - Add support for scalar FP8 weight scales (#2550)

Add support for scalar FP8 weight scales (#2550)

* Add support for scalar FP8 weight scales

* Support LLM compressor FP8 checkpoints on H100

  On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype. However, FP8 quantization was not being picked up for models quantized with LLM compressor. This change adds enough parsing to detect whether a model's weights are FP8-quantized.

* Remove stray debug print
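As a rough illustration of the two behaviors described above, the sketch below shows how a loader might inspect an LLM-compressor-style (compressed-tensors) `quantization_config` from `config.json` for FP8 weights, and how a scalar weight scale can be broadcast to a per-channel scale. This is a minimal sketch, not the actual text-generation-inference code; the config keys and helper names are assumptions for illustration.

```python
from typing import Optional

import torch


def is_fp8_quantized(quantization_config: Optional[dict]) -> bool:
    """Best-effort check whether a checkpoint's quantization_config
    (illustrative compressed-tensors layout) describes FP8 weights."""
    if quantization_config is None:
        return False
    # Assumed layout: config_groups -> group -> weights -> {type, num_bits}.
    for group in quantization_config.get("config_groups", {}).values():
        weights = group.get("weights", {})
        if weights.get("type") == "float" and weights.get("num_bits") == 8:
            return True
    return False


def normalize_weight_scale(scale: torch.Tensor, out_features: int) -> torch.Tensor:
    """Accept either a scalar or a per-channel FP8 weight scale and
    return a per-channel scale of shape (out_features,)."""
    if scale.numel() == 1:
        # A scalar scale means the whole weight tensor shares one
        # quantization scale; broadcast it so downstream kernels that
        # expect one scale per output row can consume it uniformly.
        return scale.reshape(1).expand(out_features).contiguous()
    return scale.reshape(out_features)
```

Broadcasting the scalar case keeps the downstream dequantization path identical for checkpoints that store one scale per tensor and those that store one scale per output channel.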