text-generation-inference
0f346a32 - Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels (#2688)

Commit

1 year ago

Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels (#2688)

* Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels

  Performance and accuracy of these kernels are on par (tested with Llama 70B and 405B). Removes a dependency and resolves some stability issues we have been seeing.

* Update test snapshots

Author

danieldk

Parents

ba5fc7d9
