text-generation-inference
58848cb4 - feat: enable pytorch xpu support for non-attention models (#2561)

Commit

1 year ago

feat: enable pytorch xpu support for non-attention models (#2561)

The XPU backend is available natively (without IPEX) in PyTorch starting from version 2.4. This commit extends TGI to cover the case where the user has XPU support through PyTorch 2.4 but does not have IPEX installed. Models that do not require attention can work. Models that require attention need more work to provide an attention implementation.

Tested with the following models:

* teknium/OpenHermes-2.5-Mistral-7B
* bigscience/bloom-560m
* google/gemma-7b
* google/flan-t5-xxl

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
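To illustrate the case this commit describes (XPU available through PyTorch alone, with IPEX absent), here is a minimal sketch of how such a setup could be detected. It is not the code from this commit: the helper names `has_ipex` and `detect_system` and the returned labels are hypothetical, chosen only for illustration. The sketch assumes `torch.xpu.is_available()` is present in PyTorch 2.4 and later, and guards against older builds where the `torch.xpu` module may be missing.

```python
import importlib.util


def has_ipex() -> bool:
    """Return True if the intel_extension_for_pytorch package can be found."""
    return importlib.util.find_spec("intel_extension_for_pytorch") is not None


def detect_system() -> str:
    """Hypothetical helper: choose a backend label from what is installed."""
    try:
        import torch
    except ImportError:
        # No PyTorch at all: nothing to probe, fall back to CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.xpu ships natively with PyTorch 2.4+; older builds may lack it.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        # With IPEX installed, IPEX kernels can be used; without it, only
        # native PyTorch XPU operators are available (labels are illustrative).
        return "ipex" if has_ipex() else "xpu"

    return "cpu"


if __name__ == "__main__":
    print(f"Detected backend: {detect_system()}")
```

Separating the "xpu" label from the "ipex" label lets calling code choose between native PyTorch operators and IPEX-provided kernels, which matches the distinction the commit message draws between models that need a dedicated attention implementation and models that do not.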

References

#2561 - feat: enable pytorch xpu support for non-attention models

Author

dvrogozh

Parents

7a82ddcb
