Support large model export using multi-gpu (#17990)
### Description
This PR implements an exporter that works for large language models (LLMs).
It works for models like Llama2-70b or gpt-175.
The main idea is to utilize multiple GPUs and dispatch different layers
to different GPUs; in short, it simply implements automatic pipeline
parallelism.
For example, exporting Llama2-70b requires 8x V100-32GB, 4x A100-80GB,
or more total GPU memory.
It is expected to work for decoder-only models. Encoder-decoder
architectures have not been tested yet.
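
For reference, here is a minimal sketch of the layer-dispatch idea, using
Hugging Face `transformers`/`accelerate` to split the decoder layers across
GPUs before tracing. It is illustrative only and is not the exporter's
actual API; the checkpoint name, output path, and opset are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # illustrative checkpoint, not required by the PR

# device_map="auto" splits the decoder layers across all visible GPUs,
# which is the same pipeline-parallel layout this exporter relies on.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dummy input on the first GPU; accelerate's hooks move activations
# between GPUs as the trace crosses layer boundaries.
input_ids = tokenizer("hello", return_tensors="pt").input_ids.to("cuda:0")

# Plain torch.onnx.export shown here for simplicity; the PR's exporter
# additionally handles the multi-GPU tracing details for very large models.
torch.onnx.export(
    model,
    (input_ids,),
    "llama2-70b.onnx",       # assumed output path
    input_names=["input_ids"],
    output_names=["logits"],
    opset_version=17,        # assumed opset
)
```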
### Motivation and Context
---------
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>