Skip module clone for preparing large model export (#18663)
For LLAMA2 13B running with LoRA and DeepSpeed stage 2 on 8 GPUs, the run
failed while preparing the outputs that are fed to torch.onnx.export. The
reason: we deep-copy all of the parameters, including the large frozen
weights plus the small set of LoRA trainable weights.
This PR first checks whether GPU memory is sufficient to hold a cloned
module; if not, it skips the copy.
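
A minimal sketch of that check, assuming PyTorch on CUDA (the helper name
`clone_module_if_memory_allows` is illustrative, not the PR's actual code):
estimate the bytes a clone would need from the module's parameters and
buffers, compare against the free memory reported by
`torch.cuda.mem_get_info`, and fall back to the original module when a clone
would not fit.

```python
import copy
import torch

def clone_module_if_memory_allows(module: torch.nn.Module) -> torch.nn.Module:
    """Deep-copy `module` only if the current CUDA device has enough free
    memory for a second copy of its parameters and buffers; otherwise
    return the original module unchanged. Illustrative sketch only.
    """
    # Bytes a full clone of parameters and buffers would occupy.
    needed = sum(p.numel() * p.element_size() for p in module.parameters())
    needed += sum(b.numel() * b.element_size() for b in module.buffers())

    if torch.cuda.is_available():
        free_bytes, _total = torch.cuda.mem_get_info()
        if needed > free_bytes:
            # Not enough room for a second copy: run on the original module
            # and accept that the forward pass might (rarely) mutate weights.
            return module
    return copy.deepcopy(module)
```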
Copying the module guards against the forward pass mutating the weights,
though that case should be rare. For now, not-able-to-run is worse than
runnable-with-slightly-different-initial-weights, especially for large
models.