Set proper nvcc threads to avoid OOM (#17419)
### Description
There are 8 .cu files under [flash
attention](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/flash_attention)
and 4 .cu files under [cutlass
fmha](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/cutlass_fmha)
that need a lot of memory to compile.
Previously, the default number of nvcc threads was the same as the parallel job count: the number of CPU cores.
Standard_NC4as_T4_v3 has 4 CPUs and 28 GB of memory, so we launched 16
nvcc threads in total (4 parallel jobs with 4 nvcc threads each).
Each thread might take about 4 GB on average (the peak is around 6 GB, but the threads
do not all start at the same time). OOM happens because 16 threads might need
close to 64 GB in the worst case. When the build machine has 64 GB or more
memory, OOM is rare.
Here we set a proper nvcc `--threads` value based on available memory to avoid
OOM.
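The heuristic above can be sketched as follows. This is a minimal illustration, not the actual change in this PR; the function name `estimate_nvcc_threads` and the 4 GB-per-thread figure (taken from the numbers quoted above) are assumptions, and the `os.sysconf` memory query is POSIX/Linux-specific:

```python
import multiprocessing
import os


def estimate_nvcc_threads(mem_per_thread_gb: int = 4) -> int:
    """Estimate a safe nvcc --threads value per parallel build job.

    Hypothetical sketch: assumes each nvcc thread needs roughly
    ``mem_per_thread_gb`` GB on average, as described in this PR.
    """
    # One parallel compile job per CPU core, matching the old default.
    jobs = multiprocessing.cpu_count()
    # Total physical memory in GB (POSIX sysconf; Linux-only in practice).
    total_gb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / (1024 ** 3)
    # Total nvcc threads the machine can sustain without OOM.
    max_total_threads = max(1, int(total_gb // mem_per_thread_gb))
    # Spread across parallel jobs; always allow at least 1 thread per job.
    return max(1, max_total_threads // jobs)
```

On a 4-core, 28 GB machine like Standard_NC4as_T4_v3, this yields 28 // 4 = 7 total threads, or 1 nvcc thread per job instead of 4, capping worst-case usage well under the available memory.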
### Motivation and Context
Fix `Python Packaging Pipeline (Training Cuda 11.8)`