onnxruntime
d28c26a9 - [ROCm] fix: obtain AMD GPU memory info through rocm_smi library (#21190)

Commit
1 year ago
### Description

Previously, ROCMExecutionProvider used `hipMemGetInfo` to obtain the total and available memory sizes. However, this API has been broken since ROCm 5.7. This PR uses the `rocm_smi` library instead of `hipMemGetInfo`.

### Motivation and Context

The `hipMemGetInfo` API has been broken since ROCm 5.7, and inference with ROCMExecutionProvider fails with the following error:

```
HIP failure 1: invalid argument ; GPU=0 ; hostname=4cc4900475fe ; file=/onnxruntime/onnxruntime/core/providers/rocm/rocm_execution_provider.cc ; line=229 ; expr=hipMemGetInfo(&free, &total);
```

MIOpen has a brute-force fix for this (https://github.com/ROCm/MIOpen/blob/911e67189592c311374940493f2099f3abced60d/src/hip/handlehip.cpp#L72). Instead of hard-coding available memory to 16GB, I suppose we could obtain memory info through the `rocm_smi` library, as in this PR.
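To illustrate the idea, here is a minimal sketch of how a `(free, total)` pair like the one `hipMemGetInfo` returned could be derived from two separate "total VRAM" and "used VRAM" queries. It is not the code from this PR. `VramQueries` and `QueryMemInfo` are hypothetical names introduced for illustration, and the comments only suggest that, in a ROCm build, the two callbacks might wrap `rsmi_dev_memory_total_get` and `rsmi_dev_memory_usage_get` with `RSMI_MEM_TYPE_VRAM` (after `rsmi_init`); that mapping is an assumption, not something the commit message states.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical abstraction over the two VRAM queries, so the sketch compiles
// without ROCm installed. ASSUMPTION: in a real ROCm build, `total` and `used`
// would wrap rsmi_dev_memory_total_get / rsmi_dev_memory_usage_get with
// RSMI_MEM_TYPE_VRAM and return true on RSMI_STATUS_SUCCESS.
struct VramQueries {
  std::function<bool(uint32_t device, uint64_t* bytes)> total;
  std::function<bool(uint32_t device, uint64_t* bytes)> used;
};

// Fills *free_bytes and *total_bytes for `device`, mirroring the shape of
// hipMemGetInfo(&free, &total). Returns false if either query fails or the
// reported usage exceeds the reported total.
bool QueryMemInfo(const VramQueries& q, uint32_t device,
                  uint64_t* free_bytes, uint64_t* total_bytes) {
  uint64_t total = 0;
  uint64_t used = 0;
  if (!q.total(device, &total) || !q.used(device, &used)) {
    return false;  // propagate the query failure to the caller
  }
  if (used > total) {
    return false;  // inconsistent readings; avoid unsigned underflow
  }
  *free_bytes = total - used;  // available memory = total minus in-use
  *total_bytes = total;
  return true;
}
```

Compared with hard-coding the available memory to a fixed value, this reports figures that track the actual device, at the cost of a dependency on the library that backs the two queries.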