pytorch
42b3fc29 - Fix NVRTC versioning for CUDA 11.X (X>=3), CUDA 12 and later (#57204)

Commit
3 years ago
Fix NVRTC versioning for CUDA 11.X (X>=3), CUDA 12 and later (#57204)

Summary:
NVRTC versioning changed starting with CUDA 11.3 and will change again for CUDA 12.X. See the comment in the code for details. As a result, JIT on CUDA 11.3 is broken.

Also, the error message is misleading: when both `libname` and `alt_libname` are non-empty, the message reports only `alt_libname`. It should report both.

To reproduce the error, run the following with CUDA 11.3:

```python
import torch

torch._C._jit_set_profiling_mode(False)
torch._C._jit_set_profiling_executor(False)
torch._C._jit_override_can_fuse_on_cpu(True)
torch._C._jit_override_can_fuse_on_gpu(True)

@torch.jit.script
def jit_relu_dropout(x, prob):
    # type: (Tensor, float) -> Tensor
    x = torch.nn.functional.relu(x)
    x = torch.nn.functional.dropout(x, p=prob, training=True)
    return x

x = torch.randn((64, 40, 12, 1024), device="cuda:0", dtype=torch.float16, requires_grad=True)
y = jit_relu_dropout(x, 0.5)
```

You will see:

```
Traceback (most recent call last):
  File "/home/gaoxiang/misc/nvrtc-failure.py", line 16, in <module>
    y = jit_relu_dropout(x, 0.5)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Error in dlopen or dlsym: libnvrtc-8aa72235.so.11.3: cannot open shared object file: No such file or directory
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57204

Reviewed By: ngimel

Differential Revision: D28122083

Pulled By: malfet

fbshipit-source-id: fd387cf79f33a6d5a5b93d54c9f21e9c23731045
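Illustrative note: the versioning rule this commit refers to can be sketched in Python as follows. This is not the code changed by the commit. It assumes the Linux soname scheme as described in NVIDIA's NVRTC documentation ("MAJOR.MINOR" before CUDA 11.3, fixed at "11.2" for 11.3 and later 11.x, and "MAJOR" for CUDA 12 and later). The function names `nvrtc_soname_version` and `dlopen_error_message` are hypothetical and exist only for this example.

```python
def nvrtc_soname_version(major: int, minor: int) -> str:
    """Return the version suffix assumed for the Linux NVRTC soname.

    Assumption (from NVIDIA's NVRTC docs, not from this commit's diff):
      - before CUDA 11.3:        "MAJOR.MINOR"
      - CUDA 11.3 and later 11.x: always "11.2"
      - CUDA 12 and later:        "MAJOR"
    """
    if major < 11 or (major == 11 and minor < 3):
        return f"{major}.{minor}"
    if major == 11:
        return "11.2"
    return str(major)


def dlopen_error_message(libname: str, alt_libname: str, detail: str) -> str:
    """Build an error message that names every library that was tried.

    Hypothetical helper: mirrors the intent stated in the summary, that
    both `libname` and `alt_libname` are reported when both are non-empty.
    """
    tried = [name for name in (libname, alt_libname) if name]
    return f"Error in dlopen for {' or '.join(tried)}: {detail}"


if __name__ == "__main__":
    for cuda in [(10, 2), (11, 2), (11, 3), (11, 8), (12, 1)]:
        print(cuda, "-> libnvrtc.so." + nvrtc_soname_version(*cuda))
```

Under these assumptions, a CUDA 11.3 build would look for `libnvrtc.so.11.2` instead of the `.so.11.3` name shown in the traceback above.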