pytorch
2b2e0fdd - Add CUDA Sanitizer (#83984)

Commit
2 years ago
Add CUDA Sanitizer (#83984)

Example of a simple synchronization error:
```
a = torch.rand(4, 2, device="cuda")

with torch.cuda.stream(second_stream):
    torch.mul(a, 5, out=a)
```
Output produced by CSAN:
```
============================
CSAN detected a possible data race on tensor with data pointer 139719969079296
Access by stream 94646435460352 during kernel:
aten::mul.out(Tensor self, Tensor other, *, Tensor(a!) out) -> Tensor(a!)
writing to argument: self, out, output
With stack trace:
  File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch
    stack_trace = traceback.StackSummary.extract(
  File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 544, in __torch_dispatch__
    errors = self.event_handler._handle_kernel_launch(
  File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped
    return f(self, *args, **kwargs)
  File "/private/home/sypniewski/pytorch/tester.py", line 9, in <module>
    torch.mul(a, 5, out=a)

Previous access by stream 0 during kernel:
aten::rand(int[] size, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> Tensor
writing to argument: output
With stack trace:
  File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 364, in _handle_kernel_launch
    stack_trace = traceback.StackSummary.extract(
  File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 544, in __torch_dispatch__
    errors = self.event_handler._handle_kernel_launch(
  File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped
    return f(self, *args, **kwargs)
  File "/private/home/sypniewski/pytorch/tester.py", line 6, in <module>
    a = torch.rand(10000, device="cuda")

Tensor was allocated with stack trace:
  File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 420, in _handle_memory_allocation
    traceback.StackSummary.extract(
  File "/private/home/sypniewski/pytorch/torch/utils/_cuda_trace.py", line 23, in fire_callbacks
    cb(*args, **kwargs)
  File "/private/home/sypniewski/pytorch/torch/_ops.py", line 60, in __call__
    return self._op(*args, **kwargs or {})
  File "/private/home/sypniewski/pytorch/torch/cuda/_sanitizer.py", line 541, in __torch_dispatch__
    outputs = func(*args, **kwargs)
  File "/private/home/sypniewski/pytorch/torch/utils/_python_dispatch.py", line 76, in wrapped
    return f(self, *args, **kwargs)
  File "/private/home/sypniewski/pytorch/tester.py", line 6, in <module>
    a = torch.rand(10000, device="cuda")
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/83984
Approved by: https://github.com/ezyang
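
For contrast with the racy example in the commit message, here is a minimal sketch of one way the same multiplication could be ordered correctly across streams. It is not part of the commit. The function name `scale_on_side_stream` and the `expected` tensor are hypothetical, added only for illustration, and the sketch assumes the race comes solely from the missing ordering between the default stream and `second_stream`.

```python
import torch


def scale_on_side_stream(factor: float = 5.0):
    """Multiply a CUDA tensor in place on a second stream, with explicit ordering.

    Hypothetical helper for illustration; returns None when no CUDA device exists.
    """
    if not torch.cuda.is_available():
        return None  # nothing to demonstrate without a CUDA device

    # Both kernels below are queued on the current (default) stream.
    a = torch.rand(4, 2, device="cuda")
    expected = a.clone() * factor  # reference result, computed before the in-place write

    second_stream = torch.cuda.Stream()

    # Make second_stream wait for everything already queued on the current
    # stream, so its write to `a` is ordered after the rand and clone kernels.
    second_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(second_stream):
        torch.mul(a, factor, out=a)

    # Make later work on the current stream wait for the side-stream write.
    torch.cuda.current_stream().wait_stream(second_stream)
    return a, expected


if __name__ == "__main__":
    result = scale_on_side_stream()
    if result is None:
        print("CUDA not available; skipping")
    else:
        a, expected = result
        print(torch.allclose(a, expected))
```

The two `wait_stream` calls are the whole difference from the snippet above: the first orders the side-stream write after the producer kernel, the second orders any later default-stream reads after that write.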