pytorch
2d26364f - [caffe2][cuda] Fix instrumentation of malloc/free SDTs for `CUDACachingAllocator` (#108907)



1 year ago

Summary:
There's currently a bug in `CUDACachingAllocator` that makes it impossible to determine whether a `malloc`ed sample has been deallocated (introduced in D48229150). It happens because we currently instrument the `malloc` SDT **before** a block of memory has been allocated by either `cudaMalloc` or the local caching allocator's `malloc` call. Since this is a static tracepoint, it receives argument values at the point of instrumentation. Currently, it receives the memory pointer, `void* p`, while that pointer is still NULL.

Changes in this diff:
1) Move this SDT to right before the `allocate` function returns, so that memory has already been allocated and the `p` pointer points to a valid, non-NULL address.
2) Enable tracing of `cudaMalloc` calls, in addition to `NativeCachingAllocator::malloc`.
3) Rename a poorly named local variable: `r` --> `devPtr` (pointer to the allocated memory block).

Test Plan:
Tested with a local PyTorch script that leaks memory. Verified the following:
* prior to this fix (prod), malloc samples are **not** marked as "freed"
* with the fix (branch), samples **are** marked as "freed"
* results are comparable to the current uprobe implementation used to sample PyTorch malloc events in `gpusnoop`

Reviewed By: chaekit
Differential Revision: D48873734
Pull Request resolved: https://github.com/pytorch/pytorch/pull/108907
Approved by: https://github.com/chaekit
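The summary above describes two problems. A static tracepoint captures only the argument values that exist at the moment it fires, and firing the `malloc` probe before allocation therefore records a NULL pointer, so no later `free` can be matched to that sample. The following is a simplified Python simulation of that effect, not PyTorch code. `Tracer`, `FakeAllocator`, `malloc_probe`, `free_probe` and `probe_after_alloc` are hypothetical names. The rule "a free with the same pointer marks a malloc sample as freed" is an assumption about how a consumer such as `gpusnoop` pairs events.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    kind: str             # "malloc" or "free"
    ptr: Optional[int]    # address value captured when the probe fired (None models NULL)


class Tracer:
    """Hypothetical tracepoint consumer: it only sees values present when each probe fires."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def malloc_probe(self, ptr: Optional[int]) -> None:
        self.events.append(Event("malloc", ptr))

    def free_probe(self, ptr: Optional[int]) -> None:
        self.events.append(Event("free", ptr))

    def unfreed_samples(self) -> int:
        # Assumed pairing rule: a free whose pointer matches a live malloc sample marks it freed.
        live: Counter = Counter()
        for e in self.events:
            if e.kind == "malloc":
                live[e.ptr] += 1
            elif live[e.ptr] > 0:
                live[e.ptr] -= 1
        return sum(live.values())


class FakeAllocator:
    """Hypothetical allocator handing out fake addresses (stand-in for cudaMalloc or the caching allocator)."""

    def __init__(self, tracer: Tracer, probe_after_alloc: bool) -> None:
        self.tracer = tracer
        self.probe_after_alloc = probe_after_alloc
        self._next_addr = 0x1000

    def _raw_malloc(self, size: int) -> int:
        addr = self._next_addr
        self._next_addr += size
        return addr

    def allocate(self, size: int) -> int:
        dev_ptr: Optional[int] = None
        if not self.probe_after_alloc:
            self.tracer.malloc_probe(dev_ptr)  # old placement: fires before allocation, records NULL
        dev_ptr = self._raw_malloc(size)
        if self.probe_after_alloc:
            self.tracer.malloc_probe(dev_ptr)  # fixed placement: fires right before returning
        return dev_ptr

    def free(self, ptr: int) -> None:
        self.tracer.free_probe(ptr)


def run(probe_after_alloc: bool, n: int = 3) -> int:
    """Allocate and free n blocks, then report how many malloc samples were never matched by a free."""
    tracer = Tracer()
    alloc = FakeAllocator(tracer, probe_after_alloc)
    for _ in range(n):
        alloc.free(alloc.allocate(256))
    return tracer.unfreed_samples()


if __name__ == "__main__":
    print(run(probe_after_alloc=False))  # 3: every sample looks leaked
    print(run(probe_after_alloc=True))   # 0: every sample is marked freed
```

With the probe placed before allocation, all three samples carry `None`, so every block looks leaked even though each one was freed. Moving the probe to just before `return` lets each `free` find its matching sample. This mirrors the prod-versus-branch comparison in the Test Plan.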

Author

vlad-scherbich

Committer

pytorchmergebot

Parents

faa5985d
