pytorch
0cf91894 - [inductor] Support using the 'stream' param in AOT mode (#105589)

1 year ago

[inductor] Support using the 'stream' param in AOT mode (#105589)

Summary:
When in AOT mode, make use of the existing stream param:
- Pass through and use the stream param in the launchKernel helper function.
- In non-AOT mode, assign the stream param in the caller and pass to launchKernel
- Use a CUDAStreamGuard so all fallback ops execute on the stream
- CUDAStreamGuard subsumes CUDAGuard in AOT mode since it sets both stream and device

Test Plan:
- Ran cpp_wrapper tests: pytest test/inductor/test_cpp_wrapper.py
- Manually inspected cpp output from the alexnet benchmark:

a) In AOT mode:
```
static inline void launchKernel(
        CUfunction func,
        int gridX,
        int gridY,
        int gridZ,
        int numWraps,
        int sharedMemBytes,
        cudaStream_t stream) {
    AT_CUDA_DRIVER_CHECK_OVERRIDE(cuLaunchKernel(
        func, gridX, gridY, gridZ, 32*numWraps, 1, 1, sharedMemBytes, stream, args, nullptr));
...
at::cuda::CUDAStreamGuard stream_guard(at::cuda::getStreamFromExternal(stream, 0));
...
launchKernel(triton_poi_fused_convolution_0, 1, 784, 1, 4, 4352, kernel_args_var_0, stream);
...
```

b) Regular cpp wrapper:
```
...
at::cuda::CUDAGuard device_guard(0);
cudaStream_t stream0 = at::cuda::getCurrentCUDAStream(0);
...
launchKernel(triton_poi_fused_convolution_0, 1, 784, 1, 4, 4352, kernel_args_var_0, stream0);
...
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/105589
Approved by: https://github.com/desertfire
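
Note on the guard change: the summary says a stream guard can replace a device guard because it sets both the stream and the device. The sketch below is a toy model of that RAII pattern, not the ATen implementation. Everything in the `toy` namespace (`Stream`, `DeviceGuard`, `StreamGuard`, `run_fallback_op`, and the thread-local state) is hypothetical and needs no CUDA to compile. It only illustrates why an op that reads the "current" device and stream picks up the caller's stream while the guard is alive, and why the previous state comes back when the guard goes out of scope.

```cpp
#include <cassert>

// Toy stand-ins for per-thread "current device" / "current stream" state.
// All names here are hypothetical; this is NOT the ATen/c10 API.
namespace toy {

constexpr int kMaxDevices = 4;

struct Stream {
    int device;  // device the stream belongs to
    int id;      // opaque handle; 0 plays the role of the default stream
};

inline int& current_device() {
    static thread_local int device = 0;
    return device;
}

inline int& current_stream_id(int device) {
    static thread_local int ids[kMaxDevices] = {0, 0, 0, 0};
    return ids[device];
}

// Device-only guard: switches the current device, restores it on scope exit.
class DeviceGuard {
public:
    explicit DeviceGuard(int device) : prev_device_(current_device()) {
        current_device() = device;
    }
    ~DeviceGuard() { current_device() = prev_device_; }
    DeviceGuard(const DeviceGuard&) = delete;
    DeviceGuard& operator=(const DeviceGuard&) = delete;

private:
    int prev_device_;
};

// Stream guard: switches to the stream's device AND makes that stream current
// on it, so a separate device guard is redundant while this one is alive.
class StreamGuard {
public:
    explicit StreamGuard(Stream s)
        : device_guard_(s.device),
          stream_device_(s.device),
          prev_stream_id_(current_stream_id(s.device)) {
        current_stream_id(s.device) = s.id;
    }
    // Restores the previous stream first; the DeviceGuard member then
    // restores the previous device when it is destroyed.
    ~StreamGuard() { current_stream_id(stream_device_) = prev_stream_id_; }
    StreamGuard(const StreamGuard&) = delete;
    StreamGuard& operator=(const StreamGuard&) = delete;

private:
    DeviceGuard device_guard_;
    int stream_device_;
    int prev_stream_id_;
};

// Stand-in for a fallback op: reports the (device, stream) it would run on,
// based purely on the ambient "current" state.
inline Stream run_fallback_op() {
    int d = current_device();
    return Stream{d, current_stream_id(d)};
}

}  // namespace toy
```

In this model, constructing `toy::StreamGuard g(toy::Stream{1, 7});` makes `toy::run_fallback_op()` report device 1 and stream 7 until `g` is destroyed, after which it reports the original device and stream again.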

Author

masnesral

Committer

pytorchmergebot

Parents

03512477
