[CUBLAS][CUDA GRAPHS] Explicitly set the workspace for cuBLAS handles (#83461)
We're seeing an issue where repeatedly capturing graphs incurs increasing memory usage as cuBLAS internally allocates a new workspace for each graph even when the same handle is being used:
https://gist.github.com/tomconerlyanth/a20c04a4a46a0f6e9ce18f5280729b36
This PR works around the issue by intercepting the `CUBLAS_WORKSPACE_CONFIG` environment variable and allocating the workspace for the cuBLAS handle explicitly.
CC @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83461
Approved by: https://github.com/ngimel