Unify `cuBLASLt` workspaces with `cuBLAS` workspaces (#145130)
Summary:
As `cuBLAS` workspaces are already allocated per-stream, there shouldn't be any kernel execution overlap with `cuBLASLt` kernels.
This PR reuses `cuBLAS` workspaces for `cuBLASLt` for the following benefits:
+ caching (`cuBLAS` workspaces were already cached, so now we get that for `cuBLASLt`)
+ "free" workspace size bump for `cuBLASLt` `cuBLASLt` workspace sizes were previously smaller than those for `cuBLAS` by default which potentially hurts performance, and we encountered difficulty in increasing the size due to downstream OOMs , see also #120925
+ fixes broken behavior with the memtracker: https://github.com/pytorch/pytorch/pull/139442 attempted to handle peaky allocation behavior that broke memtracker equivalence tests, but it didn't seem to fully work; here the cached/reused `cuBLAS` workspace appears to fix it
+ one environment variable to rule them all: `CUBLAS_WORKSPACE_CONFIG` now applies directly to `cuBLASLt`, with no separate `CUBLASLT_WORKSPACE_SIZE` for users to also consider (see the usage sketch below)
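A minimal, hedged sketch of the resulting configuration story; the specific size string and the claim about which ops hit the Lt path are illustrative assumptions, not part of this PR:

```python
import os

# Assumption for illustration: after this change, the single
# CUBLAS_WORKSPACE_CONFIG variable sizes the shared per-stream workspace
# that both cuBLAS and cuBLASLt matmuls draw from. ":4096:8" requests
# 8 chunks of 4096 KiB (a commonly used setting); set it before the first
# CUDA matmul so the workspace allocation picks it up.
os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")

import torch

a = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.bfloat16)

# torch.mm dispatches to cuBLAS; addmm can dispatch to cuBLASLt depending on
# the inputs. With this PR both paths reuse the same cached workspace instead
# of cuBLASLt allocating its own, smaller one.
c = a @ b
d = torch.addmm(torch.zeros(4096, device="cuda", dtype=torch.bfloat16), a, b)
torch.cuda.synchronize()
```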
X-link: https://github.com/pytorch/pytorch/pull/145130
Approved by: https://github.com/ngimel
Reviewed By: izaitsevfb
Differential Revision: D71711852
fbshipit-source-id: 4f57539b8f37f1f4c92a57c19276e84f81bffa23
Author: generatedunixname499836121