[ATen][CUDA][CUBLAS] Increase cublasLtMatmul workspace_size (#120925)
According to the [cuBLAS API Reference](https://docs.nvidia.com/cuda/cublas/index.html#cublassetworkspace), the recommended workspace size is 32 MiB for Hopper and 4 MiB for all other architectures. This PR increases the cublasLtMatmul workspace size accordingly. I am not aware of a recommended workspace size for HIP, so it is left unchanged there.
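
For illustration only, the selection rule described above can be sketched as a small standalone function. This is not the code from this PR: the helper name `recommendedLtWorkspaceSize` and the constant `kMiB` are hypothetical, and the sketch assumes Hopper is identified by compute capability major version 9.

```cpp
#include <cstddef>

// Hypothetical constant for this sketch: one mebibyte in bytes.
constexpr std::size_t kMiB = 1024 * 1024;

// Hypothetical helper, not the actual ATen implementation.
// Returns the cuBLAS-recommended workspace size in bytes for a device
// with the given compute capability major version.
// Assumption: Hopper corresponds to major version 9.
constexpr std::size_t recommendedLtWorkspaceSize(int ccMajor) {
  return ccMajor == 9 ? 32 * kMiB   // Hopper: 32 MiB
                      : 4 * kMiB;   // all other architectures: 4 MiB
}
```

In a CUDA build, the major version could be read from the `major` field of `cudaDeviceProp`. A HIP build would skip this helper and keep its existing size.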
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120925
Approved by: https://github.com/eqy, https://github.com/malfet