Add an extra cuBLAS version check.
In CUDA versions 12.8 through 13.1 inclusive, cuBLAS has a known issue whereby
many kernels free Tensor Memory multiple times, making them unsafe to use when
several streams of compute kernels run concurrently.
[tcgen05 instructions are available on the sm_100f and sm_110f architecture
families](https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-instructions-tcgen05-alloc-dealloc-relinquish-alloc-permit).
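
The affected version range described above could be checked along the lines of
the sketch below. This is illustrative only and is not the code in this change:
the names `EncodeCudaVersion` and `IsAffectedCudaVersion` are hypothetical, and
it assumes the version is available in the `CUDA_VERSION` encoding
(1000 * major + 10 * minor), such as the value reported by
`cudaRuntimeGetVersion`. How the actual check obtains and compares the cuBLAS
version is not shown here.

```cpp
// Hypothetical helpers for illustration; not part of the actual change.

// Encodes a CUDA release the same way the CUDA_VERSION macro does:
// 1000 * major + 10 * minor (for example, 12.8 becomes 12080).
constexpr int EncodeCudaVersion(int major, int minor) {
  return 1000 * major + 10 * minor;
}

// Returns true if the encoded CUDA version lies in the range this commit
// message describes as affected: 12.8 through 13.1, inclusive.
constexpr bool IsAffectedCudaVersion(int cuda_version) {
  return cuda_version >= EncodeCudaVersion(12, 8) &&
         cuda_version <= EncodeCudaVersion(13, 1);
}
```

A caller could use the result to avoid the affected cuBLAS kernels whenever
several compute streams may run concurrently.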
PiperOrigin-RevId: 869167043