Enables previously "slow" `gradgrad` checks on CUDA (#57802)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/57508
Earlier, a few CUDA `gradgrad` checks (see the list of ops below) were disabled because they were too slow. There have been improvements since then (see https://github.com/pytorch/pytorch/issues/57508 for reference), so this PR aims to:
1. Measure the time taken by `gradgrad` checks on CUDA for the ops listed below.
2. Re-enable the tests if the times look reasonable.
Ops considered: `addbmm, baddbmm, bmm, cholesky, symeig, inverse, linalg.cholesky, linalg.cholesky_ex, linalg.eigh, linalg.qr, lu, qr, solve, triangular_solve, linalg.pinv, svd, linalg.svd, pinverse, linalg.householder_product, linalg.solve`.
Timing numbers from a separate CI run: https://github.com/pytorch/pytorch/pull/57802#issuecomment-836169691.
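For anyone who wants to reproduce a rough timing locally, the following is a minimal sketch and not the harness used in CI. It times a single `torch.autograd.gradgradcheck` call on `torch.bmm`; the helper names `timed` and `bmm_gradgrad_check`, the input shapes, and the CPU fallback are assumptions made for illustration.

```python
import time
from typing import Callable, Tuple


def timed(check: Callable[[], bool]) -> Tuple[bool, float]:
    """Run `check` once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    ok = check()
    return ok, time.perf_counter() - start


def bmm_gradgrad_check() -> bool:
    """Second-order gradient check for torch.bmm on small inputs (hypothetical helper)."""
    # Imported here so that `timed` stays usable without PyTorch installed.
    import torch

    # Assumption: fall back to CPU when no CUDA device is present.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # gradgradcheck compares numerical and analytical gradients, so the
    # inputs should be double precision and require grad.
    a = torch.randn(2, 3, 4, dtype=torch.double, device=device, requires_grad=True)
    b = torch.randn(2, 4, 5, dtype=torch.double, device=device, requires_grad=True)

    ok = torch.autograd.gradgradcheck(torch.bmm, (a, b))
    if device == "cuda":
        # Wait for any queued GPU work so it is included in the measured time.
        torch.cuda.synchronize()
    return ok
```

Calling `timed(bmm_gradgrad_check)` returns a `(passed, seconds)` pair. A single run like this only gives a ballpark figure; the CI numbers linked above remain the reference.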
cc: mruberry albanD pmeier
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57802
Reviewed By: ngimel
Differential Revision: D28784106
Pulled By: mruberry
fbshipit-source-id: 9b15238319f143c59f83d500e831d66d98542ff8