Change largeCUDATensorTest to largeTensorTest+onlyCUDA; add a buffer to the large CUDA tensor test (#45332)
Summary:
Effectively, `largeCUDATensorTest` = `largeTensorTest` + `onlyCUDA`.
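As a minimal sketch of what that replacement looks like in a device-generic test file (the class name, test name, and 16GB size here are illustrative only):

```python
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests,
    largeTensorTest,
    onlyCUDA,
)
from torch.testing._internal.common_utils import TestCase, run_tests


class TestLargeOps(TestCase):
    # Before: @largeCUDATensorTest('16GB')
    # After: the same constraint expressed with the two general decorators.
    @onlyCUDA
    @largeTensorTest('16GB')
    def test_large_op(self, device):
        pass  # body omitted; a real test would allocate a large CUDA tensor


instantiate_device_type_tests(TestLargeOps, globals())

if __name__ == '__main__':
    run_tests()
```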
A user hit an OOM in a test decorated with `largeCUDATensorTest('16GB')` on a 16GB V100. The decorator checked the GPU's total memory, but in most cases we cannot allocate everything a GPU reports, so the `largeTensorTest` check for CUDA should leave some headroom. This PR adds a 10% buffer to that check.
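The idea behind the buffered check, as a minimal sketch (the helper name `_cuda_memory_is_sufficient` is made up for illustration; the real logic lives in `_has_sufficient_memory`, linked below):

```python
import torch

def _cuda_memory_is_sufficient(device, size_bytes):
    # A GPU can rarely hand out its full reported capacity, so only accept
    # sizes that fit into ~90% of total device memory (a 10% buffer).
    total = torch.cuda.get_device_properties(device).total_memory
    return size_bytes <= 0.9 * total
```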
Relevant definitions:
- `largeTensorTest`: https://github.com/pytorch/pytorch/blob/d22dd80128a0d1cfdbfc174a20271719b2f9c1e9/torch/testing/_internal/common_device_type.py#L560-L578
- `_has_sufficient_memory`: https://github.com/pytorch/pytorch/blob/d22dd80128a0d1cfdbfc174a20271719b2f9c1e9/torch/testing/_internal/common_device_type.py#L535-L557
- `largeCUDATensorTest`: https://github.com/pytorch/pytorch/blob/d22dd80128a0d1cfdbfc174a20271719b2f9c1e9/torch/testing/_internal/common_device_type.py#L526-L532
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45332
Reviewed By: ngimel
Differential Revision: D24698690
Pulled By: mruberry
fbshipit-source-id: a77544478e45ce271f6639ea04e87700574ae307