Makes CUDA negative float -> uint8 cast consistent with CPU (#36832)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/36807. Also updates the cast testing to catch issues like this better.
In the future, a more constexpr-based approach to casting would be nice.
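
For context: converting a negative floating-point value directly to an unsigned integer type is undefined behavior in C++, so different compilers can produce different results. One way to get a well-defined result is to cast through a signed integer type first. The sketch below illustrates that idea only; `float_to_uint8` and `check_float_to_uint8` are hypothetical names and the code is not taken from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (an illustration, not PyTorch's actual code).
// Step 1: float -> int64_t truncates toward zero; this is well defined
//         as long as the truncated value fits in int64_t.
// Step 2: int64_t -> uint8_t is always well defined and wraps modulo 256.
inline std::uint8_t float_to_uint8(float src) {
  return static_cast<std::uint8_t>(static_cast<std::int64_t>(src));
}

// A few values showing the wrap-around for negative inputs.
inline void check_float_to_uint8() {
  assert(float_to_uint8(3.7f) == 3);      // truncation toward zero
  assert(float_to_uint8(-0.5f) == 0);     // truncates to 0
  assert(float_to_uint8(-1.0f) == 255);   // -1 mod 256
  assert(float_to_uint8(-1.5f) == 255);   // truncates to -1, then wraps
  assert(float_to_uint8(-256.0f) == 0);   // -256 mod 256
}
```

Inputs outside the range of `int64_t` (and NaN or infinity) are still undefined under this approach, so it narrows the divergence rather than removing it.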
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36832
Differential Revision: D21120822
Pulled By: mruberry
fbshipit-source-id: 9504ddd36cfe6d9f9f545fc277fef36855c1b221