Add capturable single tensor Adamax (#121183)
Finishes the work started in https://github.com/pytorch/pytorch/pull/118697. Thanks @MarouaneMaatouk for the attempt, but due to inactivity I have opened this PR for Adamax. Note that the new capturable implementation is much simpler and I've modified the foreach capturable impl--it now calls fewer kernels and is more easily comparable to forloop.
Next steps:
* This PR discovered two bugs: #121178 and #121238.
* Move the now hefty graph optim tests in test_cuda to use OptimInfo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121183
Approved by: https://github.com/albanD