Add RAdam capturable API for forloop (#121260)
Implementation thanks to @MarouaneMaatouk in https://github.com/pytorch/pytorch/pull/118697, though I've since cleaned it up a lot to avoid extra work in the eager (non-capturable) case where rho_t <= 5 and no rectification is applied. It also just looks better now :) Added tests and the cudagraph health check.
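
For context on the rectification threshold mentioned above, here is a minimal pure-Python sketch of the RAdam variance rectification term. It follows the published RAdam formulas and is not the code from this PR; the helper name `radam_rect` and its defaults are hypothetical, chosen only for illustration.

```python
import math


def radam_rect(step: int, beta2: float = 0.999):
    """Illustrative helper (hypothetical name, not part of torch.optim).

    Returns the RAdam variance rectification factor for a given step,
    or None when rho_t <= 5, i.e. when the update skips rectification.
    """
    # Maximum length of the approximated simple moving average.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    bias_correction2 = 1.0 - beta2 ** step
    # Length of the approximated SMA at this step.
    rho_t = rho_inf - 2.0 * step * (beta2 ** step) / bias_correction2

    if rho_t <= 5.0:
        # Eager fast path: nothing else needs to be computed.
        return None

    return math.sqrt(
        (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )
```

In eager mode a plain Python `if` like the one above can skip the square root entirely during early steps. A capturable path cannot branch on a tensor value during CUDA graph capture, so it presumably has to compute both outcomes and select between them on device.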
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121260
Approved by: https://github.com/mlazos