[CUDA graphs] Clear autocast amp cache (#81558)
According to [autocast_mode.cpp](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/autocast_mode.cpp), `cached_casts` is supposed to be cleared at the end of each forward pass. However, the current implementation of `make_graphed_callables` does not do this, so a graph created in the following way:
```
with torch.cuda.amp.autocast(cache_enabled=True):
    graphed_foo = torch.cuda.make_graphed_callables(foo, tensors)
```
behaves incorrectly.
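For reference, a minimal sketch of what "clearing the cache at the end of a forward pass" looks like from Python. It is not the code changed in this PR. It assumes CPU autocast with `bfloat16` so it runs without a GPU, and `linear` and `x` are hypothetical stand-ins for `foo` and `tensors`:

```python
import torch

# Hypothetical stand-ins for `foo` and `tensors` from the snippet above.
linear = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)

# Assumption: CPU autocast is used here only so the sketch runs without CUDA.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16, cache_enabled=True):
    cache_on = torch.is_autocast_cache_enabled()  # cache is active in this region
    y = linear(x)                                 # weight casts may be cached here
    torch.clear_autocast_cache()                  # drop cached casts after the forward pass
    y2 = linear(x)                                # casts are recomputed, not reused
```

Leaving the outermost autocast region also clears the cache, so the explicit `torch.clear_autocast_cache()` call only matters while the region is still open.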
cc @ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81558
Approved by: https://github.com/ngimel