fix resnet50_quantized_qat and mobilenet_v2_quantized_qat <> functionalization (#83339)
This won't actually fix the issue until we make FakeTensor always-on for AOTAutograd.
I confirmed the fix with the following benchmarks, run with `normalize_ir=False` and `use_functionalize=True` in the dynamo/functorch configs (from inside the `torchdynamo` repo):
```
terminal...$ python benchmarks/torchbench.py --training --devices=cuda --accuracy-aot-nop --generate-aot-autograd-stats --use-eval-mode --isolate --only=mobilenet_v2_quantized_qat
cuda train mobilenet_v2_quantized_qat 0.967x p=0.00
terminal...$ python benchmarks/torchbench.py --training --devices=cuda --accuracy-aot-nop --generate-aot-autograd-stats --use-eval-mode --isolate --only=resnet50_quantized_qat
cuda train resnet50_quantized_qat 0.943x p=0.00
```
I explained a bit more in the comment: quantized models use a running-mean-style op, `fused_moving_avg_obs_fake_quant`, that takes the running min/max buffers stored on the module and mutates them, potentially resizing them.
That causes AOTAutograd to complain: it first takes views of the inputs (using `.detach().requires_grad_(grad)`) and plumbs them through the function to figure out which outputs to trace the backward with. These new inputs have `TensorImpl::allow_tensor_metadata_change_ = false`, which makes the op fail when it tries to resize the running counter buffers. Once we always use fake tensors, we shouldn't need the `.detach().requires_grad_()` step anymore, since we'll already have fresh fake tensors to trace with.
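A minimal sketch of the failure mode (the buffer and sizes here are made up, and this is not the actual AOTAutograd code): a module buffer can be resized in place in eager mode, but a detached view of it cannot, because `.detach()` produces a TensorImpl that disallows metadata changes.

```
import torch

# Hypothetical module buffer, e.g. a running min/max observer state.
running_min = torch.zeros(1)

# In eager mode, an op is free to resize the buffer in place.
running_min.resize_(2)  # fine

# AOTAutograd-style input duplication: take a detached view of the input and
# restore its requires_grad flag (False for buffers) before tracing.
traced_input = running_min.detach().requires_grad_(False)

# The detached tensor's TensorImpl has allow_tensor_metadata_change_ = false,
# so an op that resizes it (as fused_moving_avg_obs_fake_quant may) errors out.
try:
    traced_input.resize_(4)
except RuntimeError as e:
    # Complains that metadata changes are not allowed on a tensor
    # created from .data or .detach().
    print(e)
```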
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83339
Approved by: https://github.com/ezyang