Increase tolerance for amp only poolformer_m36 (#145375)
Summary:
https://github.com/pytorch/pytorch/issues/144893
```
python benchmarks/dynamo/timm_models.py --only poolformer_m36 --accuracy --no-translation-validation --training --amp --device cuda --backend inductor
```
`--float32` and `--bfloat16` pass the accuracy check.
`--disable-cudagraphs` does not change the result.
The accuracy_fail occurs only with `--amp`, which produces a `0.048` res_error on a 1-element result tensor. This fails the `0.01` tolerance.
Increasing the tolerance to 0.04 makes it pass. I have not been able to reproduce `eager_two_runs_differ` on H100.
I believe this reflects the true distribution of results under `--amp`, so increasing the tolerance to 0.04 for the amp case only makes it pass.
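To illustrate the failure mode, here is a minimal sketch of a relative-error accuracy check of the kind described above. The helper name and the numeric values are hypothetical (not taken from the benchmark harness); they only show why a result that fails a `0.01` tolerance can pass at `0.04`:

```python
def relative_error(actual: float, expected: float) -> float:
    # Hypothetical helper: relative error with a small floor on the
    # denominator to avoid division by zero.
    return abs(actual - expected) / max(abs(expected), 1e-12)

# Hypothetical 1-element outputs: eager vs. compiled under mixed precision.
eager, compiled = 1.00, 1.03
err = relative_error(compiled, eager)

print(f"res_error={err:.3f}")
print("pass @ tol=0.01:", err <= 0.01)  # fails the tight tolerance
print("pass @ tol=0.04:", err <= 0.04)  # passes the relaxed tolerance
```

The point is that for amp runs the numeric noise on a single-element result can legitimately exceed a 0.01 relative tolerance without indicating a real correctness bug.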
X-link: https://github.com/pytorch/pytorch/pull/145375
Approved by: https://github.com/desertfire
Reviewed By: izaitsevfb
Differential Revision: D68651235
fbshipit-source-id: 1753da5761471a5d759dbbaa4ec3f6e9e153a493