Inductor freezing bfloat16 conv folding needs high tolerance (#145623)
Summary:
Issue:
https://github.com/pytorch/pytorch/issues/144888
The TorchBench accuracy run for the timm model lcnet_050 fails with `--freezing --inference --bfloat16`:
- bfloat16 with inductor conv constant folding: `res_error == 0.12`
- bfloat16 with conv constant folding disabled: `res_error == 0.016`
- float16 with conv constant folding: `res_error ~ 0.00669`
- float16 with conv constant folding disabled: `res_error ~ 0.0018`

Convolution constant folding increases the error by almost an order of magnitude.
I think we should revisit convolution constant folding and try to improve its accuracy, for example by performing the folding at compilation time in float64.
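For illustration only (this is not the code in this PR, and the conv/batch-norm constants are made up), a minimal sketch of why folding in a higher precision should help: folding step by step in bfloat16 rounds every intermediate result, while folding in float64 and casting back rounds only once.

```python
import torch

torch.manual_seed(0)

# Pretend these are frozen bfloat16 constants of a conv + batch-norm pair.
weight = torch.randn(16, 8, 3, 3).to(torch.bfloat16)
gamma  = torch.randn(16).to(torch.bfloat16)
var    = (torch.rand(16) + 0.1).to(torch.bfloat16)
eps    = 1e-5

# Reference folded weight, computed exactly in float64.
ref = weight.double() * (gamma.double() / torch.sqrt(var.double() + eps)).reshape(16, 1, 1, 1)

# (a) Fold step by step in bfloat16: every intermediate result is rounded.
folded_bf16 = weight * (gamma / torch.sqrt(var + eps)).reshape(16, 1, 1, 1)

# (b) Fold in float64 at compile time, round to bfloat16 only once at the end.
folded_f64 = ref.to(torch.bfloat16)

err_a = (folded_bf16.double() - ref).abs().max().item()
err_b = (folded_f64.double() - ref).abs().max().item()
print(f"folding in bfloat16: max abs error {err_a:.3e}")
print(f"folding in float64 : max abs error {err_b:.3e}")
```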
For now, this change adds counters to detect whether convolution folding happened and, for bfloat16 runs where it did, raises the accuracy-test tolerance multiplier to the maximum level (10) so the accuracy test passes.
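A rough sketch of the workaround on the benchmark-harness side (not the exact code in this PR; the counter key `"binary_folding_conv"` is an assumed placeholder, although `torch._dynamo.utils.counters` itself is real):

```python
import torch
from torch._dynamo.utils import counters

def tolerance_multiplier(dtype: torch.dtype, base_multiplier: float = 1.0) -> float:
    # "binary_folding_conv" is a placeholder key; the counter name actually
    # added by this PR may differ.
    conv_folding_happened = counters["inductor"]["binary_folding_conv"] > 0
    if dtype is torch.bfloat16 and conv_folding_happened:
        # bfloat16 conv constant folding loses enough precision that the usual
        # tolerance fails, so bump the multiplier to the maximum level (10).
        return 10.0
    return base_multiplier
```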
X-link: https://github.com/pytorch/pytorch/pull/145623
Approved by: https://github.com/eellison
Reviewed By: ZainRizvi
Differential Revision: D68897700
fbshipit-source-id: f407528b4b37eb45273a8c66f791c44e86c6632e