Raise XPU tolerances for bf16 ResNet & BotNet TorchBench (#170552)
Summary:
Multiple TorchBench models on XPU fail accuracy tests because the numeric tolerances are too strict. Two contributing factors were identified:
1. A measurement methodology change (PyTorch 2.6.0 enforces cosine_similarity: https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/common.py#L2227) increased the sensitivity of the error checks, surfacing failures in phlippe_resnet.
2. BatchNorm decomposition noise (~1e-5 RMSE per BN in fp16) accumulates through the iterations in botnet26t_256, pushing the aggregate diff beyond the current thresholds.
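To illustrate the cosine-similarity-based accuracy check described above, here is a minimal sketch in plain NumPy. The helper name `cosine_check` and the tolerance value are hypothetical and do not come from the benchmark harness; the actual check in `benchmarks/dynamo/common.py` is more involved.

```python
import numpy as np

def cosine_check(ref, res, atol=1e-2):
    """Pass if the two flattened outputs point in nearly the same direction.

    A cosine-similarity check compares direction rather than element-wise
    magnitude, so it is sensitive to structural divergence while tolerating
    uniform low-precision noise.
    """
    ref = np.asarray(ref, dtype=np.float64).ravel()
    res = np.asarray(res, dtype=np.float64).ravel()
    cos = np.dot(ref, res) / (np.linalg.norm(ref) * np.linalg.norm(res))
    return bool(1.0 - cos <= atol)
```

A check of this shape fails when small per-layer errors accumulate into a directional drift, even if every individual element is close, which is why a too-tight tolerance shows up as an accuracy failure rather than a crash.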
**Analysis**
- phlippe_resnet failures reproduce on both CPU and XPU; fp16 already uses a higher tolerance, implying the bf16 thresholds are misaligned.
- Disabling BN decomposition brings botnet26t_256 outputs within tolerance; with decomposition enabled, cumulative numeric error is expected.
- CI health indicates the changes are non-disruptive; where failures are present, they are unrelated to these PRs.
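The accumulation argument above can be sketched numerically: if each decomposed BN contributes independent ~1e-5 RMS noise, the aggregate error grows roughly as sqrt(n_layers) * 1e-5. The layer count and noise model here are illustrative assumptions, not measurements from botnet26t_256.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)  # stand-in for an activation tensor

# Inject independent ~1e-5 RMS noise at each of n hypothetical decomposed
# BatchNorm steps, mimicking per-layer decomposition error.
n_layers = 30
noisy = x.copy()
for _ in range(n_layers):
    noisy = noisy + rng.normal(scale=1e-5, size=noisy.shape)

# For independent per-layer errors the aggregate RMSE scales roughly as
# sqrt(n_layers) * 1e-5, i.e. several times the single-layer error.
rmse = np.sqrt(np.mean((noisy - x) ** 2))
```

This is why a threshold calibrated for a single decomposition's error can still be exceeded once dozens of decomposed BNs stack up in one model.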
Fixes https://github.com/intel/torch-xpu-ops/issues/1799
Fixes https://github.com/intel/torch-xpu-ops/issues/1305
X-link: https://github.com/pytorch/pytorch/pull/170552
Approved by: https://github.com/EikanWang, https://github.com/desertfire
Reviewed By: seemethere
Differential Revision: D89434646
fbshipit-source-id: e5ce062b497201158578abb1bdebaac4b593dbfd
Co-authored-by: Tomasz Bohutyn <tbohutyn@habana.ai>
Author: generatedunixname499836121