Raise XPU tolerances for bf16 ResNet & BotNet TorchBench (#170552)
Summary:
Multiple TorchBench models on XPU fail accuracy tests because the numeric tolerances are too strict. Two contributing factors were identified:
1. A measurement methodology change (PyTorch 2.6.0 enforces cosine_similarity: https://github.com/pytorch/pytorch/blob/main/benchmarks/dynamo/common.py#L2227) increased the sensitivity of the error checks, surfacing failures in phlippe_resnet.
2. BatchNorm decomposition noise (~1e-5 RMSE per BN in fp16) accumulates through the iterations in botnet26t_256, pushing the aggregate diff beyond the current thresholds.
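To illustrate the cosine-similarity-based accuracy check described above, here is a minimal sketch in plain NumPy. The helper name `cosine_check` and the tolerance value are hypothetical and do not come from the benchmark harness; the actual check in `benchmarks/dynamo/common.py` is more involved.

```python
import numpy as np

def cosine_check(ref, res, atol=1e-2):
    """Pass if the two flattened outputs point in nearly the same direction.

    A cosine-similarity check compares direction rather than element-wise
    magnitude, so it is sensitive to structural divergence while tolerating
    uniform low-precision noise.
    """
    ref = np.asarray(ref, dtype=np.float64).ravel()
    res = np.asarray(res, dtype=np.float64).ravel()
    cos = np.dot(ref, res) / (np.linalg.norm(ref) * np.linalg.norm(res))
    return bool(1.0 - cos <= atol)
```

A check of this shape fails when small per-layer errors accumulate into a directional drift, even if every individual element is close, which is why a too-tight tolerance shows up as an accuracy failure rather than a crash.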
**Analysis**
- phlippe_resnet failures reproduce on both CPU and XPU; fp16 already uses a higher tolerance, implying the bf16 thresholds are misaligned.
- Disabling BN decomposition brings botnet26t_256 outputs within tolerance; with decomposition enabled, cumulative numeric error is expected.
- CI health indicates the changes are non-disruptive; where failures are present, they are unrelated to these PRs.
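The accumulation argument above can be sketched numerically: if each decomposed BN contributes independent ~1e-5 RMS noise, the aggregate error grows roughly as sqrt(n_layers) * 1e-5. The layer count and noise model here are illustrative assumptions, not measurements from botnet26t_256.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)  # stand-in for an activation tensor

# Inject independent ~1e-5 RMS noise at each of n hypothetical decomposed
# BatchNorm steps, mimicking per-layer decomposition error.
n_layers = 30
noisy = x.copy()
for _ in range(n_layers):
    noisy = noisy + rng.normal(scale=1e-5, size=noisy.shape)

# For independent per-layer errors the aggregate RMSE scales roughly as
# sqrt(n_layers) * 1e-5, i.e. several times the single-layer error.
rmse = np.sqrt(np.mean((noisy - x) ** 2))
```

This is why a threshold calibrated for a single decomposition's error can still be exceeded once dozens of decomposed BNs stack up in one model.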
Fixes https://github.com/intel/torch-xpu-ops/issues/1799
Fixes https://github.com/intel/torch-xpu-ops/issues/1305
X-link: https://github.com/pytorch/pytorch/pull/170552
Approved by: https://github.com/EikanWang, https://github.com/desertfire
Reviewed By: seemethere
Differential Revision: D89434646
fbshipit-source-id: e5ce062b497201158578abb1bdebaac4b593dbfd
Co-authored-by: Tomasz Bohutyn <tbohutyn@habana.ai>
Author: generatedunixname499836121