pytorch
bd6db542 - [CI] Mark mobilenet_v3_large as nondeterministic (#98314)

Commit

1 year ago

[CI] Mark mobilenet_v3_large as nondeterministic (#98314)

Summary: Skip mobilenet_v3_large for accuracy checking to reduce noise on the dashboard. The root cause still needs to be investigated.

mobilenet_v3_large shows random accuracy check failures, with error values that differ from run to run. Here are some examples:

```
cuda train mobilenet_v3_large
[2023-04-04 14:54:50,990] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.02172, (ref-fp64): 0.01068 and shape=torch.Size([960, 1, 5, 5])
[2023-04-04 14:54:50,990] torch._dynamo.utils: [ERROR] Accuracy failed for key name features.14.block.1.0.weight.grad
```

```
cuda train mobilenet_v3_large
[2023-04-04 14:57:59,972] torch._dynamo.utils: [ERROR] RMSE (res-fp64): 0.07744, (ref-fp64): 0.03073 and shape=torch.Size([72, 1, 5, 5])
[2023-04-04 14:57:59,973] torch._dynamo.utils: [ERROR] Accuracy failed for key name features.4.block.1.0.weight.grad
```

One observation is that turning off cuDNN in eager mode with `torch.backends.cudnn.enabled = False` makes the nondeterministic behavior go away, but the model then fails accuracy checking consistently. The minifier didn't help narrow down the error.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/98314
Approved by: https://github.com/huydhn
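For readers unfamiliar with the log lines above: each one reports two RMSE values, both measured against an fp64 reference run. `res-fp64` is the error of the compiled result and `ref-fp64` is the error of the eager result. The sketch below illustrates that kind of comparison in plain Python. The function names, the `multiplier` of 2.0, and the `tol` of 1e-4 are assumptions made for illustration; they are not taken from `torch._dynamo.utils`, and the actual check may use different thresholds.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def within_budget(res_error, ref_error, multiplier=2.0, tol=1e-4):
    """Hypothetical pass rule: the compiled result may be at most
    `multiplier` times as far from the fp64 reference as eager is,
    plus a small absolute tolerance. Both parameters are assumed values."""
    return res_error <= multiplier * ref_error + tol


def passes_accuracy_check(ref, res, fp64_ref, multiplier=2.0, tol=1e-4):
    """Compare eager (`ref`) and compiled (`res`) outputs against `fp64_ref`."""
    ref_error = rmse(fp64_ref, ref)  # reported as "(ref-fp64)" in the logs
    res_error = rmse(fp64_ref, res)  # reported as "RMSE (res-fp64)" in the logs
    return within_budget(res_error, ref_error, multiplier, tol)


# Error values copied from the two log excerpts in the commit message.
first_failure = within_budget(res_error=0.02172, ref_error=0.01068)
second_failure = within_budget(res_error=0.07744, ref_error=0.03073)
```

Under these assumed thresholds both logged pairs exceed the budget, since in each case the compiled error is a little over twice the eager error. Because the eager error itself varies between runs, a check of this shape can flip between pass and fail without any code change, which is consistent with the noise described in the summary.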

Author

desertfire

Committer

pytorchmergebot

Parents

ecf08a0f
