DeepSpeed
4862115b - Disable deterministic option in compile tests (#7720)

Disable deterministic option in compile tests (#7720)

Compiler tests (with/without DeepCompile) occasionally fail with mismatching loss values:

```
FAILED tests/unit/v1/compile/test_compile_zero.py::TestZeRO::test_compile_zero[none-1-dtype0]
AssertionError: Loss values are not close.
Tensors are not close:
  actual=tensor(-0., device='cuda:1', dtype=torch.bfloat16, grad_fn=<DivBackward1>),
  expected=tensor(0.0255, device='cuda:1', dtype=torch.bfloat16, grad_fn=<CompiledFunctionBackward>)
  kwargs={'rtol': 0.5, 'atol': 0.01}
```

While the exact root cause is not yet clear, we found a [similar issue](https://github.com/pytorch/pytorch/issues/159855) related to the compiler. This PR disables the deterministic option, which has improved stability. Previously, we encountered this error intermittently when running the compiler tests repeatedly; with this change, the tests passed 100 consecutive runs.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
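For context, below is a minimal sketch of what "disabling the deterministic option" can look like in a PyTorch test setup. This is not the actual DeepSpeed test code; the helper name `configure_test_determinism` is hypothetical, and it assumes the tests previously forced determinism via `torch.use_deterministic_algorithms(True)`:

```python
# Hedged sketch (assumed setup, not the DeepSpeed test suite itself):
# with deterministic=False, PyTorch and torch.compile are free to pick
# non-deterministic kernels, and the loss comparison relies on the
# test tolerances (rtol/atol) instead of bitwise reproducibility.
import torch


def configure_test_determinism(deterministic: bool = False) -> None:
    # Toggle global determinism for the test process.
    torch.use_deterministic_algorithms(deterministic)
    # cuDNN-specific knobs; benchmark mode is typically disabled when
    # determinism is requested, and re-enabled otherwise.
    torch.backends.cudnn.deterministic = deterministic
    torch.backends.cudnn.benchmark = not deterministic
```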