DeepSpeed
374f6d09 - Fix AutoTP test numerical tolerance with rtol (#7794)

12 days ago
Replace torch.allclose() with torch.testing.assert_close() and add an rtol parameter for proper floating-point comparisons in the testRowParallel and testColumnParallel tests.

The tests were failing intermittently in CI because they used only an absolute tolerance (atol) without a relative tolerance. Adding rtol allows proper numerical comparisons and improves the stability of the tests.

---------

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>
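
To illustrate why an absolute tolerance alone can be fragile, here is a minimal pure-Python sketch of the closeness criterion that torch.testing.assert_close() documents, |actual - expected| <= atol + rtol * |expected|. The helper name is_close, the sample values, and the tolerance constants are assumptions chosen for illustration; they are not taken from the DeepSpeed tests.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Scalar version of the check |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Hypothetical tolerances and values, for illustration only.
ATOL = 1e-3
RTOL = 1e-4

expected_large, actual_large = 1000.0, 1000.01   # small relative error, large magnitude
expected_small, actual_small = 0.001, 0.0015     # small magnitude

# With atol only, the allowed error does not scale with the magnitude of the values.
atol_only_large = is_close(actual_large, expected_large, rtol=0.0, atol=ATOL)

# With rtol added, the allowed error grows with |expected|.
with_rtol_large = is_close(actual_large, expected_large, rtol=RTOL, atol=ATOL)

# Near zero, atol is what keeps the comparison meaningful.
atol_only_small = is_close(actual_small, expected_small, rtol=0.0, atol=ATOL)

print(atol_only_large)  # False: error 0.01 exceeds atol 0.001
print(with_rtol_large)  # True: allowed error is 0.001 + 0.1
print(atol_only_small)  # True: error 0.0005 is within atol 0.001
```

In this sketch the large-magnitude pair differs by only about 0.001% yet fails the atol-only check, which is the kind of magnitude-dependent behavior that can make a test flaky when output scales vary between runs.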