Fix autoTP test numerical tolerance with assert_close
Replace torch.allclose() with torch.testing.assert_close() and pass an
explicit rtol argument so that the floating-point comparisons in the
testRowParallel and testColumnParallel tests use both a relative and an
absolute tolerance.
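The tests were failing intermittently in CI because they set only an
absolute tolerance (atol=1e-2) and no explicit relative tolerance, so the
allowed error did not scale with the magnitude of the compared values.
Adding rtol=1e-2 lets the allowed error grow with the expected value,
which keeps the comparison meaningful when value magnitudes vary.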
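As a rough illustration of why the relative term matters, the sketch below
applies the closeness rule |actual - expected| <= atol + rtol * |expected|
to scalars in plain Python. The helper name is_close and the sample values
are hypothetical and are not taken from the tests.

```python
def is_close(actual, expected, rtol=0.0, atol=0.0):
    """Scalar closeness check: |actual - expected| <= atol + rtol * |expected|.

    Hypothetical helper for illustration only; the tests themselves call
    torch.testing.assert_close on tensors.
    """
    return abs(actual - expected) <= atol + rtol * abs(expected)


# A large-magnitude value that is off by 0.5 (0.5% relative error).
large_actual, large_expected = 100.5, 100.0

# Absolute tolerance alone allows only 0.01 of error, whatever the magnitude.
large_atol_only = is_close(large_actual, large_expected, atol=1e-2)

# With rtol=1e-2 the allowed error becomes 0.01 + 0.01 * 100.0 = 1.01.
large_with_rtol = is_close(large_actual, large_expected, rtol=1e-2, atol=1e-2)

# A small-magnitude value is still governed mainly by the absolute term.
small_with_rtol = is_close(0.005, 0.0, rtol=1e-2, atol=1e-2)

print(large_atol_only, large_with_rtol, small_with_rtol)
```

With these sample values the atol-only check rejects the large-magnitude
pair while the combined check accepts it, which matches the kind of
intermittent failure described above.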
Also restore normal workflow execution (remove debug steps).
Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>