pytorch
34f7dc9e - [ONNX] Support op consistency error reproduction (#119512)

226 days ago

Fixes #119472

Introduce the debugging tool from onnxscript: https://github.com/microsoft/onnxscript/blob/main/onnxscript/tests/function_libs/torch_lib/error_reproduction.py

This tool helps us quickly find the inputs that lead to mismatch errors.

NOTE: this produces an `error_reports` folder containing a separate markdown report for each mismatched test case. For example:

- `CREATE_REPRODUCTION_REPORT=1 python -m pytest onnxscript/tests/function_libs/torch_lib/ops_test.py -k test_output_match_fft_fft_cpu_bool`

### Summary

The output of ONNX Runtime does not match that of PyTorch when executing test `test_fx_op_consistency.TestOnnxModelOutputConsistency_opset_version_18_model_type_TorchModelType.TORCH_NN_MODULECPU.test_output_match_fft_fft_cpu_bool`, `sample 3` in ONNX Script `TorchLib`.

To recreate this report, use

```bash
CREATE_REPRODUCTION_REPORT=1 python -m pytest onnxscript/tests/function_libs/torch_lib/ops_test.py -k test_output_match_fft_fft_cpu_bool
```

### ONNX Model

```
<
   ir_version: 8,
   opset_import: ["pkg.onnxscript.torch_lib" : 1, "" : 18, "pkg.onnxscript.torch_lib.common" : 1],
   producer_name: "pytorch",
   producer_version: "2.2.0"
>
main_graph (bool[31] l_args_0_) => (float[31,2] _fft_r2c)
   <bool[31] l_args_0_, float[31] _to_copy, float[31,2] _fft_r2c>
{
   _to_copy = Cast <to: int = 1> (l_args_0_)
   _val_2 = Constant <value: tensor = int64[1] {-1}> ()
   _val_3 = Unsqueeze (_to_copy, _val_2)
   _val_4 = Constant <value: tensor = int64[1] {0}> ()
   _val_5 = Unsqueeze (_val_3, _val_4)
   _val_6 = DFT <axis: int = 1, inverse: int = 0, onesided: int = 0> (_val_5)
   _val_7 = Constant <value: tensor = int64[1] {0}> ()
   _val_8 = Squeeze (_val_6, _val_7)
   _fft_r2c = pkg.onnxscript.torch_lib._fftn_onnx_normalization <dims: ints = [0], forward: int = 1, normalization: int = 0> (_val_3, _val_8)
}
<
   domain: "pkg.onnxscript.torch_lib",
   opset_import: ["" : 18]
>
_fftn_onnx_normalization <normalization,forward,dims>(self, transformed) => (result_15)
{
   self_shape = Shape (self)
   dims = Constant <value_ints: ints = @dims> ()
   self_shape_subscripted = Gather <axis: int = 0> (self_shape, dims)
   total_sample_count = ReduceProd <keepdims: int = 0> (self_shape_subscripted)
   total_sample_count_0 = CastLike (total_sample_count, transformed)
   normalization = Constant <value_int: int = @normalization> ()
   int64_1 = Constant <value: tensor = int64 int64_1 {1}> ()
   cond = Equal (normalization, int64_1)
   result_15 = If (cond) <then_branch: graph = thenGraph_21 () => ( result_3) {
      forward = Constant <value_int: int = @forward> ()
      forward_as_bool = Cast <to: int = 9> (forward)
      result_3 = If (forward_as_bool) <then_branch: graph = thenGraph_23 () => ( result) {
         tmp = Sqrt (total_sample_count_0)
         result = Div (transformed, tmp)
      }, else_branch: graph = elseGraph_23 () => ( result_2) {
         tmp_1 = Sqrt (total_sample_count_0)
         result_2 = Mul (transformed, tmp_1)
      }>
   }, else_branch: graph = elseGraph_21 () => ( result_14) {
      normalization_4 = Constant <value_int: int = @normalization> ()
      int64_2 = Constant <value: tensor = int64 int64_2 {2}> ()
      cond_5 = Equal (normalization_4, int64_2)
      result_14 = If (cond_5) <then_branch: graph = thenGraph_27 () => ( result_9) {
         forward_6 = Constant <value_int: int = @forward> ()
         forward_6_as_bool = Cast <to: int = 9> (forward_6)
         result_9 = If (forward_6_as_bool) <then_branch: graph = thenGraph_29 () => ( result_7) {
            result_7 = Div (transformed, total_sample_count_0)
         }, else_branch: graph = elseGraph_29 () => ( result_8) {
            result_8 = Identity (transformed)
         }>
      }, else_branch: graph = elseGraph_27 () => ( result_13) {
         forward_10 = Constant <value_int: int = @forward> ()
         forward_10_as_bool = Cast <to: int = 9> (forward_10)
         result_13 = If (forward_10_as_bool) <then_branch: graph = thenGraph_35 () => ( result_11) {
            result_11 = Identity (transformed)
         }, else_branch: graph = elseGraph_35 () => ( result_12) {
            result_12 = Mul (transformed, total_sample_count_0)
         }>
      }>
   }>
}
<
   domain: "pkg.onnxscript.torch_lib.common",
   opset_import: ["" : 18]
>
Rank (input) => (return_val)
{
   tmp = Shape (input)
   return_val = Size (tmp)
}
<
   domain: "pkg.onnxscript.torch_lib.common",
   opset_import: ["" : 18]
>
IsScalar (input) => (return_val)
{
   tmp = Shape (input)
   tmp_0 = Size (tmp)
   tmp_1 = Constant <value_int: int = 0> ()
   return_val = Equal (tmp_0, tmp_1)
}
```

### Inputs

Shapes: `['Tensor<torch.Size([31]), dtype=torch.bool>']`

<details><summary>Details</summary>

```python
kwargs = {}
inputs = (tensor([False, False, True, True, False, True, False, True, False, False, True, False,
        False, False, False, False, True, True, True, True, True, True, True, True,
        False, False, False, False, True, True, True]),)
```

</details>

### Expected output

Shape: `torch.Size([31, 2])`

<details><summary>Details</summary>

```python
expected = tensor([[16.0000, 0.0000],
        [-0.2369, 2.6590],
        [ 0.7336, -4.9670],
        [ 2.2093, 2.9865],
        [-0.7166, 1.0928],
        [-3.0614, 3.0015],
        [-1.8945, -0.9677],
        [-2.1538, 0.2513],
        [-2.2432, 1.3978],
        [-0.3429, 1.9494],
        [-0.6495, -1.5423],
        [-0.6005, 2.2398],
        [ 2.2639, 2.6430],
        [ 1.7609, 0.2033],
        [-1.3829, -2.3365],
        [-1.6854, -0.0311],
        [-1.6854, 0.0311],
        [-1.3829, 2.3365],
        [ 1.7609, -0.2033],
        [ 2.2639, -2.6430],
        [-0.6005, -2.2398],
        [-0.6495, 1.5423],
        [-0.3429, -1.9494],
        [-2.2432, -1.3978],
        [-2.1538, -0.2513],
        [-1.8945, 0.9677],
        [-3.0614, -3.0015],
        [-0.7166, -1.0928],
        [ 2.2093, -2.9865],
        [ 0.7336, 4.9670],
        [-0.2369, -2.6590]])
```

</details>

### Actual output

Shape: `torch.Size([31, 2])`

<details><summary>Details</summary>

```python
actual = tensor([[ 1.6000e+01, -9.1791e-06],
        [-2.3695e-01, 2.6590e+00],
        [ 7.3355e-01, -4.9670e+00],
        [ 2.2093e+00, 2.9865e+00],
        [-7.1663e-01, 1.0928e+00],
        [-3.0614e+00, 3.0015e+00],
        [-1.8946e+00, -9.6773e-01],
        [-2.1538e+00, 2.5126e-01],
        [-2.2432e+00, 1.3978e+00],
        [-3.4294e-01, 1.9494e+00],
        [-6.4946e-01, -1.5423e+00],
        [-6.0044e-01, 2.2398e+00],
        [ 2.2639e+00, 2.6430e+00],
        [ 1.7609e+00, 2.0326e-01],
        [-1.3829e+00, -2.3365e+00],
        [-1.6854e+00, -3.1130e-02],
        [-1.6854e+00, 3.1161e-02],
        [-1.3829e+00, 2.3365e+00],
        [ 1.7609e+00, -2.0327e-01],
        [ 2.2639e+00, -2.6430e+00],
        [-6.0047e-01, -2.2398e+00],
        [-6.4945e-01, 1.5423e+00],
        [-3.4294e-01, -1.9494e+00],
        [-2.2432e+00, -1.3978e+00],
        [-2.1538e+00, -2.5129e-01],
        [-1.8945e+00, 9.6773e-01],
        [-3.0615e+00, -3.0015e+00],
        [-7.1663e-01, -1.0928e+00],
        [ 2.2093e+00, -2.9865e+00],
        [ 7.3354e-01, 4.9670e+00],
        [-2.3695e-01, -2.6589e+00]])
```

</details>

### Difference

<details><summary>Details</summary>

```diff
--- actual
+++ expected
@@ -1,31 +1,31 @@
-tensor([[ 1.6000e+01, -9.1791e-06],
-        [-2.3695e-01, 2.6590e+00],
-        [ 7.3355e-01, -4.9670e+00],
-        [ 2.2093e+00, 2.9865e+00],
-        [-7.1663e-01, 1.0928e+00],
-        [-3.0614e+00, 3.0015e+00],
-        [-1.8946e+00, -9.6773e-01],
-        [-2.1538e+00, 2.5126e-01],
-        [-2.2432e+00, 1.3978e+00],
-        [-3.4294e-01, 1.9494e+00],
-        [-6.4946e-01, -1.5423e+00],
-        [-6.0044e-01, 2.2398e+00],
-        [ 2.2639e+00, 2.6430e+00],
-        [ 1.7609e+00, 2.0326e-01],
-        [-1.3829e+00, -2.3365e+00],
-        [-1.6854e+00, -3.1130e-02],
-        [-1.6854e+00, 3.1161e-02],
-        [-1.3829e+00, 2.3365e+00],
-        [ 1.7609e+00, -2.0327e-01],
-        [ 2.2639e+00, -2.6430e+00],
-        [-6.0047e-01, -2.2398e+00],
-        [-6.4945e-01, 1.5423e+00],
-        [-3.4294e-01, -1.9494e+00],
-        [-2.2432e+00, -1.3978e+00],
-        [-2.1538e+00, -2.5129e-01],
-        [-1.8945e+00, 9.6773e-01],
-        [-3.0615e+00, -3.0015e+00],
-        [-7.1663e-01, -1.0928e+00],
-        [ 2.2093e+00, -2.9865e+00],
-        [ 7.3354e-01, 4.9670e+00],
-        [-2.3695e-01, -2.6589e+00]])
+tensor([[16.0000, 0.0000],
+        [-0.2369, 2.6590],
+        [ 0.7336, -4.9670],
+        [ 2.2093, 2.9865],
+        [-0.7166, 1.0928],
+        [-3.0614, 3.0015],
+        [-1.8945, -0.9677],
+        [-2.1538, 0.2513],
+        [-2.2432, 1.3978],
+        [-0.3429, 1.9494],
+        [-0.6495, -1.5423],
+        [-0.6005, 2.2398],
+        [ 2.2639, 2.6430],
+        [ 1.7609, 0.2033],
+        [-1.3829, -2.3365],
+        [-1.6854, -0.0311],
+        [-1.6854, 0.0311],
+        [-1.3829, 2.3365],
+        [ 1.7609, -0.2033],
+        [ 2.2639, -2.6430],
+        [-0.6005, -2.2398],
+        [-0.6495, 1.5423],
+        [-0.3429, -1.9494],
+        [-2.2432, -1.3978],
+        [-2.1538, -0.2513],
+        [-1.8945, 0.9677],
+        [-3.0614, -3.0015],
+        [-0.7166, -1.0928],
+        [ 2.2093, -2.9865],
+        [ 0.7336, 4.9670],
+        [-0.2369, -2.6590]])
```

</details>

### Full error stack

```
Tensor-likes are not close!

Mismatched elements: 21 / 62 (33.9%)
Greatest absolute difference: 3.719329833984375e-05 at index (26, 1) (up to 1e-05 allowed)
Greatest relative difference: 0.0005033136694692075 at index (15, 1) (up to 1.3e-06 allowed)
  File "/home/titaiwang/pytorch/test/onnx/test_fx_op_consistency.py", line 1763, in _compare_onnx_and_torch_exported_program
    torch.testing.assert_close(
  File "/home/titaiwang/pytorch/torch/testing/_comparison.py", line 1523, in assert_close
    raise error_metas[0].to_error(msg)
```

### Environment

```
OS: Linux-5.15.135.1-2.cm2-x86_64-with-glibc2.35
Python version: 3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]
onnx==1.15.0
onnxruntime==1.17.0
onnxscript==0.1.0.dev20240207
numpy==1.26.0
torch==2.2.0a0+git684ce1b
```

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/119512
Approved by: https://github.com/justinchuby, https://github.com/thiagocrepaldi

Author: titaiwangms
Committer: pytorchmergebot
Parents: bb287d73
