onnxruntime
7880342e - Add numeric_limits for MLFloat16 and BFloat16 (#22197)

Commit

1 year ago

Add numeric_limits for MLFloat16 and BFloat16 (#22197)

### Description

* Add std::numeric_limits for MLFloat16 and BFloat16.
* Update some comments in csharp ORTFloat16.shared.cs.
* Add unit tests (including Clip).

Note that the canonical NaN is not consistent between C++ and C#. C# uses a negative quiet NaN as the canonical NaN, while C++ uses a positive quiet NaN. The C# Float16.NaN value was chosen to be consistent with System.Half.NaN.

FP16 data returned from CUDA might have 0x7FFF as NaN; FP16 data from the CPU provider might have 0x7E00 as NaN. In any case, ORT has no consistent canonical NaN right now. Because all of these NaNs conform to the IEEE spec, this should not cause issues downstream.

### Motivation and Context

std::numeric_limits is used in the codebase but was not defined for MLFloat16 and BFloat16. This causes bugs such as https://github.com/microsoft/onnxruntime/issues/21957, introduced by https://github.com/microsoft/onnxruntime/pull/21493.
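To illustrate the idea, here is a minimal sketch of what specializing std::numeric_limits for a 16-bit float wrapper can look like. It is not the code from this commit: `Half16`, `FromBits`, `ToFloat`, and `IsNaN` are hypothetical stand-ins for MLFloat16 and its helpers, and the bit patterns are the standard IEEE 754 binary16 encodings. The NaN check deliberately accepts any NaN encoding rather than comparing against one canonical value, since the description above notes that 0x7E00, 0x7FFF, and negative quiet NaNs can all appear.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a 16-bit float wrapper (NOT ORT's MLFloat16).
struct Half16 {
  std::uint16_t bits;

  constexpr Half16() : bits(0) {}
  constexpr explicit Half16(std::uint16_t b) : bits(b) {}

  static constexpr Half16 FromBits(std::uint16_t b) { return Half16(b); }

  // NaN means: exponent all ones and a non-zero fraction, with either sign.
  constexpr bool IsNaN() const { return (bits & 0x7FFF) > 0x7C00; }

  // Decode IEEE 754 binary16 (1 sign, 5 exponent, 10 fraction bits) to float.
  float ToFloat() const {
    const int sign = (bits >> 15) & 0x1;
    const int exp = (bits >> 10) & 0x1F;
    const int frac = bits & 0x3FF;
    float mag;
    if (exp == 0) {
      mag = std::ldexp(static_cast<float>(frac), -24);  // zero or subnormal
    } else if (exp == 0x1F) {
      mag = frac ? std::numeric_limits<float>::quiet_NaN()
                 : std::numeric_limits<float>::infinity();
    } else {
      mag = std::ldexp(static_cast<float>(frac + 1024), exp - 25);  // normal
    }
    return sign ? -mag : mag;
  }
};

namespace std {
// Specializing numeric_limits for a user-defined type is permitted.
template <>
class numeric_limits<Half16> {
 public:
  static constexpr bool is_specialized = true;
  static constexpr bool is_signed = true;
  static constexpr bool has_infinity = true;
  static constexpr bool has_quiet_NaN = true;
  static constexpr int digits = 11;  // 10 stored fraction bits + implicit bit

  static constexpr Half16 min() noexcept { return Half16::FromBits(0x0400); }     // 2^-14
  static constexpr Half16 max() noexcept { return Half16::FromBits(0x7BFF); }     // 65504
  static constexpr Half16 lowest() noexcept { return Half16::FromBits(0xFBFF); }  // -65504
  static constexpr Half16 epsilon() noexcept { return Half16::FromBits(0x1400); } // 2^-10
  static constexpr Half16 infinity() noexcept { return Half16::FromBits(0x7C00); }
  // Positive quiet NaN, matching the C++ convention described above.
  static constexpr Half16 quiet_NaN() noexcept { return Half16::FromBits(0x7E00); }
};
}  // namespace std
```

Without such a specialization, the primary std::numeric_limits template still compiles for any type, but `is_specialized` is false and functions such as `max()` and `lowest()` return a value-initialized `T()`. Generic code that uses those values as default bounds can therefore silently get zeros, which is the class of problem this change is meant to prevent.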

References

#22197 - Add numeric_limits for MLFloat16 and BFloat16

Author

tianleiwu

Parents

72b0979e
