[webgpu] Correct definition of large numbers, fixes softmax(max_negative_number) in float32 (#26670)
### Description
According to IEEE 754, the most negative finite float32 value is
`-3.40282346638528e+38`, but it was registered inline as the truncated
literal `-3.402823e+38f`, which parses to a different (less negative)
float32 value.
```py
>>> import numpy as np
>>> np.finfo(np.float32).min
np.float32(-3.4028235e+38)
>>> np.finfo(np.float32).min.item()
-3.4028234663852886e+38
```
(Note that numpy's repr prints the shortest string that round-trips back to
the same float32, while `.item()` shows the full decimal expansion.)
Because of the truncated literal, values below that threshold, including
the true float32 minimum itself, were handled incorrectly. While this may
seem like a small detail, it is essential in attention masking, where we do
in fact use this value, leading to large numerical errors down the line.
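A quick way to see the mismatch (a minimal numpy sketch, not the actual shader code):
```py
import numpy as np

# Parsed to float32, the truncated literal is strictly greater (less negative)
# than the true float32 minimum, so an input equal to np.finfo(np.float32).min
# falls below the threshold being compared against.
truncated = np.float32(-3.402823e+38)  # literal previously registered inline
true_min = np.finfo(np.float32).min    # np.float32(-3.4028235e+38)
print(truncated == true_min)           # False: the literals round differently
print(truncated > true_min)            # True: the truncated value is less negative
```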
Reproduction:
```py
from onnx import helper, TensorProto
import onnxruntime as ort
import numpy as np
# 1. Create the ONNX model
# Define input and output
input_shape = [1, 2]
input_info = helper.make_tensor_value_info('X', TensorProto.FLOAT, input_shape)
output_info = helper.make_tensor_value_info('Y', TensorProto.FLOAT, input_shape)
# Create the Softmax node
# Softmax takes one input: X
softmax_node = helper.make_node(
    'Softmax',
    inputs=['X'],
    outputs=['Y'],
    name='SoftmaxNode',
    axis=-1,  # default axis is -1, i.e. softmax over the last dimension
)
# Create the graph
graph_def = helper.make_graph(
    [softmax_node],
    'test-model',
    [input_info],
    [output_info],
)
# Create the model
model_def = helper.make_model(graph_def, producer_name='onnx-example')
opset = model_def.opset_import[0]
opset.version = 13 # Ensure opset version supports the operations
# 2. Convert model to string (bytes)
model_str = model_def.SerializeToString()
# 3. Prepare input data
np.random.seed(0)
input_data = np.array(
    [[-3.40282346638528e+38, -3.40282346638528e+38]]
    # [[-3.4028234663852886e+38, -3.4028234663852886e+38]]
    # (both literals round to the same float32, np.finfo(np.float32).min)
).astype(np.float32)
print(input_data.tolist())
# 4. Run on CPUExecutionProvider
sess_cpu = ort.InferenceSession(model_str, providers=['CPUExecutionProvider'])
res_cpu = sess_cpu.run(['Y'], {'X': input_data})[0]
print("CPU Result:", res_cpu)
# 5. Run on WebGpuExecutionProvider
sess_webgpu = ort.InferenceSession(model_str, providers=['WebGpuExecutionProvider'])
res_webgpu = sess_webgpu.run(['Y'], {'X': input_data})[0]
print("WebGPU Result:", res_webgpu)
# Compare results
diff = np.abs(res_cpu - res_webgpu)
max_diff = diff.max().item()
print(diff)
print(f"Max diff: {max_diff}")
assert max_diff < 1e-5, f"Results do not match within tolerance! Max diff: {max_diff}"
print("Results match!")
```
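For reference, the CPU result follows from a numerically stable softmax: two equal inputs yield a uniform distribution regardless of magnitude. A pure-numpy sketch (not the execution provider's implementation):
```py
import numpy as np

# The max-subtraction trick turns two equal float32-min inputs into zeros,
# so softmax returns [0.5, 0.5] even at the edge of the float32 range.
x = np.full((1, 2), np.finfo(np.float32).min, dtype=np.float32)
shifted = x - x.max(axis=-1, keepdims=True)  # subtract the row max -> all zeros
probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
print(probs)  # [[0.5 0.5]]
```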
Before:
```
[[-3.4028234663852886e+38, -3.4028234663852886e+38]]
CPU Result: [[0.5 0.5]]
WebGPU Result: [[0. 0.]]
[[0.5 0.5]]
Max diff: 0.5
AssertionError: Results do not match within tolerance! Max diff: 0.5
```
After:
```
[[-3.4028234663852886e+38, -3.4028234663852886e+38]]
CPU Result: [[0.5 0.5]]
WebGPU Result: [[0.5 0.5]]
[[0. 0.]]
Max diff: 0.0
Results match!
```
cc @guschmue