Fix FP8 (FLOAT8E4M3FN) quantization scale using wrong reference distribution (#29350)
## Problem
`compute_scale_zp_float8` (in
`onnxruntime/python/tools/quantization/quant_utils.py`) computes the FP8
quantization scale as `scale = std_data / std_f8`, where `std_f8` is the
standard deviation of the representable `FLOAT8E4M3FN` values. Before
this change, it built that reference distribution as:
```python
all_values = [float(i) for i in range(256)]
```
That is the list of integers `0.0 .. 255.0`, **not** the float8 values.
The code should instead reinterpret each of the 256 byte patterns as a
`float8_e4m3fn` value (the finite values span `-448..448`). This is a
regression from the ONNX 1.19 integration, which removed
`onnx.numpy_helper.float8e4m3_to_float32`; the prior code was
`[float8e4m3_to_float32(i) for i in range(256)]`. The repo's own
reference notebook `docs/python/notebooks/quantization_f8.ipynb` still
documents the correct algorithm.
Effect: `std_f8` is computed as **73.90** instead of **100.06**, so
every FP8 scale is **~35% too large** (100.06 / 73.90 ≈ 1.354), which
degrades FP8-quantized model accuracy. The path is live: it is called
from `onnx_quantizer.py` and `qdq_quantizer.py`.
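
The two standard deviations can be checked without onnxruntime. The sketch below is not the code in `quant_utils.py`; it uses a hand-written decoder (`decode_e4m3fn`, a name chosen here for illustration) that assumes the standard E4M3FN layout of 1 sign bit, 4 exponent bits, 3 mantissa bits, and an exponent bias of 7. It uses the population standard deviation to match the default of `numpy.std`.

```python
import math
import statistics


def decode_e4m3fn(byte: int) -> float:
    """Decode one byte as FLOAT8E4M3FN (illustrative helper, not an ORT API)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the single NaN encoding per sign; no infinities exist
    if exponent == 0:
        return sign * (mantissa / 8.0) * 2.0 ** -6  # subnormal values
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# Buggy reference distribution: the byte indices themselves.
wrong_values = [float(i) for i in range(256)]
# Intended reference distribution: the finite decoded float8 values.
right_values = [v for v in map(decode_e4m3fn, range(256)) if math.isfinite(v)]

std_wrong = statistics.pstdev(wrong_values)
std_right = statistics.pstdev(right_values)

print(f"std over 0..255:        {std_wrong:.2f}")
print(f"std over finite float8: {std_right:.2f}")
print(f"scale inflation factor: {std_right / std_wrong:.3f}")
```

Under those assumptions the printed values should agree with the 73.90 and 100.06 figures quoted above, with an inflation factor of about 1.354.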
## Reproduction (real function)
```python
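import numpy
from onnx import TensorProto
from onnxruntime.quantization.quant_utils import compute_scale_zp_float8
compute_scale_zp_float8(TensorProto.FLOAT8E4M3FN, numpy.float32(1.0))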
# before: scale = 0.01353175 (distribution = 0..255, n=256, std=73.90)
# after: scale = 0.00999423 (distribution = -448..448, n=254, std=100.06)
```
## Fix
```python
all_values = numpy.arange(256, dtype=numpy.uint8).view(float8_e4m3fn).astype(numpy.float32)
```
The existing `not numpy.isnan(f) and not numpy.isinf(f)` filter then
drops the 2 NaN byte patterns (`0x7F` and `0xFF`), leaving the 254
finite float8 values. `FLOAT8E4M3FN` has no infinities, so the `isinf`
check removes nothing here. `float8_e4m3fn` and `numpy` are already
imported.
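
As a quick sanity check on the "2 NaN, 254 finite" count, here is a small standalone sketch. It does not call the real function; it relies only on the E4M3FN encoding rule that a byte is NaN exactly when all seven non-sign bits are set, and on an exponent bias of 7.

```python
# A byte is NaN in E4M3FN exactly when exponent bits (3..6) and
# mantissa bits (0..2) are all ones, regardless of the sign bit.
nan_bytes = [b for b in range(256) if b & 0x7F == 0x7F]
finite_bytes = [b for b in range(256) if b & 0x7F != 0x7F]

# Largest finite magnitude: exponent field 15, mantissa field 6, bias 7.
max_finite = (1 + 6 / 8) * 2 ** (15 - 7)

print([hex(b) for b in nan_bytes])  # ['0x7f', '0xff']
print(len(finite_bytes))            # 254
print(max_finite)                   # 448.0
```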
## Test
Adds `test_compute_scale_zp_float8` to
`onnxruntime/test/python/quantization/test_quant_util.py` asserting
`scale == std / 100.0577` (and linearity in `std`). It fails on the old
code (where `std_f8` is 73.90) and passes after the fix.