WebGPU QuantizeLinear: per-axis, blocked quantization, Option B packing
Extend the WebGPU QuantizeLinear kernel to support per-axis and blocked quantization modes for int8/uint8 output types.
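For reference, the core math is identical across all three modes; only the scale/zero-point lookup differs. A minimal per-tensor sketch (helper name is illustrative, not the kernel's actual identifier):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative per-tensor quantization: y = clamp(round(x / scale) + zero_point).
// ONNX specifies round-half-to-even; std::nearbyint follows the default
// round-to-nearest-even floating-point environment.
inline int8_t QuantizeToInt8(float x, float scale, int8_t zero_point) {
  float v = std::nearbyint(x / scale) + static_cast<float>(zero_point);
  // Signed clamp range is -128..127 (uint8 output would clamp to 0..255).
  return static_cast<int8_t>(std::clamp(v, -128.0f, 127.0f));
}
```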
Shader changes:
- Add get_scale()/get_zero_point() with per-tensor/per-axis/blocked branches
- Add get_blocked_scale_idx() for stride-based blocked index computation
- Convert to Option B packing: each thread quantizes 4 elements and packs via pack4xI8 (no shared memory or workgroupBarrier)
- Hoist the scale/zero_point fetch out of the per-element path in per-tensor mode
- Fix clamp range for signed types (-128..127 vs 0..255)
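The stride-based lookup behind get_blocked_scale_idx() can be sketched in scalar C++ (parameter names and the exact decomposition are assumptions for illustration; the WGSL helper operates on the same quantities via the new uniforms):

```cpp
#include <cstdint>

// Illustrative mapping from a flat data index to its blocked-scale index.
// axis_stride: element stride of the quantization axis
// axis_dim:    dimension size along the axis
// block_size:  elements per quantization block along the axis
uint32_t BlockedScaleIndex(uint32_t i, uint32_t axis_stride, uint32_t axis_dim,
                           uint32_t block_size) {
  // The scale tensor's size on the axis is ceil(axis_dim / block_size).
  uint32_t scale_dim = (axis_dim + block_size - 1) / block_size;
  uint32_t outer = i / (axis_stride * axis_dim);   // coordinates before the axis
  uint32_t axis_coord = (i / axis_stride) % axis_dim;
  uint32_t inner = i % axis_stride;                // coordinates after the axis
  return outer * (scale_dim * axis_stride) +
         (axis_coord / block_size) * axis_stride + inner;
}
```

For example, with input shape [4, 6], axis = 1, block_size = 3, the scales tensor has shape [4, 2] and element (r, c) reads scale r*2 + c/3.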
C++ changes:
- Remove ORT_NOT_IMPLEMENTED for blocked quantization
- Add blocked uniforms: block_size, norm_dim_on_axis, scale_dim_times_axis_stride
- Register kernels for opsets 13-18, 19-20, 21+
Tests: add int8 per-tensor, per-axis, and blocked quantization tests with exact expected values
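A worked per-axis case for intuition (these example values are illustrative, not the exact vectors used in the added unit tests): with axis = 0, each row of a 2x2 input selects its own scale/zero_point.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Illustrative per-axis (axis = 0) quantization of a 2x2 row-major input:
// row r uses scales[r] and zero_points[r].
std::array<int8_t, 4> QuantizePerAxis2x2(const std::array<float, 4>& x,
                                         const std::array<float, 2>& scales,
                                         const std::array<int8_t, 2>& zero_points) {
  std::array<int8_t, 4> y{};
  for (size_t i = 0; i < 4; ++i) {
    size_t row = i / 2;  // axis-0 coordinate selects the scale/zero_point pair
    float v = std::nearbyint(x[i] / scales[row]) +
              static_cast<float>(zero_points[row]);
    y[i] = static_cast<int8_t>(std::clamp(v, -128.0f, 127.0f));
  }
  return y;
}
```

For input {-1, 2, 3, -4} with scales {0.5, 1.0} and zero_points {0, 10}, row 0 gives {-2, 4} and row 1 gives {13, 6}.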