[webgpu] Split large inputs into smaller buffers to bypass maxStorageBufferBindingSize limit (#25962)
### Description
When an input is larger than maxStorageBufferBindingSize, bind it with multiple binding entries. We refine the implementation of
`getByOffset`/`setByOffset` so that, for example, if `input_b` is 257MB while
maxStorageBufferBindingSize is 256MB, callers can still use `b.getByOffset(offset)`
to read the correct element without having to care which binding entry
it falls into. The generated shader code looks like this:
```
var<storage, read> input_b: array<vec4<u32>>; // [0, 256MB) of input_b
var<storage, read> input_b1: array<vec4<u32>>; // [256MB, 257MB) of input_b
```
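For illustration, the dispatch that the refined `getByOffset` could generate might look like the sketch below. The function name and the exact element count are hypothetical; the count assumes `vec4<u32>` elements (16 bytes each), so 256MB holds 16777216 of them.

```
// Illustrative sketch only, not the exact generated code.
fn get_b_by_offset(global_offset: u32) -> vec4<u32> {
  // 256MB / 16 bytes per vec4<u32> = 16777216 elements in the first binding.
  if (global_offset < 16777216u) {
    return input_b[global_offset];
  }
  // Remaining elements live in the second binding entry.
  return input_b1[global_offset - 16777216u];
}
```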
### Motivation and Context
QC's maxStorageBufferBindingSize is 256MB, which is not enough for the phi-4
model. So for QC we customized a new phi-4 model that uses the `slice` op to
split the big matrices, which meant maintaining two different phi-4
models for different platforms.
### For reviewers
The core logic is located in:
- Shader side:
  - `shader_helper.cc`. In the shader, emit as many `@group(0) @binding(...)`
declarations as there are actual buffers.
  - `shader_variable.cc`. Implement the `set_xxx_by_offset(global_offset,
value)` and `get_xxx_by_offset(global_offset)` shader helper functions,
which are used by `setByOffset`/`getByOffset` when the input
exceeds maxStorageBufferBindingSize.
- WebGPU API side:
  - `webgpu_context.cc`. On the WebGPU API side, create as many bind group
entries as there are actual buffers.
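
The API-side splitting described above boils down to simple arithmetic: walk the logical buffer and emit one `(offset, size)` binding entry per chunk, each capped at the limit. A minimal sketch (the helper name is hypothetical, not the actual `webgpu_context.cc` code):

```python
MB = 1024 * 1024

def split_buffer_bindings(buffer_size: int, max_binding_size: int):
    """Split a logical buffer into (offset, size) binding entries,
    each no larger than max_binding_size. Offsets produced this way
    are multiples of max_binding_size, which satisfies WebGPU's
    storage-buffer offset alignment requirement for MB-sized limits."""
    entries = []
    offset = 0
    while offset < buffer_size:
        size = min(max_binding_size, buffer_size - offset)
        entries.append((offset, size))
        offset += size
    return entries

# A 257MB input with a 256MB limit yields two binding entries,
# matching the [0, 256MB) and [256MB, 257MB) ranges in the example above.
print(split_buffer_bindings(257 * MB, 256 * MB))
```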