llama.cpp
82677a6e - ggml-webgpu: compute pass batching and removing profiling overhead (#21873)

Commit

30 days ago

ggml-webgpu: compute pass batching and removing profiling overhead (#21873)

* Update register tiling matmul to use f32 accumulation
* fix profiling code
* Fix register tiling matmul for chrome, i'm blaming dawn
* Update batch tuning value for iOS
* compile fix
* Fix use of new load function
* Move to a single query set for GPU profiling
* Move to batching compute passes when not profiling
* Refactor build_multi
* remove iOS throttling now that we're batching compute passes
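The commit message names the batching change but does not show how it works. The sketch below only illustrates the general idea of "batching compute passes when not profiling", and is not the code from this commit. It assumes that per-kernel timing needs one pass per dispatch, while an unprofiled run can place several dispatches in a single pass. `MockPass`, `plan_passes`, and `max_per_pass` are hypothetical names introduced for illustration; none of them come from ggml-webgpu or Dawn.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one encoded compute pass. It only records how many
// dispatches were placed inside it; it is not a ggml-webgpu or Dawn type.
struct MockPass {
    std::size_t dispatches;
};

// Groups n_dispatches kernel dispatches into passes.
//   profiling == true : one pass per dispatch (assumed so each kernel can be timed alone)
//   profiling == false: up to max_per_pass dispatches share a single pass
// max_per_pass is an illustrative tuning knob, not a value taken from the commit.
std::vector<MockPass> plan_passes(std::size_t n_dispatches, bool profiling,
                                  std::size_t max_per_pass) {
    assert(max_per_pass > 0);
    const std::size_t cap = profiling ? 1 : max_per_pass;
    std::vector<MockPass> passes;
    for (std::size_t done = 0; done < n_dispatches; done += cap) {
        // The last pass may hold fewer dispatches than the cap.
        passes.push_back(MockPass{std::min(cap, n_dispatches - done)});
    }
    return passes;
}
```

With these assumptions, `plan_passes(10, false, 4)` yields three passes holding 4, 4, and 2 dispatches, while `plan_passes(10, true, 4)` yields ten passes of one dispatch each. The difference in pass count is the per-pass overhead that batching is meant to avoid.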

References

#21873 - ggml-webgpu: compute pass batching and removing profiling overhead

Author

reeselevine

Parents

8612ed18
