llama.cpp
e4fed9d0 - ggml-webgpu: address quantization precision and backend lifecycle management (#21521)

Commit

5 days ago

ggml-webgpu: address quantization precision and backend lifecycle management (#21521)

* ggml(webgpu): fix the busy-polls in Emscripten in the waitAny after #20618, and remove the busy webgpu log
* Merge with upstream
* Fix GET_ROWS packed integer NaN when using f16 as memory buffer in shader quants
* Update Unary wgsl EXP and EXPM1 for f16 stability
* Fix GET_ROWS IQ4_XS struct for NaN f16 canonicalization
* Fix numerical precision for unary sqrt when working with f16
* Fix NaN canonicalization for packed integers using f16
* Update err threshold for binary div ops when using f16
* backend: Keep one Dawn/WebGPU instance alive for the lifetime of the static backend
* clean: uncomment existing code logs
* clean: clean the unnecessary debug info
* Refactor and generalize dequant helpers
* Remove deprecated quant structs
* Refactor shader defines to reduce repetition
* Remove error override for F16 type
* fix: fix the accidental removal of the proper initialization of ctx
* clean: clean legacy and format code
* fix: did not modify tests ops

---------

Co-authored-by: Jeremy J. Hartmann <jeremy@mtion.tv>
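The commit message does not spell out why packed integers turn into NaN when an f16 buffer is involved. One plausible reading is that some 16-bit chunks of packed quantized data happen to match an f16 NaN bit pattern, and a load or store path that canonicalizes NaNs then rewrites those bits. The sketch below only illustrates that idea in plain C++. The helper names and the choice of `0x7E00` as the canonical NaN are assumptions made for illustration, not code from ggml-webgpu.

```cpp
#include <cstdint>

// IEEE 754 binary16: 1 sign bit, 5 exponent bits, 10 mantissa bits.
// A bit pattern is NaN when the exponent is all ones and the mantissa is nonzero.
constexpr bool is_f16_nan_bits(uint16_t bits) {
    return (bits & 0x7C00u) == 0x7C00u && (bits & 0x03FFu) != 0u;
}

// Hypothetical model of a path that canonicalizes NaNs: every NaN pattern
// collapses to the quiet NaN 0x7E00, and every other pattern is unchanged.
constexpr uint16_t canonicalize_f16_bits(uint16_t bits) {
    return is_f16_nan_bits(bits) ? static_cast<uint16_t>(0x7E00u) : bits;
}

// True if a 16-bit chunk of packed integer data keeps its exact bits after
// passing through an f16-typed value under the model above.
constexpr bool survives_f16_round_trip(uint16_t packed) {
    return canonicalize_f16_bits(packed) == packed;
}
```

Under this model, a chunk such as `0x1234` passes through unchanged, while `0xFFFF` is altered because it is a NaN pattern. That is why packed quantized data is generally safer to move through integer-typed storage than through f16.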

References

#21521 - ggml-webgpu: address quantization precision and backend lifecycle management

Author

Constannnnnt

Parents

5dd10253
