ggml webgpu: profiling, CI updates, reworking of command submission (#16452)
* Add profiling
* More detailed profiling
* Rework command submission to avoid global locks
* Update wait handling
* Try new method of waiting on futures
* Add serializing of command submission in some cases
* Add new pool for timestamp queries and clean up logging
* Serialize command submission in CI and leave a TODO note
* Update WebGPU CI
* Add myself as WebGPU codeowner
* Deadlock avoidance
* Leave WebGPU/Vulkan CI serialized
* Fix division by zero
* Fix logic in division by inflight_threads
* Update CODEOWNERS and remove serialize submit option