llama.cpp
f161463a - metal : allow ops to run concurrently (#15929)

Committed 92 days ago
metal : allow ops to run concurrently (#15929)

* metal : run graphs ops concurrently
  ggml-ci
* cont : add flags for debugging and disabling concurrency
  ggml-ci
* cont : refactor and handle fusing
  ggml-ci
* cont : simplify - no need to use GPU address
  ggml-ci
* cont : prepare mem ranges for reuse + add ggml-metal-common.cpp
  ggml-ci
* cont : avoid redundant keywords in cpp
  [no ci]
* metal : reorder graph for better concurrency
  ggml-ci
* metal : fix race on mem pool buffers
  ggml-ci
* cont : add env GGML_METAL_GRAPH_OPTIMIZE_DISABLE
  ggml-ci
* cont : refactor, optimize, add comments
  ggml-ci
* cont : refactor ggml-metal.m
  ggml-ci
* minor : update logs
  [no ci]
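The commit message mentions preparing memory ranges (ggml-metal-common.cpp) so that graph ops can be scheduled concurrently, but it does not spell out the scheduling rule. Below is a minimal C++ sketch of the general idea, assuming two ops may share a concurrent batch only when their buffer ranges do not conflict; the names mem_range, ranges_overlap, and can_run_concurrently are hypothetical and are not the actual ggml-metal-common.cpp API.

// Hypothetical sketch of overlap-based concurrency checking; not the actual
// ggml-metal-common.cpp implementation.
#include <cstdint>
#include <vector>

struct mem_range {
    uint64_t beg; // first byte of the buffer region used by an op
    uint64_t end; // one past the last byte used
};

// true if the two ranges touch any of the same bytes
bool ranges_overlap(const mem_range & a, const mem_range & b) {
    return a.beg < b.end && b.beg < a.end;
}

// an op can join the current concurrent batch only if its ranges do not
// conflict with the write ranges already recorded for that batch
bool can_run_concurrently(const std::vector<mem_range> & batch_writes,
                          const std::vector<mem_range> & op_ranges) {
    for (const auto & w : batch_writes) {
        for (const auto & r : op_ranges) {
            if (ranges_overlap(w, r)) {
                return false;
            }
        }
    }
    return true;
}

Per the commit message, the environment variable GGML_METAL_GRAPH_OPTIMIZE_DISABLE disables the graph reordering step, and further flags exist for debugging and disabling concurrency, though those are not named in the message.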