llama.cpp
metal : make the backend async v2
#15906
Merged

Commits
  • metal : make the backend async
    ggerganov committed 141 days ago
  • cont : add comments, extend op offload, clean up
    ggerganov committed 141 days ago
  • metal : fix batch size for MUL_MAT_ID
    ggerganov committed 141 days ago
  • metal : remove deprecated ggml_backend_metal_buffer_from_ptr
    ggerganov committed 141 days ago
  • metal : create only metal buffers, no wrapping of host memory
    ggerganov committed 140 days ago
  • metal : restore .alloc_buffer for buffer_from_ptr_type
    ggerganov committed 140 days ago
  • metal : remove broken implementation of GGML_OP_SET
    ggerganov committed 140 days ago
  • metal : clean-up loose ends, ready for tests
    ggerganov committed 140 days ago
  • metal : support both private and shared buffers
    ggerganov committed 140 days ago
  • metal : enable private buffers + add global device queue
    ggerganov committed 140 days ago
  • metal : disable host buffer to prevent races
    ggerganov committed 140 days ago
  • metal : avoid extra copy during set_tensor
    ggerganov committed 140 days ago
  • metal : use separate buffer types for shared and private Metal buffers
    ggerganov committed 140 days ago
  • metal : simplify synchronization logic
    ggerganov committed 140 days ago
  • metal : fix build
    ggerganov committed 139 days ago
  • metal : do not implement cpy_tensor
    ggerganov committed 139 days ago
  • metal : separate implementations for shared and private buffers
    ggerganov committed 139 days ago