llama.cpp
9a5724de - ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535)

Commit
14 days ago
ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535)

* ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH
* makes the min_batch_size for triggering op offload configurable via env var, defaulting to the prior hardcoded value of 32

* ggml: read GGML_OP_OFFLOAD_MIN_BATCH once and store to dev ctx

* cann: forward declaration of device context struct

* cann: move offload op check after device context declaration

* cuda: fix whitespace

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
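
The pattern the commit message describes (read the variable once, fall back to 32, keep the result in the device context) might look roughly like the sketch below. This is not the code from the commit: `parse_min_batch`, `example_device_context`, `make_device_context`, and `should_offload` are hypothetical names, the fallback on malformed input is an assumption, and so is the `>=` comparison against the batch size.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: turn the raw environment value into a batch threshold.
// Falling back on malformed input is an assumption, not confirmed behaviour.
static int parse_min_batch(const char * value, int fallback) {
    if (value == nullptr || *value == '\0') {
        return fallback; // variable unset or empty
    }
    errno = 0;
    char * end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || parsed < 0 || parsed > INT_MAX) {
        return fallback; // not a clean non-negative integer
    }
    return static_cast<int>(parsed);
}

// Hypothetical stand-in for a backend device context.
struct example_device_context {
    int op_offload_min_batch_size;
};

// Read GGML_OP_OFFLOAD_MIN_BATCH once, when the context is created,
// defaulting to the previously hardcoded value of 32.
static example_device_context make_device_context() {
    example_device_context ctx;
    ctx.op_offload_min_batch_size =
        parse_min_batch(std::getenv("GGML_OP_OFFLOAD_MIN_BATCH"), 32);
    return ctx;
}

// Assumed check: offload an op only when its batch size reaches the threshold.
static bool should_offload(const example_device_context & ctx, int batch_size) {
    return batch_size >= ctx.op_offload_min_batch_size;
}
```

Storing the parsed value in the context keeps the per-op check to a single integer comparison, with no environment lookup on the hot path. Under this sketch, launching a process with `GGML_OP_OFFLOAD_MIN_BATCH=64` set in its environment would raise the threshold from 32 to 64.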