llama.cpp
9a5724de - ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535)

Commit

14 days ago

ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (#18535)

* ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH
* makes the min_batch_size for triggering op offload configurable via env var, defaulting to the prior hardcoded value of 32
* ggml: read GGML_OP_OFFLOAD_MIN_BATCH once and store to dev ctx
* cann: forward declaration of device context struct
* cann: move offload op check after device context declaration
* cuda: fix whitespace

Co-authored-by: Aman Gupta <amangupta052@gmail.com>

---------

Co-authored-by: Aman Gupta <amangupta052@gmail.com>
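The behavior described in the commit message (read the variable once, fall back to 32, keep the result in the device context) might look roughly like the sketch below. This is an illustration only, not the code from this commit: `parse_min_batch`, `device_context`, `make_device_context`, and `should_offload` are hypothetical names, and both the `>=` comparison and the handling of invalid values are assumptions.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Prior hardcoded threshold, per the commit message.
constexpr int kDefaultMinBatch = 32;

// Hypothetical helper: parse a non-negative decimal integer.
// Returns `fallback` when the value is unset, empty, malformed, or out of range
// (the fallback-on-invalid policy is an assumption of this sketch).
int parse_min_batch(const char * value, int fallback) {
    if (value == nullptr || *value == '\0') {
        return fallback;
    }
    errno = 0;
    char * end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || parsed < 0 || parsed > INT_MAX) {
        return fallback;
    }
    return static_cast<int>(parsed);
}

// Hypothetical stand-in for a backend device context.
struct device_context {
    int op_offload_min_batch;
};

// Read the environment variable once, when the context is created,
// so later offload checks do not call getenv repeatedly.
device_context make_device_context() {
    return device_context{
        parse_min_batch(std::getenv("GGML_OP_OFFLOAD_MIN_BATCH"), kDefaultMinBatch)
    };
}

// Assumed check: offload an op only when its batch size reaches the threshold.
bool should_offload(const device_context & ctx, int batch_size) {
    return batch_size >= ctx.op_offload_min_batch;
}
```

Under these assumptions, exporting `GGML_OP_OFFLOAD_MIN_BATCH=128` before launching the process would raise the threshold from 32 to 128, and leaving the variable unset would keep the earlier behavior.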

References

#18535 - ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH

Author

DocShotgun

Parents

9c142e3a
