llama.cpp
59977eba - server: fix crash when batch > ubatch with embeddings (#17912)

Commit

204 days ago

server: fix crash when batch > ubatch with embeddings (#17912) * server: fix crash when batch > ubatch with embeddings (#12836) Fixes #12836 where the server crashes with GGML_ASSERT failure when running with embeddings enabled and n_batch > n_ubatch. Root cause: Embeddings use non-causal attention which requires all tokens to be processed within a single ubatch. When n_batch > n_ubatch, the server attempts to split processing, causing assertion failure. Solution: - Add parameter validation in main() after common_params_parse() - When embeddings enabled and n_batch > n_ubatch: * Log warnings explaining the issue * Automatically set n_batch = n_ubatch * Prevent server crash This follows the approach suggested by @ggerganov in issue #12836. Note: This supersedes stalled PR #12940 which attempted a runtime fix in the old examples/server/server.cpp location. This implementation validates at startup in tools/server/server.cpp (current location). Testing: - Build: Compiles successfully - Validation triggers: Warns when -b > -ub with --embedding - Auto-correction works: Adjusts n_batch = n_ubatch - No false positives: Valid params don't trigger warnings - Verified on macOS M3 Pro with embedding model * Update tools/server/server.cpp --------- Co-authored-by: ytian218 <ytian218@bloomberg.net> Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

References

#17912 - server: fix crash when batch > ubatch with embeddings (#12836)

Author

yifant-code

Parents

79dbae03

llama.cpp 59977eba - server: fix crash when batch > ubatch with embeddings (#17912)

llama.cpp
59977eba - server: fix crash when batch > ubatch with embeddings (#17912)