DeepSpeed
5dea776a - Enhance query APIs for text generation (#4965)

Commit
1 year ago
Enhance query APIs for text generation (#4965)

This PR improves efficiency when using DeepSpeed-FastGen. DeepSpeed-FastGen queries the state of the KV cache very frequently, so this PR adds an API for that query and makes some of the existing query APIs more efficient. It also allows the schedulability check to be skipped when it has already been performed.

---------

Co-authored-by: Heyang Qin <heyangqin@microsoft.com>
Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
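The pattern the message describes (query the KV cache state once, then skip the repeated schedulability check) can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyKVCache`, `query`, `put`, and the `do_checks` flag are hypothetical names chosen for illustration and are not taken from the DeepSpeed code changed by this commit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Hypothetical block-based KV cache (illustration only, not DeepSpeed's class)."""
    total_blocks: int
    block_size: int
    used: dict = field(default_factory=dict)  # sequence id -> tokens stored

    def blocks_for(self, n_tokens: int) -> int:
        # Ceiling division: number of blocks needed to hold n_tokens.
        return -(-n_tokens // self.block_size)

    @property
    def free_blocks(self) -> int:
        allocated = sum(self.blocks_for(t) for t in self.used.values())
        return self.total_blocks - allocated

    def query(self, uid: int, new_tokens: int):
        """Return (extra_blocks_needed, schedulable) without changing state."""
        seen = self.used.get(uid, 0)
        needed = self.blocks_for(seen + new_tokens) - self.blocks_for(seen)
        return needed, needed <= self.free_blocks

    def put(self, uid: int, new_tokens: int, do_checks: bool = True) -> None:
        """Record new tokens; the check can be skipped if the caller already queried."""
        if do_checks:
            _, ok = self.query(uid, new_tokens)
            if not ok:
                raise RuntimeError(f"sequence {uid} is not schedulable")
        self.used[uid] = self.used.get(uid, 0) + new_tokens


if __name__ == "__main__":
    cache = ToyKVCache(total_blocks=4, block_size=16)
    needed, ok = cache.query(uid=0, new_tokens=40)
    if ok:
        # Schedulability was just established, so the second check is skipped.
        cache.put(uid=0, new_tokens=40, do_checks=False)
    print(needed, ok, cache.free_blocks)
```

In this toy, the caller pays for the capacity calculation once in `query` instead of once there and again inside `put`, which is the kind of saving the message attributes to skipping an already-performed check.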