vllm #20811: [v1][core] Support for attention free models (Merged)
Commits (10)
Support for attention free models (christian-pinto, committed 163 days ago)
is_kv_cache_type_attention_free: return False if not attention free (christian-pinto, committed 163 days ago)
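The helper named in this commit acts as a predicate over the model's KV cache spec. Below is a minimal sketch of that behavior, assuming a simple mapping from layer name to a per-layer spec; the LayerSpec class and its requires_kv_cache field are illustrative stand-ins, not vLLM's actual types.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LayerSpec:
    """Illustrative stand-in for a per-layer KV cache spec."""
    requires_kv_cache: bool  # attention layers would set this to True


def is_kv_cache_type_attention_free(kv_cache_spec: Dict[str, LayerSpec]) -> bool:
    """Return True only when no layer needs a KV cache.

    Per the commit message, the check returns False whenever the model is
    not attention free, i.e. at least one layer still requires a KV cache.
    """
    return not any(spec.requires_kv_cache for spec in kv_cache_spec.values())
```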
some minor edits after first review round (christian-pinto, committed 163 days ago)
Rebase to current master (christian-pinto, committed 163 days ago)
Make pre-commits pass (christian-pinto, committed 163 days ago)
Disable chunk prefill and prefix caching when model is attention free (christian-pinto, committed 163 days ago)
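Chunked prefill and prefix caching both assume a real attention KV cache, so this commit guards them off for attention-free models. A rough sketch of that guard, assuming flags named enable_chunked_prefill and enable_prefix_caching; the EngineFlags container and the override helper are hypothetical, not vLLM's configuration classes.

```python
from dataclasses import dataclass


@dataclass
class EngineFlags:
    """Hypothetical bundle of the flags touched by this commit."""
    is_attention_free: bool
    enable_chunked_prefill: bool = True
    enable_prefix_caching: bool = True


def apply_attention_free_overrides(cfg: EngineFlags) -> EngineFlags:
    """Force-disable features that rely on the attention KV cache."""
    if cfg.is_attention_free:
        # With no attention layers there is nothing to chunk during prefill
        # and no cached prefixes to reuse, so both features are switched off.
        cfg.enable_chunked_prefill = False
        cfg.enable_prefix_caching = False
    return cfg
```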
reworked to allow for models like mamba to use the kv_cache for state retention (christian-pinto, committed 162 days ago)
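The rework described here keeps per-layer cache slots around even when there is no attention, so models such as Mamba can store recurrent state where attention layers would keep K/V. A loose illustration of the idea, using an illustrative buffer class rather than vLLM's real cache structures.

```python
import torch


class LayerStateCache:
    """Illustrative per-layer buffer holding either K/V tensors or SSM state."""

    def __init__(self, shape: tuple[int, ...], dtype: torch.dtype = torch.float16):
        self.buffer = torch.zeros(shape, dtype=dtype)

    def update(self, new_state: torch.Tensor) -> None:
        # For a Mamba-style layer this is recurrent state carried between
        # decode steps, not appended key/value entries.
        self.buffer.copy_(new_state)
```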
cleanup config.py (christian-pinto, committed 162 days ago)
cleanup gpu_worker.py (christian-pinto, committed 162 days ago)
Edits after review (christian-pinto, committed 162 days ago)