[v1][core] Support for attention free models #20811
Support for attention free models
b764c9dd
is_kv_cache_type_attention_free: return False if not attention free
5825ba45
some minor edits after first review round
fc86350b
Rebase to current master
97c11e62
Make pre-commits pass
673aeb06
Disable chunk prefill and prefix caching when model is attention free
fb3ecfbc
reworked to allow for models like mamba to use the kv_cache for state…
8e5dbee2
cleanup config.py
2ee7087c
cleanup gpu_worker.py
19a7d708
Edits after review
b8f355e8
heheda12345
changed the title from "[v1][core]Support for attention free models" to "[v1][core] Support for attention free models" 155 days ago
christian-pinto
deleted the attention_free_models_support branch 153 days ago
Assignees
No one assigned