vllm
[v1][core] Support for attention free models
#20811
Merged

Commits
  • Support for attention free models
    christian-pinto committed 163 days ago
  • is_kv_cache_type_attention_free: return False if not attention free
    christian-pinto committed 163 days ago
  • some minor edits after first review round
    christian-pinto committed 163 days ago
  • Rebase to current master
    christian-pinto committed 163 days ago
  • Make pre-commits pass
    christian-pinto committed 163 days ago
  • Disable chunk prefill and prefix caching when model is attention free
    christian-pinto committed 163 days ago
  • reworked to allow for models like mamba to use the kv_cache for state retention
    christian-pinto committed 162 days ago
  • cleanup config.py
    christian-pinto committed 162 days ago
  • cleanup gpu_worker.py
    christian-pinto committed 162 days ago
  • Edits after review
    christian-pinto committed 162 days ago