[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 #32795
Add bfloat16 explicitly
6e06067b
MatthewBonanni changed the title from "[Attention] Explicitly report support for kv_cache_dtype bfloat16" to "[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16" 33 days ago
Update regular attention backends and base class
077fa118
Fix references to auto
dae74fe0
MatthewBonanni marked this pull request as ready for review 33 days ago
Use is_quantized_kv_cache
dc7fce14
Merge branch 'main' into attn_bf16
4bfed7d8
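The commits above add bfloat16 to the kv-cache dtypes each attention backend explicitly reports, rather than letting it be covered implicitly by "auto", and switch checks to an `is_quantized_kv_cache` helper. A minimal sketch of that pattern, with hypothetical names (`AttentionBackend`, `supported_kv_cache_dtypes`, `supports_kv_cache_dtype`) that are assumptions and not vLLM's actual API:

```python
# Hypothetical sketch, not vLLM's real interface: a backend advertises
# the kv-cache dtypes it supports, listing "bfloat16" explicitly
# instead of relying on "auto" to stand in for it.

def is_quantized_kv_cache(kv_cache_dtype: str) -> bool:
    # Assumed convention: fp8 variants are quantized; "auto" and
    # full-precision dtypes (bfloat16, float16) are not.
    return kv_cache_dtype.startswith("fp8")

class AttentionBackend:
    # "auto" resolves to the model's dtype; bfloat16 and float16 are
    # now reported explicitly as well.
    supported_kv_cache_dtypes = ["auto", "bfloat16", "float16"]

    @classmethod
    def supports_kv_cache_dtype(cls, kv_cache_dtype: str) -> bool:
        return kv_cache_dtype in cls.supported_kv_cache_dtypes

print(AttentionBackend.supports_kv_cache_dtype("bfloat16"))  # True
print(is_quantized_kv_cache("fp8_e4m3"))  # True
print(is_quantized_kv_cache("bfloat16"))  # False
```

Reporting bfloat16 explicitly means a requested `kv_cache_dtype="bfloat16"` matches a backend's declared support directly instead of failing a lookup that only contained "auto".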
Assignees: no one assigned
Labels: bug, ready, v1, cpu, nvidia