Optimize ONNX Attention KV cache with ConcatNewToPast and add release-build kernel safety #27613
Use LaunchConcatNewToPastKV for KV cache update and add release-build…
e2453514
Address PR review feedback: clarify comments for memory safety and BS…
70ec5315
Merge branch 'main' into titaiwang/improve_present_kv_copy
d9895f09
titaiwangms marked this pull request as ready for review 77 days ago
Add C++ test for Flash decode path with fp16 and bool attention mask
dc652efe
Address PR review: zero-init present buffer tail and harden tensorsca…
ff63a530
Address senior review feedback: revert tail-zeroing, fix bf16 type ma…
d0933406
Fix unused variable warning and remove duplicate include
2f707cb3
Use OrtToCudaType for native bf16 in decode path, simplify partial ma…
79e5ca9e
Clarify type aliases and comments per review feedback
22694c8c
Simplify circular modulo and fix NaN test tolerance
b3866121
titaiwangms force pushed from a8d21b9f to 9184039c 76 days ago
Fix circular clamping bug, validate KV prefix in test, fix comments
f3424dba
titaiwangms force pushed from 9184039c to f3424dba 76 days ago
Fix unused lambda capture and add multi-batch attention test
d72c5edc
tianleiwu approved these changes on 2026-03-13
titaiwangms deleted the titaiwang/improve_present_kv_copy branch 75 days ago