onnxruntime
Optimize ONNX Attention KV cache with ConcatNewToPast and add release-build kernel safety
#27613
Merged

titaiwangms merged 12 commits into main from titaiwang/improve_present_kv_copy
titaiwangms Use LaunchConcatNewToPastKV for KV cache update and add release-build…
e2453514
titaiwangms requested a review from copilot-pull-request-reviewer 78 days ago
copilot-pull-request-reviewer commented on 2026-03-10
titaiwangms commented on 2026-03-10
tianleiwu
titaiwangms Address PR review feedback: clarify comments for memory safety and BS…
70ec5315
titaiwangms Merge branch 'main' into titaiwang/improve_present_kv_copy
d9895f09
titaiwangms marked this pull request as ready for review 77 days ago
titaiwangms Add C++ test for Flash decode path with fp16 and bool attention mask
dc652efe
titaiwangms requested a review from copilot-pull-request-reviewer 77 days ago
copilot-pull-request-reviewer commented on 2026-03-11
titaiwangms Address PR review: zero-init present buffer tail and harden tensorsca…
ff63a530
tianleiwu requested a review from copilot-pull-request-reviewer 77 days ago
tianleiwu commented on 2026-03-11
copilot-pull-request-reviewer commented on 2026-03-11
tianleiwu commented on 2026-03-11
titaiwangms Address senior review feedback: revert tail-zeroing, fix bf16 type ma…
d0933406
titaiwangms commented on 2026-03-11
titaiwangms Fix unused variable warning and remove duplicate include
2f707cb3
titaiwangms Use OrtToCudaType for native bf16 in decode path, simplify partial ma…
79e5ca9e
titaiwangms Clarify type aliases and comments per review feedback
22694c8c
titaiwangms requested a review from copilot-pull-request-reviewer 76 days ago
copilot-pull-request-reviewer commented on 2026-03-12
titaiwangms Simplify circular modulo and fix NaN test tolerance
b3866121
titaiwangms requested a review from tianleiwu 76 days ago
titaiwangms requested a review from copilot-pull-request-reviewer 76 days ago
copilot-pull-request-reviewer commented on 2026-03-12
titaiwangms requested a review from copilot-pull-request-reviewer 76 days ago
copilot-pull-request-reviewer commented on 2026-03-12
titaiwangms force pushed from a8d21b9f to 9184039c 76 days ago
titaiwangms requested a review from copilot-pull-request-reviewer 76 days ago
copilot-pull-request-reviewer commented on 2026-03-12
titaiwangms Fix circular clamping bug, validate KV prefix in test, fix comments
f3424dba
titaiwangms force pushed from 9184039c to f3424dba 76 days ago
titaiwangms Fix unused lambda capture and add multi-batch attention test
d72c5edc
tianleiwu approved these changes on 2026-03-13
titaiwangms merged 99e0119b into main 75 days ago
titaiwangms deleted the titaiwang/improve_present_kv_copy branch 75 days ago

