onnxruntime
Optimize ONNX Attention KV cache with ConcatNewToPast and add release-build kernel safety
#27613

Merged

titaiwangms merged 12 commits into main from titaiwang/improve_present_kv_copy

Use LaunchConcatNewToPastKV for KV cache update and add release-build…

e2453514

titaiwangms requested a review from copilot-pull-request-reviewer 78 days ago

copilot-pull-request-reviewer commented on 2026-03-10

titaiwangms commented on 2026-03-10

Address PR review feedback: clarify comments for memory safety and BS…

70ec5315

Merge branch 'main' into titaiwang/improve_present_kv_copy

d9895f09

titaiwangms marked this pull request as ready for review 77 days ago

Add C++ test for Flash decode path with fp16 and bool attention mask

dc652efe

titaiwangms requested a review from copilot-pull-request-reviewer 77 days ago

copilot-pull-request-reviewer commented on 2026-03-11

Address PR review: zero-init present buffer tail and harden tensorsca…

ff63a530

tianleiwu requested a review from copilot-pull-request-reviewer 77 days ago

tianleiwu commented on 2026-03-11

copilot-pull-request-reviewer commented on 2026-03-11

tianleiwu commented on 2026-03-11

Address senior review feedback: revert tail-zeroing, fix bf16 type ma…

d0933406

titaiwangms commented on 2026-03-11

Fix unused variable warning and remove duplicate include

2f707cb3

Use OrtToCudaType for native bf16 in decode path, simplify partial ma…

79e5ca9e

Clarify type aliases and comments per review feedback

22694c8c

titaiwangms requested a review from copilot-pull-request-reviewer 76 days ago

copilot-pull-request-reviewer commented on 2026-03-12

Simplify circular modulo and fix NaN test tolerance

b3866121

titaiwangms requested a review from tianleiwu 76 days ago

titaiwangms requested a review from copilot-pull-request-reviewer 76 days ago

copilot-pull-request-reviewer commented on 2026-03-12

titaiwangms requested a review from copilot-pull-request-reviewer 76 days ago

copilot-pull-request-reviewer commented on 2026-03-12

titaiwangms force pushed from a8d21b9f to 9184039c 76 days ago

titaiwangms requested a review from copilot-pull-request-reviewer 76 days ago

copilot-pull-request-reviewer commented on 2026-03-12

Fix circular clamping bug, validate KV prefix in test, fix comments

f3424dba

titaiwangms force pushed from 9184039c to f3424dba 76 days ago

Fix unused lambda capture and add multi-batch attention test

d72c5edc

tianleiwu approved these changes on 2026-03-13

titaiwangms merged 99e0119b into main 75 days ago

titaiwangms deleted the titaiwang/improve_present_kv_copy branch 75 days ago

Reviewers

tianleiwu

copilot-pull-request-reviewer

