[GDN] Eliminate GPU->CPU sync in prepare_chunk_indices during prefill #38361
claude commented on 2026-03-27
arpera force pushed 55 days ago
arpera force pushed 54 days ago
claude commented on 2026-03-28
claude commented on 2026-03-30
claude commented on 2026-03-30
[GDN] Eliminate GPU->CPU sync in prepare_chunk_indices during prefill (6386a1d5)
Fix gemini-code issues: extract _insert helper in tensor_cache, add T… (8ef70dc1)
Fix mypy: add type: ignore for dynamic register attribute (b21bb569)
Extract hardcoded chunk_size=64 into FLA_CHUNK_SIZE constant (4e683f22)
Fix: skip chunk_indices pre-registration on pure decode steps (a9a72482)
[GDN] Pre-compute chunk_indices/chunk_offsets in metadata builder (83ceaad6)
Remove dead register() code and duplicate prefill block (22c9779e)
Fix Claude review comments: backend guard, kda chunk_size, BT simplif… (62dc2f51)
Remove use_flashinfer backend guard for chunk_indices pre-computation (27608149)
arpera force pushed to 27608149 51 days ago
Merge branch 'main' into artem/remove-extra-d2h-copy (dae956b2)
Merge branch 'main' into artem/remove-extra-d2h-copy (c47aa23c)
fix CI: lazy-import FLA ops to avoid CUDA init in forked subprocess (e1ab7a7b)
arpera force pushed to e1ab7a7b 48 days ago
Merge branch 'main' into artem/remove-extra-d2h-copy (1414fc8c)
Merge branch 'main' into artem/remove-extra-d2h-copy (1a86ef61)