openvino
npuw: add i4/u4 kvcache copy support without ROI tensor slicing
#35255
Closed

esmirno
esmirno WIP: dynamic quantize kv-cache support
0685abb6
esmirno Merge branch 'master' into es/kv-cache-compression-i8
be92548c
esmirno find_sda_nodes moved to util file
6e399d83
esmirno fixed prefill_chunking case
6a73a923
esmirno fixed inference on NPUW_CPU, added unit test for decompositions
75bbbb32
esmirno adjusted distribution seen in real kv-cache to be gen-gausse
e65291a1
esmirno unit test extended to run optionally on devices like CPU, NPU
60166a1a
esmirno Merge branch 'master' into es/kv-cache-compression-i8
b37c8a44
esmirno added u8 quantisation type for handlin DynamicQuantize decomposition …
9e14ac18
esmirno introduced i4 quantisation fo kv-cache for default u8/i8 case will us…
cbd2a7ef
esmirno Merge branch 'master' into es/kv-cache-compression-i8
2f9c6347
esmirno rebase remained integration to new source file
fdd4ec7d
esmirno build fixed
28566ddd
esmirno fixed cb4-fp8 feature dueto m_cfg late initialisation
31f745d0
esmirno Merge branch 'master' into es/kv-cache-compression-i8
2b3cbeee
esmirno clang-format-fixes
37f4ee7c
esmirno comments optimized
b8222496
esmirno sdpa pattern nodes tests updated according to review
c3221ca9
esmirno copy paste code simplified
cf5d5a37
esmirno comments updated
b89bad86
esmirno simplified according to review
6700876f
esmirno switched to i8/sym for value-cache storage i4-not working
b638fce5
github-actions added category: NPU
github-actions added category: NPUW
esmirno npuw: add i4/u4 kvcache copy support without ROI tensor slicing
35f4d7e2
esmirno force pushed from 89dc104d to 35f4d7e2 70 days ago
github-actions added category: build
esmirno closed this 70 days ago

Reviewers
No reviews
Assignees
No one assigned