openvino
npuw: add i4/u4 kvcache copy support without ROI tensor slicing
#35255
Closed
esmirno wants to merge 23 commits into openvinotoolkit:master from esmirno:es/kv-cache-i4-copy-support
WIP: dynamic quantize kv-cache support
0685abb6
Merge branch 'master' into es/kv-cache-compression-i8
be92548c
find_sda_nodes moved to util file
6e399d83
fixed prefill_chunking case
6a73a923
fixed inference on NPUW_CPU, added unit test for decompositions
75bbbb32
adjusted distribution seen in real kv-cache to be gen-gausse
e65291a1
unit test extended to run optionally on devices like CPU, NPU
60166a1a
Merge branch 'master' into es/kv-cache-compression-i8
b37c8a44
added u8 quantisation type for handling DynamicQuantize decomposition …
9e14ac18
introduced i4 quantisation for kv-cache for default u8/i8 case will us…
cbd2a7ef
Merge branch 'master' into es/kv-cache-compression-i8
2f9c6347
rebase remained integration to new source file
fdd4ec7d
build fixed
28566ddd
fixed cb4-fp8 feature due to m_cfg late initialisation
31f745d0
Merge branch 'master' into es/kv-cache-compression-i8
2b3cbeee
clang-format-fixes
37f4ee7c
comments optimized
b8222496
sdpa pattern nodes tests updated according to review
c3221ca9
copy paste code simplified
cf5d5a37
comments updated
b89bad86
simplified according to review
6700876f
switched to i8/sym for value-cache storage i4-not working
b638fce5
github-actions added category: NPU
github-actions added category: NPUW
npuw: add i4/u4 kvcache copy support without ROI tensor slicing
35f4d7e2
esmirno force pushed from 89dc104d to 35f4d7e2 70 days ago
github-actions added category: build
esmirno closed this 70 days ago
Reviewers: No reviews
Assignees: No one assigned
Labels: category: build, category: NPU, category: NPUW
Milestone: No milestone