support model_free WOQ quantization #1699
implement model free
dc592e99
polished implementation
177bf48b
remove useless gpu_concurrency
97e03620
add precompiled pattern matcher to improve performance and extensibility during quantization
ff47a97a
fix typo
4d9ad0e5
update document
58709e64
remove useless code and update UT
d3951f26
mend
16991ea2
remove high_gpu_mem_usage since it has no performance benefit
83b9b4fe
update regex
687260db
fix bug and simplify UT
68d0cb7b
fix bug
312f75df
add WOQ limitation and support bits and group_size settings
3ca4d3b5
Merge branch 'main' into xinhe/4-14
3f15e02d
[pre-commit.ci] auto fixes from pre-commit.com hooks
47b3f35d
update doc
76f99151
minor fix
c588ad22
enable quant_nontext_module
0c141653
Enhance model-free quantization support and improve documentation
405de53d
Merge remote-tracking branch 'origin/main' into xinhe/4-14
6c5ce29a
support loading pytorch_model.bin and ignore conv1d embed by creating…
0697324a
add UT to cover conv1d detection
f4fc5f41
support MXFP4/8 dequantization
4f6f97e4
Merge branch 'main' into xinhe/4-14
ed46cd68
fix pylint
7e3a3f87
Merge branch 'main' into xinhe/4-14
958191a5
add auto fallback and change class name
7440c321
fix CI
8b8d084e
update readme
eb5fdf43
add fallback compressor feature to support quantization and saving
98a50401
Merge branch 'main' into xinhe/4-14
46465c39
support diffusion model
7c76188a
fix bug
a92acc2b
support layer_config={".ffn.experts.": {"scheme": "W2A16"}} usage
46ed32c4
fix bug
6f41cec5
update UT
9f81c67c
fix bug
16ead43b
Merge remote-tracking branch 'origin/main' into xinhe/4-14
48994a40
add model free for new arch
3d9812ca
[pre-commit.ci] auto fixes from pre-commit.com hooks
bd318616
Merge branch 'main' into xinhe/4-14
efd8753a
fix issue in comments
312eabef
unify cli content
dbdaf9f1
[pre-commit.ci] auto fixes from pre-commit.com hooks
6ba73d61
update per comments
bba56fe8
Merge branch 'main' into xinhe/4-14
fca22106
fix bug
22711c39
Merge branch 'main' into xinhe/4-14
7e537ef7
fix CI
7aac7952
remove breakpoint
1888f86c
xin3he force pushed from f593cfa2 to 551e1ca5 42 days ago
xin3he force pushed from 551e1ca5 to f9cd4c7d 42 days ago
add iters in init kwargs for new arch
f9cd4c7d
xin3he merged f0013f09 into main 40 days ago
xin3he deleted the xinhe/4-14 branch 40 days ago
Assignees
No one assigned