auto-round
Continuously optimize AutoScheme RAM consumption
#1703
Open

lvliang-intel wants to merge 45 commits into main from lvl/autoscheme_ram_opt
lvliang-intel Continue optimizing AutoScheme RAM consumption
ee195230
lvliang-intel requested a review from copilot-pull-request-reviewer 26 days ago
pre-commit-ci[bot] [pre-commit.ci] auto fixes from pre-commit.com hooks
f19224e5
copilot-pull-request-reviewer commented on 2026-04-17
JGSphaela fix: add missing run_mllm entry point alias (#1695)
f0d183c7
thuang6 rename scheme INT8_W8A8 to INT8 (#1687)
fe0a5411
xin3he update mtp quant for special cases (#1691)
3a9575f6
XuehaoSun Update gaudi-docker to v1.24.0 & fix CUDA UT (#1708)
68396c0c
n1ck-guo add support for gemma4 model (#1655)
cae2d806
xin3he ignore mtp.fc for qwen3_5 due to vllm failure (#1710)
7f65d035
wenhuach21 [research feature] Introduce INT4 support at the algorithm level (#1641)
59f36390
wenhuach21 refine int4 doc (#1720)
dd52cd2d
lvliang-intel Support new model Qwen/Qwen3.6-35B-A3B (#1705)
8073fa7c
xin3he Revert "ignore mtp.fc for qwen3_5 due to vllm failure (#1710)" (#1730)
318b3b37
xin3he skip quantizing mtp.fc since vLLM doesn't support (#1731)
507f3ef7
xin3he Update pull_request_template.md (#1727)
d8d332ac
xin3he Create model_support_request.yml (#1738)
107485df
n1ck-guo fix gemma3 gguf ut fail (#1735)
69cae588
yiliu30 Remove threaded packing from exporters (#1719)
1643ce1a
xin3he add small zimage test and fix bug (#1734)
26c75743
chensuyue Enhance llmc CI on XPU (#1483)
8bced5f7
xin3he Reduce xpu memory usage with patch_xpu_sdpa_drop_causal_mask (#1716)
4c2238fd
wenhuach21 [Experimental]Add MLX format export support and AutoScheme for vlm …
145847b1
n1ck-guo add warnings for lm_head activation scale fallback (#1728)
cc66be71
n1ck-guo add support for MiMo-V2-Flash (#1718)
a4f9bf9f
n1ck-guo New architecture for auto_round (#1542)
38ef9463
XuehaoSun Fix vllm CUDA CI (#1750)
c3690709
ZaneMark delete unreproduced results for now (#1760)
d9e0f6ad
n1ck-guo Fix hpu error (#1766)
9324bdf2
xin3he [MTP]split gate_up_proj and fix accu gap in rtn quantization (#1758)
4d991746
n1ck-guo clean and fix for new arch (#1761)
74594eb1
xin3he support gptqmodel 7.0.0 and fix bug in CI (#1772)
66ed80da
XuehaoSun Optimize CUDA CI and Code Scan workflows (#1770)
f5189565
xin3he fix accuracy regression and check it in CUDA CI (#1785)
85733088
wenhuach21 fix amp (#1768)
2b475833
wenhuach21 fix amp (#1767)
75325d23
mengniwang95 Fix incompatible weight names (#1759)
a97e3342
wenhuach21 add notes (#1795)
1295774e
xin3he remove IPEX related code, doc, and test (#1787)
4c77a982
xin3he support model_free WOQ quantization (#1699)
a7d01a27
Zhenzhong1 Integrate AutoRound Lib (#1723)
82a7b99b
n1ck-guo fix new arch bug for llmc (#1781)
bd935e4a
n1ck-guo fix bug of gguf alg ext (#1796)
330bd78e
lvliang-intel Continue optimizing AutoScheme RAM consumption
976f90db
pre-commit-ci[bot] [pre-commit.ci] auto fixes from pre-commit.com hooks
a15b8250
github-advanced-security commented on 2026-05-12
lvliang-intel Merge branch 'main' into lvl/autoscheme_ram_opt
410a4e41
lvliang-intel Merge branch 'main' into lvl/autoscheme_ram_opt
07e784b7