Continuously optimize AutoScheme RAM consumption #1703
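For context on what this PR touches: AutoScheme is auto-round's mixed-bit scheme search, which assigns a per-layer quantization option under a target average bit-width; that search is the RAM hotspot this PR continues to trim. Below is a minimal sketch of a typical AutoScheme run, assuming the API shape from the project README (`AutoScheme`, `avg_bits`, `options`); exact parameter names and defaults may differ across versions, and the model name is only an example.

```python
# Minimal sketch, assuming the AutoScheme API from the auto-round README;
# names and defaults may differ by version.
from auto_round import AutoRound, AutoScheme

model_name = "Qwen/Qwen3-8B"  # example model, not taken from this PR

# Search for a per-layer bit assignment averaging ~3 bits,
# choosing among the listed weight/activation schemes per layer.
scheme = AutoScheme(avg_bits=3.0, options=("W2A16", "W4A16", "W8A16"))

ar = AutoRound(model=model_name, scheme=scheme)
ar.quantize_and_save(output_dir="./tmp_autoround")
```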
Continue optimizing AutoScheme RAM consumption
ee195230
[pre-commit.ci] auto fixes from pre-commit.com hooks
f19224e5
fix: add missing run_mllm entry point alias (#1695)
f0d183c7
rename scheme INT8_W8A8 to INT8 (#1687)
fe0a5411
update mtp quant for special cases (#1691)
3a9575f6
Update gaudi-docker to v1.24.0 & fix CUDA UT (#1708)
68396c0c
add support for gemma4 model (#1655)
cae2d806
ignore mtp.fc for qwen3_5 due to vllm failure (#1710)
7f65d035
[research feature] Introduce INT4 support at the algorithm level (#1641)
59f36390
refine int4 doc (#1720)
dd52cd2d
Support new model Qwen/Qwen3.6-35B-A3B (#1705)
8073fa7c
Revert "ignore mtp.fc for qwen3_5 due to vllm failure (#1710)" (#1730)
318b3b37
skip quantizing mtp.fc since vLLM doesn't support it (#1731)
507f3ef7
Update pull_request_template.md (#1727)
d8d332ac
Create model_support_request.yml (#1738)
107485df
fix gemma3 gguf UT failure (#1735)
69cae588
Remove threaded packing from exporters (#1719)
1643ce1a
add small zimage test and fix bug (#1734)
26c75743
Enhance llmc CI on XPU (#1483)
8bced5f7
Reduce xpu memory usage with patch_xpu_sdpa_drop_causal_mask (#1716)
4c2238fd
[Experimental] Add MLX format export support and AutoScheme for vlm …
145847b1
add warnings for lm_head activation scale fallback (#1728)
cc66be71
add support for MiMo-V2-Flash (#1718)
a4f9bf9f
New architecture for auto_round (#1542)
38ef9463
Fix vllm CUDA CI (#1750)
c3690709
delete unreproduced results for now (#1760)
d9e0f6ad
Fix hpu error (#1766)
9324bdf2
[MTP] split gate_up_proj and fix accuracy gap in RTN quantization (#1758)
4d991746
clean and fix for new arch (#1761)
74594eb1
support gptqmodel 7.0.0 and fix bug in CI (#1772)
66ed80da
Optimize CUDA CI and Code Scan workflows (#1770)
f5189565
fix accuracy regression and check it in CUDA CI (#1785)
85733088
fix amp (#1768)
2b475833
fix amp (#1767)
75325d23
Fix incompatible weight names (#1759)
a97e3342
add notes (#1795)
1295774e
remove IPEX-related code, doc, and test (#1787)
4c77a982
support model_free WOQ quantization (#1699)
a7d01a27
Integrate AutoRound Lib (#1723)
82a7b99b
fix new arch bug for llmc (#1781)
bd935e4a
fix bug in gguf alg ext (#1796)
330bd78e
Continue optimizing AutoScheme RAM consumption
976f90db
[pre-commit.ci] auto fixes from pre-commit.com hooks
a15b8250
Merge branch 'main' into lvl/autoscheme_ram_opt
410a4e41
Merge branch 'main' into lvl/autoscheme_ram_opt
07e784b7