Continuously optimize AutoScheme RAM consumption #1703
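For context on what this PR touches: AutoScheme is auto-round's mixed-bit scheme search, which assigns a per-layer quantization option under a target average bit-width; that search is the RAM hotspot this PR continues to trim. Below is a minimal sketch of a typical AutoScheme run, assuming the API shape from the project README (`AutoScheme`, `avg_bits`, `options`); exact parameter names and defaults may differ across versions, and the model name is only an example.

```python
# Minimal sketch, assuming the AutoScheme API from the auto-round README;
# names and defaults may differ by version.
from auto_round import AutoRound, AutoScheme

model_name = "Qwen/Qwen3-8B"  # example model, not taken from this PR

# Search for a per-layer bit assignment averaging ~3 bits,
# choosing among the listed weight/activation schemes per layer.
scheme = AutoScheme(avg_bits=3.0, options=("W2A16", "W4A16", "W8A16"))

ar = AutoRound(model=model_name, scheme=scheme)
ar.quantize_and_save(output_dir="./tmp_autoround")
```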
Continue optimizing AutoScheme RAM consumption
ee195230
[pre-commit.ci] auto fixes from pre-commit.com hooks
f19224e5
fix: add missing run_mllm entry point alias (#1695)
f0d183c7
rename scheme INT8_W8A8 to INT8 (#1687)
fe0a5411
update mtp quant for special cases (#1691)
3a9575f6
Update gaudi-docker to v1.24.0 & fix CUDA UT (#1708)
68396c0c
add support for gemma4 model (#1655)
cae2d806
ignore mtp.fc for qwen3_5 due to vllm failure (#1710)
7f65d035
[research feature] Introduce INT4 support at the algorithm level (#1641)
59f36390
refine int4 doc (#1720)
dd52cd2d
Support new model Qwen/Qwen3.6-35B-A3B (#1705)
8073fa7c
Revert "ignore mtp.fc for qwen3_5 due to vllm failure (#1710)" (#1730)
318b3b37
skip quantizing mtp.fc since vLLM doesn't support it (#1731)
507f3ef7
Update pull_request_template.md (#1727)
d8d332ac
Create model_support_request.yml (#1738)
107485df
fix gemma3 gguf UT failure (#1735)
69cae588
Remove threaded packing from exporters (#1719)
1643ce1a
add small zimage test and fix bug (#1734)
26c75743
Enhance llmc CI on XPU (#1483)
8bced5f7
Reduce xpu memory usage with patch_xpu_sdpa_drop_causal_mask (#1716)
4c2238fd
[Experimental] Add MLX format export support and AutoScheme for vlm …
145847b1
add warnings for lm_head activation scale fallback (#1728)
cc66be71
add support for MiMo-V2-Flash (#1718)
a4f9bf9f
New architecture for auto_round (#1542)
38ef9463
Fix vllm CUDA CI (#1750)
c3690709
delete unreproduced results for now (#1760)
d9e0f6ad
Fix hpu error (#1766)
9324bdf2
[MTP] split gate_up_proj and fix accuracy gap in RTN quantization (#1758)
4d991746
clean and fix for new arch (#1761)
74594eb1
support gptqmodel 7.0.0 and fix bug in CI (#1772)
66ed80da
Optimize CUDA CI and Code Scan workflows (#1770)
f5189565
fix accuracy regression and check it in CUDA CI (#1785)
85733088
fix amp (#1768)
2b475833
fix amp (#1767)
75325d23
Fix incompatible weight names (#1759)
a97e3342
add notes (#1795)
1295774e
remove IPEX-related code, doc, and test (#1787)
4c77a982
support model_free WOQ quantization (#1699)
a7d01a27
Integrate AutoRound Lib (#1723)
82a7b99b
fix new arch bug for llmc (#1781)
bd935e4a
fix bug in gguf alg ext (#1796)
330bd78e
Continue optimizing AutoScheme RAM consumption
976f90db
[pre-commit.ci] auto fixes from pre-commit.com hooks
a15b8250
Merge branch 'main' into lvl/autoscheme_ram_opt
410a4e41
Merge branch 'main' into lvl/autoscheme_ram_opt
07e784b7