intel/auto-round

Pull Requests Commits

remove force fp16 dtype export (#192)

WeiweiZhang1 committed 1 year ago

Verified d9f7ceee

Fix multimodal and moe issue (#191)

WeiweiZhang1 committed 1 year ago

Verified ca59d36a

low_cpu_mem refinement (#186)

n1ck-guo committed 1 year ago

Verified 7b9611ed

support autoround hpu format (#182)

yintong-lu committed 1 year ago

Verified 72f5ce80

add initial support of mxfp4 (#187)

wenhuach21 committed 1 year ago

Verified d48d0404

[pre-commit.ci] pre-commit autoupdate (#167)

pre-commit-ci[bot] committed 1 year ago

Verified 9226a893

fix typos (#185)

wenhuach21 committed 1 year ago

Verified 24b2e740

Add layer wise mode to save memory (#136)

n1ck-guo committed 1 year ago

Verified e2e33f03

enable llava & Qwen-VL multimodal model quantization (#165)

WeiweiZhang1 committed 1 year ago

Verified aeb9e408

Fix UT coverage report (#180)

XuehaoSun committed 1 year ago

Verified 8d08400f

fix autoround format mixed precision issue and refine gptq format code (#183)

wenhuach21 committed 1 year ago

Verified e2814996

Fix autoround format accuracy issue (#179)

wenhuach21 committed 1 year ago

Verified 0126180f

Add unit test (#173)

XuehaoSun committed 1 year ago

Verified 81624095

add initial support for activation quantization (#176)

wenhuach21 committed 1 year ago

Verified 5f67048c

speedup the tuning a little (#175)

wenhuach21 committed 1 year ago

Verified 473f474d

add chat template in calib tokenization (#171)

yintong-lu committed 1 year ago

Verified 735dfc9e

[Large impact]set the default nsamples to 128 and low_gpu_mem_usage to False (#174)

wenhuach21 committed 1 year ago

Verified ab614824

support marlin in auto_round format (#172)

wenhuach21 committed 1 year ago

Verified 2b1448d4

revert the gptq format code to fix the regression (#168)

wenhuach21 committed 1 year ago

Verified 5947e9c0

fix typos, update overview img (#166)

WeiweiZhang1 committed 1 year ago

Verified 8d5765ac

1 fix a bug in autoround format with the latest transformers 2 rename n_samples n_blocks to nsamples nblocks (#163)

wenhuach21 committed 1 year ago

Verified f9e7d79e

WeiweiZhang1 committed 1 year ago

Verified 31c566cc

fix bug and limit numpy version (#159)

yintong-lu committed 1 year ago

Verified 77320b0a

support calibration dataset concat (#147)

yintong-lu committed 1 year ago

Verified 75e3fde0

remove gpt ppl eval from lm-0.4.2 (#158)

wenhuach21 committed 1 year ago

Verified 77d6a886

fix bug at whole block is excluded from quantization (#156)

wenhuach21 committed 1 year ago

Verified edcec56e

auto round quantizer supports gptq kernel (#155)

wenhuach21 committed 1 year ago

Verified 9cae103d

fix qbits issue (#153)

wenhuach21 committed 1 year ago

Verified c313fa33

Qbits related log (#151)

zhewang1-intc committed 1 year ago

Verified 34274fb3

autoround_support_qbits_backend (#145)

zhewang1-intc committed 1 year ago

Verified dbdc4a39

Newer Older