vllm

Merged: Fix for attention layers to remain unquantized during moe_wn16 quant #12570

Opened by srikanthsrnvs
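What the fix amounts to, in a minimal sketch: when the moe_wna16 quantization config hands out per-layer quantization methods, only FusedMoE layers should receive the quantized method, while attention projections and other dense linear layers fall back to the unquantized path. The import and class names below follow vLLM's quantization interface, but the body is an illustrative assumption rather than the merged diff; `MoeWNA16Method` refers to the quantized-MoE method class defined elsewhere in the codebase, and the other abstract methods of `QuantizationConfig` are omitted for brevity.

```python
# Minimal sketch of the idea behind this PR, not the merged diff.
from typing import Optional

import torch

from vllm.model_executor.layers.fused_moe import FusedMoE
from vllm.model_executor.layers.linear import (LinearBase,
                                               UnquantizedLinearMethod)
from vllm.model_executor.layers.quantization.base_config import (
    QuantizationConfig, QuantizeMethodBase)


class MoeWNA16Config(QuantizationConfig):
    """Weight-only N-bit (wNa16) quantization applied to MoE experts."""

    def get_quant_method(self, layer: torch.nn.Module,
                         prefix: str) -> Optional[QuantizeMethodBase]:
        if isinstance(layer, FusedMoE):
            # Only the fused MoE experts get the wNa16 quantized kernel.
            # MoeWNA16Method is the method class from moe_wna16.py
            # (assumed here, not defined in this sketch).
            return MoeWNA16Method(self)
        if isinstance(layer, LinearBase):
            # Attention projections and other dense linear layers stay
            # unquantized -- the behavior this PR fixes.
            return UnquantizedLinearMethod()
        return None
```

The follow-up commit "[Bugfix] fix moe_wna16 get_quant_method (#12648)" in the timeline below appears to touch this same dispatch point.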
srikanthsrnvs requested a review from mgoin 1 year ago
srikanthsrnvs requested a review from robertgshaw2-redhat 1 year ago
srikanthsrnvs requested a review from tlrmchlsmth 1 year ago
mgoin approved these changes on 2025-01-31
mgoin added the quantization label
mgoin added the ready label
483b60c0  srikanthsrnvs  Fix for attention layers to remain unquantized during moe_wn16 quant …
915fdce8  hmellor  Set `?device={device}` when changing tab in installation guides (#12560)
d689505a  Beim  [Misc] fix typo: add missing space in lora adapter error message (#12…
689bd199  robertgshaw2-redhat  [Kernel] Triton Configs for Fp8 Block Quantization (#11589)
f7a4e122  npanpaliya  [CPU][PPC] Updated torch, torchvision, torchaudio dependencies (#12555)
95b49be3  mgoin  [V1][Log] Add max request concurrency log to V1 (#12569)
b0d72881  LucasWilkinson  [Kernel] Update `cutlass_scaled_mm` to support 2d group (blockwise) s…
9813962a  maleksan85  [ROCm][AMD][Model] llama 3.2 support upstreaming (#12421)
897c8c24  LucasWilkinson  [Attention] MLA decode optimizations (#12528)
c4795ce0  ywang96  [Bugfix] Gracefully handle huggingface hub http error (#12571)
a5e6700c  mgoin  Format
1ce860be  hmellor  Add favicon to docs (#12611)
bc9d8314  robertgshaw2-redhat  [BugFix] Fix Torch.Compile For DeepSeek (#12594)
22b918de  comaniac  [Git] Automatically sign-off commits (#12595)
00df0e4b  comaniac  [Docs][V1] Prefix caching design (#12598)
44fa70d9  heheda12345  [v1][Bugfix] Add extra_keys to block_hash for prefix caching (#12603)
fdd86fbb  khluu  [release] Add input step to ask for Release version (#12631)
c4a7c261  robertgshaw2-redhat  [Bugfix] Revert MoE Triton Config Default (#12629)
e7c98c61  tlrmchlsmth  [Kernel][Quantization] Integrate block-quantized CUTLASS kernels for …
d27e55d2  xpbowler  [Feature] Fix guided decoding blocking bitmask memcpy (#12563)
bece70b9  hmellor  [Doc] Improve installation signposting (#12575)
6b7e4331  brian-dellabetta  [Doc] int4 w4a16 example (#12585)
fd9060b1  robertgshaw2-redhat  [V1] Bugfix: Validate Model Input Length (#12600)
8ae26746  sleepwalker2017  [BugFix] fix wrong output when using lora and num_scheduler_steps=8 (…
19d375d6  eldarkurtic  Fix target matching for fused layers with compressed-tensors (#12617)
64d11309  khluu  [ci] Upgrade transformers to 4.48.2 in CI dependencies (#12599)
674ab715  tlrmchlsmth  [Bugfix/CI] Fixup benchmark_moe.py (#12562)
9a614434  rahul-tuli  Fix: Respect `sparsity_config.ignore` in Cutlass Integration (#12517)
bb942601  LucasWilkinson  [Attention] Deepseek v3 MLA support with FP8 compute (#12601)
55727d05  russellb  [CI/Build] Add label automation for structured-output, speculative-de…
4bad7108  simon-mo  Disable chunked prefill and/or prefix caching when MLA is enabled (#…
f292876e  mgoin  Apply torch.compile to fused_moe/grouped_topk (#12637)
0079b1c7  vicenteherrera  doc: fixing minor typo in readme.md (#12643)
a4124cbb  jinzhen-lin  [Bugfix] fix moe_wna16 get_quant_method (#12648)
8f1a0616  russellb  [Core] Silence unnecessary deprecation warnings (#12620)
f4e9f990  WoosukKwon  [V1][Minor] Avoid frequently creating ConstantList (#12653)
f709c159  ShawnD200  [Core][v1] Unify allocating slots in prefill and decode in KV cache m…
87e5e8b8  jikunshang  [Hardware][Intel GPU] add XPU bf16 support (#12392)
b9895606  russellb  [Misc] Add SPDX-License-Identifier headers to python source files (#1…
d0cd67a7  youkaichao  [doc][misc] clarify VLLM_HOST_IP for multi-node inference (#12667)
srikanthsrnvs force-pushed to d0cd67a7 1 year ago
srikanthsrnvs requested a review from youkaichao 1 year ago
srikanthsrnvs requested a review from alexm-redhat 1 year ago
srikanthsrnvs requested a review from comaniac 1 year ago
srikanthsrnvs requested a review from simon-mo 1 year ago
srikanthsrnvs requested a review from WoosukKwon 1 year ago
srikanthsrnvs requested a review from njhill 1 year ago
srikanthsrnvs requested a review from LiuXiaoxuanPKU 1 year ago
srikanthsrnvs requested a review from KuntaiDu 1 year ago
srikanthsrnvs requested a review from DarkLight1337 1 year ago
srikanthsrnvs requested a review from ywang96 1 year ago
srikanthsrnvs requested a review from zhuohan123 1 year ago
mergify added the documentation label
mergify added the ci/build label
mergify added the frontend label
mergify added the structured-output label
mergify added the speculative-decoding label
mergify added the v1 label
mergify added the needs-rebase label
8b5a0ea1  srikanthsrnvs  Merge branch 'main' into fix-moe-wna16-attention
mergify removed the needs-rebase label
9d09ec0d  srikanthsrnvs  unused imports
DarkLight1337 enabled auto-merge (squash) 1 year ago
DarkLight1337 disabled auto-merge 1 year ago (manually disabled by user)
youkaichao merged b9986454 into main 1 year ago
