vllm
Fix for attention layers to remain unquantized during moe_wn16 quant
#12570
Merged
Commits (42)
Fix for attention layers to remain unquantized during moe_wn16 quant method (srikanthsrnvs, committed 1 year ago)
Set `?device={device}` when changing tab in installation guides (#12560) (srikanthsrnvs, committed 1 year ago)
[Misc] fix typo: add missing space in lora adapter error message (#12564) (srikanthsrnvs, committed 1 year ago)
[Kernel] Triton Configs for Fp8 Block Quantization (#11589) (srikanthsrnvs, committed 1 year ago)
[CPU][PPC] Updated torch, torchvision, torchaudio dependencies (#12555) (srikanthsrnvs, committed 1 year ago)
[V1][Log] Add max request concurrency log to V1 (#12569) (srikanthsrnvs, committed 1 year ago)
[Kernel] Update `cutlass_scaled_mm` to support 2d group (blockwise) scaling (#11868) (srikanthsrnvs, committed 1 year ago)
[ROCm][AMD][Model] llama 3.2 support upstreaming (#12421) (srikanthsrnvs, committed 1 year ago)
[Attention] MLA decode optimizations (#12528) (srikanthsrnvs, committed 1 year ago)
[Bugfix] Gracefully handle huggingface hub http error (#12571) (srikanthsrnvs, committed 1 year ago)
Format (srikanthsrnvs, committed 1 year ago)
Add favicon to docs (#12611) (srikanthsrnvs, committed 1 year ago)
[BugFix] Fix Torch.Compile For DeepSeek (#12594) (srikanthsrnvs, committed 1 year ago)
[Git] Automatically sign-off commits (#12595) (srikanthsrnvs, committed 1 year ago)
[Docs][V1] Prefix caching design (#12598) (srikanthsrnvs, committed 1 year ago)
[v1][Bugfix] Add extra_keys to block_hash for prefix caching (#12603) (srikanthsrnvs, committed 1 year ago)
[release] Add input step to ask for Release version (#12631) (srikanthsrnvs, committed 1 year ago)
[Bugfix] Revert MoE Triton Config Default (#12629) (srikanthsrnvs, committed 1 year ago)
[Kernel][Quantization] Integrate block-quantized CUTLASS kernels for DeepSeekV3 (#12587) (srikanthsrnvs, committed 1 year ago)
[Feature] Fix guided decoding blocking bitmask memcpy (#12563) (srikanthsrnvs, committed 1 year ago)
[Doc] Improve installation signposting (#12575) (srikanthsrnvs, committed 1 year ago)
[Doc] int4 w4a16 example (#12585) (srikanthsrnvs, committed 1 year ago)
[V1] Bugfix: Validate Model Input Length (#12600) (srikanthsrnvs, committed 1 year ago)
[BugFix] fix wrong output when using lora and num_scheduler_steps=8 (#11161) (srikanthsrnvs, committed 1 year ago)
Fix target matching for fused layers with compressed-tensors (#12617) (srikanthsrnvs, committed 1 year ago)
[ci] Upgrade transformers to 4.48.2 in CI dependencies (#12599) (srikanthsrnvs, committed 1 year ago)
[Bugfix/CI] Fixup benchmark_moe.py (#12562) (srikanthsrnvs, committed 1 year ago)
Fix: Respect `sparsity_config.ignore` in Cutlass Integration (#12517) (srikanthsrnvs, committed 1 year ago)
[Attention] Deepseek v3 MLA support with FP8 compute (#12601) (srikanthsrnvs, committed 1 year ago)
[CI/Build] Add label automation for structured-output, speculative-decoding, v1 (#12280) (srikanthsrnvs, committed 1 year ago)
Disable chunked prefill and/or prefix caching when MLA is enabled (#12642) (srikanthsrnvs, committed 1 year ago)
Apply torch.compile to fused_moe/grouped_topk (#12637) (srikanthsrnvs, committed 1 year ago)
doc: fixing minor typo in readme.md (#12643) (srikanthsrnvs, committed 1 year ago)
[Bugfix] fix moe_wna16 get_quant_method (#12648) (srikanthsrnvs, committed 1 year ago)
[Core] Silence unnecessary deprecation warnings (#12620) (srikanthsrnvs, committed 1 year ago)
[V1][Minor] Avoid frequently creating ConstantList (#12653) (srikanthsrnvs, committed 1 year ago)
[Core][v1] Unify allocating slots in prefill and decode in KV cache manager (#12608) (srikanthsrnvs, committed 1 year ago)
[Hardware][Intel GPU] add XPU bf16 support (#12392) (srikanthsrnvs, committed 1 year ago)
[Misc] Add SPDX-License-Identifier headers to python source files (#12628) (srikanthsrnvs, committed 1 year ago)
[doc][misc] clarify VLLM_HOST_IP for multi-node inference (#12667) (srikanthsrnvs, committed 1 year ago)
Merge branch 'main' into fix-moe-wna16-attention (srikanthsrnvs, committed 1 year ago)
unused imports (srikanthsrnvs, committed 1 year ago)