vllm
[Model][MiniMaxText01] Support MiniMaxText01 model inference
#13454
Merged

ZZBoom
github-actions
ZZBoom marked this pull request as draft 358 days ago
youkaichao requested a review from tlrmchlsmth 358 days ago
heheda12345
tlrmchlsmth commented on 2025-02-20
tlrmchlsmth commented on 2025-02-20
tlrmchlsmth commented on 2025-02-20
ZZBoom
zwc163
zifengdexiatian
mergify
mergify added needs-rebase
ZZBoom
ZZBoom
tlrmchlsmth assigned tlrmchlsmth 350 days ago
zifengdexiatian
tlrmchlsmth
shuxiaobo
tugot17
mergify added documentation
mergify added ci/build
mergify added frontend
mergify added multi-modality
mergify added structured-output
mergify added speculative-decoding
mergify added v1
qscqesze force pushed to 1bd32bc8 335 days ago
mergify removed needs-rebase
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze [Config][HybridModel] Enhance layer determination logic for hybrid mo…
edddaf1c
qscqesze [Refactor][MiniMaxText] Update cache mapping reference in MiniMaxText…
1719721b
qscqesze [Refactor][AsyncLLM] Improve comments and clean up unused variables i…
d61b446e
qscqesze [Refactor][MiniMaxText] Clean up imports and improve code formatting …
faa8c6c2
qscqesze [Refactor][Config] Improve formatting and error handling in ModelConf…
7c65c038
qscqesze [Refactor][Config] Enhance layer counting logic in ModelConfig and im…
43f0152b
qscqesze [Refactor][Config] Improve formatting in VllmConfig for better readab…
d6e7798b
qscqesze [Refactor][LightningAttn] Update grid configuration in _attention fun…
5504867d
qscqesze [Refactor][LightningAttn] Simplify grid configuration in _attention f…
6c3f08b9
qscqesze [Refactor][MiniMaxText] Enhance forward method in MiniMaxText01 model…
fad01e8c
qscqesze [Refactor][MiniMaxText] Remove max_context_len parameter from MiniMax…
f6742797
qscqesze [Refactor][MiniMaxText] Update forward method parameters in MiniMaxTe…
cb7c074f
qscqesze [Refactor][MiniMaxText] Add context_lens_tensor and slot_mapping to A…
d48d3757
qscqesze [Refactor][MiniMaxText] Remove unnecessary property methods from Atte…
989b4886
qscqesze [Refactor][MiniMaxText] Simplify weight handling methods and improve …
0d4822d6
qscqesze [Refactor][MiniMaxText] Clean up and optimize weight handling and par…
b682944a
qscqesze [Refactor][MiniMaxText] Fix index handling in prefill loop of MiniMax…
9e9704af
qscqesze [Refactor][MiniMaxText] Streamline handling of attn_metadata in forwa…
5de6b1be
qscqesze [Refactor][MiniMaxText] Consolidate attn_metadata handling in MiniMax…
96c6dff5
qscqesze [Refactor][MiniMaxText] Remove kv_caches parameter from multiple meth…
fc6ab05b
qscqesze [Refactor][MiniMaxText] Enhance kv_cache handling in MiniMaxText01 mo…
a7f2e3a1
qscqesze [Refactor][MiniMaxText] Remove unused kv_caches parameter from _clear…
bc17ba96
qscqesze [Refactor][MiniMaxText] Initialize kv_cache in multiple classes of Mi…
2f873a95
qscqesze [Refactor][MiniMaxText] Remove kv_cache initialization from MiniMaxTe…
152b430a
qscqesze [Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
2e59aa7b
qscqesze [Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
9ff34fc0
qscqesze [Refactor][MiniMaxText] Remove redundant closing parenthesis in MiniM…
ed4ddcab
qscqesze [Refactor][MiniMaxText] Set default number of hidden layers to 8 in M…
4bee45bb
qscqesze [Refactor][MiniMaxText] Remove hardcoded number of hidden layers in M…
e4fd74e4
qscqesze [Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
495a39a4
qscqesze [Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
e0dec3ae
qscqesze [Refactor][MiniMaxText] Initialize MinimaxCacheManager in MiniMaxText…
8f9891f1
qscqesze [Refactor][MiniMaxText] Simplify kv_cache handling in MiniMaxText01 m…
1774c662
qscqesze [Refactor][MiniMaxText] Reorder parameters in forward method of MiniM…
88ec7c6f
qscqesze [Refactor][MiniMaxText] Add attn_metadata parameter to forward method…
2aa1c0de
qscqesze [Refactor][MiniMaxText] Remove kv_cache initialization in MiniMaxText…
37e7fec4
qscqesze [Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
f1c8fb63
qscqesze [Refactor][MiniMaxText] Correctly define NUM_FBLOCK as a constexpr in…
fc361d81
qscqesze [Refactor][LightningAttention] Improve code readability and consisten…
be625bf9
qscqesze [Refactor][MiniMaxText] Simplify forward method in MiniMaxText01 mode…
95bdd4a3
qscqesze [Refactor][MiniMaxText] Update kv_cache initialization in MiniMaxText…
5b619bbf
qscqesze [Refactor][LightningAttention] Enhance code readability in lightning_…
f46e997d
qscqesze [Refactor][LightningAttention] Optimize grid calculations in lightnin…
fce7caee
qscqesze force pushed to fce7caee 334 days ago
qscqesze
ZZBoom marked this pull request as ready for review 334 days ago
tlrmchlsmth commented on 2025-03-17
qscqesze [Refactor][MiniMaxText] Remove unused weight2param_match and weight2p…
f16f818f
qscqesze [Refactor][MiniMaxText] Refactor layer initialization in MiniMaxText0…
09c9ceac
qscqesze requested a review from WoosukKwon 330 days ago
qscqesze [Update][SupportedModels] Add MiniMaxText01 model to the supported mo…
20d811a6
qscqesze [Refactor][MiniMaxText] Clean up formatting and improve readability i…
aea72dc5
qscqesze Merge remote-tracking branch 'origin/main' into qinggangying/vllm
925c02f4
qscqesze [Model] Refactor layer block type handling in ModelConfig for improve…
65c8274d
qscqesze Merge branch 'vllm-project:main' into qinggangying/vllm
01c5f9ea
qscqesze Merge branch 'vllm-project:main' into qinggangying/vllm
5a02fdff
qscqesze Refactor MiniMaxText01 model: import make_layers utility and initiali…
f0e54a76
qscqesze Enhance error handling in model execution: return None for None hidde…
61b3820b
qscqesze requested a review from robertgshaw2-redhat 327 days ago
qscqesze requested a review from njhill 327 days ago
qscqesze requested a review from ywang96 327 days ago
qscqesze requested a review from comaniac 327 days ago
qscqesze requested a review from alexm-redhat 327 days ago
qscqesze Refactor MiniMaxText01 model: move None check for attn_metadata to af…
727b5728
tlrmchlsmth commented on 2025-03-20
qscqesze Refactor MiniMaxText01 model: replace direct access to attn_metadata.…
09d044bb
tlrmchlsmth commented on 2025-03-20
tlrmchlsmth commented on 2025-03-25
qscqesze Merge branch 'vllm-project:main' into qinggangying/vllm
01c008a9
qscqesze [Enhancement][Tests] Add comprehensive tests for lightning attention …
078a836d
qscqesze [Refactor][GPU] Simplify dummy run and sampler execution in GPU model…
42dc9b8f
qscqesze [Refactor][Tests] Clean up formatting and comments in lightning atten…
80052128
qscqesze [Refactor][Attention] Enhance kernel functions and parameter handling…
d30be904
qscqesze [Refactor][Attention] Improve clarity and structure in lightning atte…
c0581a3a
qscqesze [Refactor][Tests] Update decay calculation in linear decode forward test
4036f881
qscqesze [Refactor][Tests] Update decay handling in lightning attention tests
44d828b0
qscqesze [Refactor][Tests] Update lightning attention tests to skip incompatib…
25353a6f
qscqesze [Refactor][Tests] Enhance lightning attention tests to handle bfloat1…
61474928
qscqesze [Refactor][Tests] Remove bfloat16 handling and clean up lightning att…
75fcabc3
qscqesze [Refactor][Tests] Update decay tensor handling in lightning attention…
68d4549f
qscqesze [Refactor][Tests] Remove deprecated lightning attention tests
1fdb4cc7
qscqesze [Refactor][Tests] Expand data type support in lightning attention tests
358ba2de
qscqesze Fix variable name in lightning attention layer to correct tensor load…
703af1dd
qscqesze Refactor lightning attention integration in MiniMaxText01 model
e8d57248
qscqesze Add assertion for dimension divisibility in lightning attention
0c6a9043
qscqesze
tlrmchlsmth commented on 2025-03-27
qscqesze Add reference implementation for linear attention decoding in tests
8663e13a
rakshithvasudev
tlrmchlsmth commented on 2025-03-28
qscqesze Enhance lightning attention tests with reference implementation and a…
e61d6e32
qscqesze Fix typos and enhance data type consistency in lightning attention im…
57471b8f
qscqesze Enhance data type handling in linear decode function of lightning att…
ddabd28e
qscqesze Refactor linear attention decoding kernel for improved clarity and pe…
7f329964
qscqesze Refactor and enhance lightning attention tests for clarity and functi…
2ed7f2dc
qscqesze Refactor linear attention decoding kernel to improve efficiency and c…
1107317c
qscqesze Refactor linear attention decoding kernel and tests for improved clar…
19ae2513
qscqesze Enhance linear decode tests by incorporating padding mask for accurat…
2f1bed06
qscqesze Refactor linear attention decoding kernel to improve handling of padd…
7bffe30c
qscqesze Add reference test for lightning attention consistency
e791c9fb
qscqesze Fix typo in reference implementation comment and streamline tensor ha…
2bd8fcb9
qscqesze Refactor lightning attention test for improved clarity and consistency
5483d26e
qscqesze Update lightning attention tests to relax tolerance levels and addres…
19b1264f
qscqesze Update tolerance levels in lightning attention tests for improved acc…
ea801551
qscqesze Refactor reference implementation of lightning attention for clarity …
33eecfa4
qscqesze Refactor lightning attention test to improve error handling and data …
c2abab42
qscqesze Update data type handling in lightning attention test for consistency
2c04f99f
qscqesze Update data type in lightning attention test to float32 for consistency
c134e79d
qscqesze Refactor lightning attention implementation for improved efficiency a…
2850c682
qscqesze Refactor lightning attention implementation for enhanced efficiency a…
2ac5d735
qscqesze Enhance numerical stability and efficiency in lightning attention imp…
11c9b85c
qscqesze Optimize lightning attention implementation for efficiency and clarity
0aaac31c
qscqesze Refactor lightning attention test for improved resource management an…
637ff5e7
qscqesze Enhance lightning attention implementation for improved numerical sta…
e4291f59
qscqesze Refine lightning attention implementation to match output shape and e…
84ef836f
qscqesze Update lightning attention test parameters for simplification
cdf7ae60
qscqesze Refactor lightning attention test for improved readability
05b6ac60
qscqesze
tlrmchlsmth
tlrmchlsmth commented on 2025-03-30
tlrmchlsmth commented on 2025-03-30
tlrmchlsmth added ready
tlrmchlsmth
qscqesze
qscqesze Refactor lightning attention tests to simplify tensor initialization
e61ac582
qscqesze requested a review from DarkLight1337 317 days ago
qscqesze Fix formatting in lightning attention test by removing unnecessary wh…
4d9b75da
qscqesze Refactor ConstantSizeCache and MiniMaxText01 for improved clarity and…
56a9f5d7
qscqesze
qscqesze
tlrmchlsmth commented on 2025-03-31
qscqesze Update tensor initialization in lightning attention tests to use rand…
f252f565
qscqesze Update lightning attention test to initialize KV cache with zeros and…
73fd4245
qscqesze Refactor tensor initialization in lightning attention tests to use sc…
1fb23369
qscqesze Refactor formatting in lightning attention tests for improved readabi…
e5cec6fa
qscqesze
tlrmchlsmth
qscqesze Merge branch 'vllm-project:main' into qinggangying/vllm
c7d93c1a
qscqesze
tlrmchlsmth approved these changes on 2025-04-01
tlrmchlsmth merged 9ef98d52 into main 315 days ago
