vllm
[Model][MiniMaxText01] Support MiniMaxText01 model inference
#13454
Merged
tlrmchlsmth merged 108 commits into vllm-project:main from ZZBoom:qinggangying/vllm
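For context, the model support added here can be exercised through vLLM's standard offline-inference API. The snippet below is a minimal sketch, assuming the Hugging Face repo id MiniMaxAI/MiniMax-Text-01 and an illustrative tensor-parallel setting; neither is taken from this PR, and a real run of this very large model needs hardware sized accordingly.

```python
# Hypothetical offline-inference sketch; model id and parallelism are assumptions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-Text-01",  # assumed Hugging Face repo id
    trust_remote_code=True,
    tensor_parallel_size=8,             # illustrative; size to your hardware
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain linear attention in one paragraph."], params)
print(outputs[0].outputs[0].text)
```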
ZZBoom marked this pull request as draft 358 days ago
youkaichao requested a review from tlrmchlsmth 358 days ago
tlrmchlsmth commented on 2025-02-20
tlrmchlsmth commented on 2025-02-20
tlrmchlsmth commented on 2025-02-20
mergify added the needs-rebase label
tlrmchlsmth assigned tlrmchlsmth 350 days ago
mergify added the documentation label
mergify added the ci/build label
mergify added the frontend label
mergify added the multi-modality label
mergify added the structured-output label
mergify added the speculative-decoding label
mergify added the v1 label
qscqesze force pushed to 1bd32bc8 335 days ago
mergify removed the needs-rebase label
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
qscqesze force pushed 335 days ago
[Config][HybridModel] Enhance layer determination logic for hybrid mo…
edddaf1c
[Refactor][MiniMaxText] Update cache mapping reference in MiniMaxText…
1719721b
[Refactor][AsyncLLM] Improve comments and clean up unused variables i…
d61b446e
[Refactor][MiniMaxText] Clean up imports and improve code formatting …
faa8c6c2
[Refactor][Config] Improve formatting and error handling in ModelConf…
7c65c038
[Refactor][Config] Enhance layer counting logic in ModelConfig and im…
43f0152b
[Refactor][Config] Improve formatting in VllmConfig for better readab…
d6e7798b
[Refactor][LightningAttn] Update grid configuration in _attention fun…
5504867d
[Refactor][LightningAttn] Simplify grid configuration in _attention f…
6c3f08b9
[Refactor][MiniMaxText] Enhance forward method in MiniMaxText01 model…
fad01e8c
[Refactor][MiniMaxText] Remove max_context_len parameter from MiniMax…
f6742797
[Refactor][MiniMaxText] Update forward method parameters in MiniMaxTe…
cb7c074f
[Refactor][MiniMaxText] Add context_lens_tensor and slot_mapping to A…
d48d3757
[Refactor][MiniMaxText] Remove unnecessary property methods from Atte…
989b4886
[Refactor][MiniMaxText] Simplify weight handling methods and improve …
0d4822d6
[Refactor][MiniMaxText] Clean up and optimize weight handling and par…
b682944a
[Refactor][MiniMaxText] Fix index handling in prefill loop of MiniMax…
9e9704af
[Refactor][MiniMaxText] Streamline handling of attn_metadata in forwa…
5de6b1be
[Refactor][MiniMaxText] Consolidate attn_metadata handling in MiniMax…
96c6dff5
[Refactor][MiniMaxText] Remove kv_caches parameter from multiple meth…
fc6ab05b
[Refactor][MiniMaxText] Enhance kv_cache handling in MiniMaxText01 mo…
a7f2e3a1
[Refactor][MiniMaxText] Remove unused kv_caches parameter from _clear…
bc17ba96
[Refactor][MiniMaxText] Initialize kv_cache in multiple classes of Mi…
2f873a95
[Refactor][MiniMaxText] Remove kv_cache initialization from MiniMaxTe…
152b430a
[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
2e59aa7b
[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
9ff34fc0
[Refactor][MiniMaxText] Remove redundant closing parenthesis in MiniM…
ed4ddcab
[Refactor][MiniMaxText] Set default number of hidden layers to 8 in M…
4bee45bb
[Refactor][MiniMaxText] Remove hardcoded number of hidden layers in M…
e4fd74e4
[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
495a39a4
[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
e0dec3ae
[Refactor][MiniMaxText] Initialize MinimaxCacheManager in MiniMaxText…
8f9891f1
[Refactor][MiniMaxText] Simplify kv_cache handling in MiniMaxText01 m…
1774c662
[Refactor][MiniMaxText] Reorder parameters in forward method of MiniM…
88ec7c6f
[Refactor][MiniMaxText] Add attn_metadata parameter to forward method…
2aa1c0de
[Refactor][MiniMaxText] Remove kv_cache initialization in MiniMaxText…
37e7fec4
[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …
f1c8fb63
[Refactor][MiniMaxText] Correctly define NUM_FBLOCK as a constexpr in…
fc361d81
[Refactor][LightningAttention] Improve code readability and consisten…
be625bf9
[Refactor][MiniMaxText] Simplify forward method in MiniMaxText01 mode…
95bdd4a3
[Refactor][MiniMaxText] Update kv_cache initialization in MiniMaxText…
5b619bbf
[Refactor][LightningAttention] Enhance code readability in lightning_…
f46e997d
[Refactor][LightningAttention] Optimize grid calculations in lightnin…
fce7caee
qscqesze force pushed to fce7caee 334 days ago
ZZBoom marked this pull request as ready for review 334 days ago
tlrmchlsmth commented on 2025-03-17
[Refactor][MiniMaxText] Remove unused weight2param_match and weight2p…
f16f818f
[Refactor][MiniMaxText] Refactor layer initialization in MiniMaxText0…
09c9ceac
qscqesze requested a review from WoosukKwon 330 days ago
[Update][SupportedModels] Add MiniMaxText01 model to the supported mo…
20d811a6
[Refactor][MiniMaxText] Clean up formatting and improve readability i…
aea72dc5
Merge remote-tracking branch 'origin/main' into qinggangying/vllm
925c02f4
[Model] Refactor layer block type handling in ModelConfig for improve…
65c8274d
Merge branch 'vllm-project:main' into qinggangying/vllm
01c5f9ea
Merge branch 'vllm-project:main' into qinggangying/vllm
5a02fdff
Refactor MiniMaxText01 model: import make_layers utility and initiali…
f0e54a76
Enhance error handling in model execution: return None for None hidde…
61b3820b
qscqesze requested a review from robertgshaw2-redhat 327 days ago
qscqesze requested a review from njhill 327 days ago
qscqesze requested a review from ywang96 327 days ago
qscqesze requested a review from comaniac 327 days ago
qscqesze requested a review from alexm-redhat 327 days ago
Refactor MiniMaxText01 model: move None check for attn_metadata to af…
727b5728
tlrmchlsmth commented on 2025-03-20
Refactor MiniMaxText01 model: replace direct access to attn_metadata.…
09d044bb
tlrmchlsmth commented on 2025-03-20
tlrmchlsmth commented on 2025-03-25
Merge branch 'vllm-project:main' into qinggangying/vllm
01c008a9
[Enhancement][Tests] Add comprehensive tests for lightning attention …
078a836d
[Refactor][GPU] Simplify dummy run and sampler execution in GPU model…
42dc9b8f
[Refactor][Tests] Clean up formatting and comments in lightning atten…
80052128
[Refactor][Attention] Enhance kernel functions and parameter handling…
d30be904
[Refactor][Attention] Improve clarity and structure in lightning atte…
c0581a3a
[Refactor][Tests] Update decay calculation in linear decode forward test
4036f881
[Refactor][Tests] Update decay handling in lightning attention tests
44d828b0
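The decay commits above concern the per-head decay factors applied to the linear-attention state. A minimal sketch of one common construction, assuming ALiBi-style geometric slopes; the exact formula used by the model may differ.

```python
# Hypothetical per-head slope/decay construction (assumed ALiBi-style slopes).
import torch

def build_decay(num_heads: int) -> torch.Tensor:
    # Slopes fall off geometrically across heads: 2^(-8/H), 2^(-16/H), ...
    slopes = torch.tensor(
        [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)]
    )
    # Multiplicative decay applied to each head's KV state at every decode step.
    return torch.exp(-slopes)

print(build_decay(8))
```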
[Refactor][Tests] Update lightning attention tests to skip incompatib…
25353a6f
[Refactor][Tests] Enhance lightning attention tests to handle bfloat1…
61474928
[Refactor][Tests] Remove bfloat16 handling and clean up lightning att…
75fcabc3
[Refactor][Tests] Update decay tensor handling in lightning attention…
68d4549f
[Refactor][Tests] Remove deprecated lightning attention tests
1fdb4cc7
[Refactor][Tests] Expand data type support in lightning attention tests
358ba2de
Fix variable name in lightning attention layer to correct tensor load…
703af1dd
Refactor lightning attention integration in MiniMaxText01 model
e8d57248
Add assertion for dimension divisibility in lightning attention
0c6a9043
tlrmchlsmth commented on 2025-03-27
Add reference implementation for linear attention decoding in tests
8663e13a
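For readers unfamiliar with the reference implementation being added here: linear (lightning) attention decoding keeps a per-head KV state that is decayed and updated with a rank-1 outer product at each step, and the output is the query read against that state. A minimal PyTorch sketch of that recurrence, under assumed shapes and not the PR's exact kernel:

```python
# Hypothetical reference for one linear-attention decode step (not the PR's kernel).
# Assumed shapes: q, k, v -> [num_heads, head_dim]; kv_state -> [num_heads, head_dim, head_dim].
import torch

def linear_decode_reference(q, k, v, kv_state, decay):
    decay = decay.view(-1, 1, 1)                                    # per-head decay factor
    kv_state = decay * kv_state + torch.einsum("hd,he->hde", k, v)  # rank-1 state update
    out = torch.einsum("hd,hde->he", q, kv_state)                   # read the state with the query
    return out, kv_state

q, k, v = torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64)
out, state = linear_decode_reference(q, k, v, torch.zeros(8, 64, 64), torch.exp(-torch.rand(8)))
```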
tlrmchlsmth commented on 2025-03-28
Enhance lightning attention tests with reference implementation and a…
e61d6e32
Fix typos and enhance data type consistency in lightning attention im…
57471b8f
Enhance data type handling in linear decode function of lightning att…
ddabd28e
Refactor linear attention decoding kernel for improved clarity and pe…
7f329964
Refactor and enhance lightning attention tests for clarity and functi…
2ed7f2dc
Refactor linear attention decoding kernel to improve efficiency and c…
1107317c
Refactor linear attention decoding kernel and tests for improved clar…
19ae2513
Enhance linear decode tests by incorporating padding mask for accurat…
2f1bed06
Refactor linear attention decoding kernel to improve handling of padd…
7bffe30c
Add reference test for lightning attention consistency
e791c9fb
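A consistency test of this kind typically runs the reference recurrence and the optimized kernel on the same inputs and compares them under relaxed tolerances, as the tolerance commits that follow suggest. A self-contained sketch, with the kernel entry point left as a hypothetical placeholder:

```python
# Hypothetical consistency check for one decode step against the reference recurrence.
import torch

torch.manual_seed(0)
H, D = 8, 64
q, k, v = torch.randn(H, D), torch.randn(H, D), torch.randn(H, D)
kv = torch.zeros(H, D, D)
decay = torch.exp(-torch.rand(H)).view(-1, 1, 1)

ref_kv = decay * kv + torch.einsum("hd,he->hde", k, v)  # reference state update
ref_out = torch.einsum("hd,hde->he", q, ref_kv)         # reference output

# kernel_out, kernel_kv = lightning_attention_decode(q, k, v, kv, decay)  # hypothetical kernel under test
# torch.testing.assert_close(kernel_out, ref_out, rtol=1e-2, atol=1e-2)
```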
Fix typo in reference implementation comment and streamline tensor ha…
2bd8fcb9
Refactor lightning attention test for improved clarity and consistency
5483d26e
Update lightning attention tests to relax tolerance levels and addres…
19b1264f
Update tolerance levels in lightning attention tests for improved acc…
ea801551
Refactor reference implementation of lightning attention for clarity …
33eecfa4
Refactor lightning attention test to improve error handling and data …
c2abab42
Update data type handling in lightning attention test for consistency
2c04f99f
Update data type in lightning attention test to float32 for consistency
c134e79d
Refactor lightning attention implementation for improved efficiency a…
2850c682
Refactor lightning attention implementation for enhanced efficiency a…
2ac5d735
Enhance numerical stability and efficiency in lightning attention imp…
11c9b85c
Optimize lightning attention implementation for efficiency and clarity
0aaac31c
Refactor lightning attention test for improved resource management an…
637ff5e7
Enhance lightning attention implementation for improved numerical sta…
e4291f59
Refine lightning attention implementation to match output shape and e…
84ef836f
Update lightning attention test parameters for simplification
cdf7ae60
Refactor lightning attention test for improved readability
05b6ac60
tlrmchlsmth commented on 2025-03-30
tlrmchlsmth commented on 2025-03-30
tlrmchlsmth added the ready label
Refactor lightning attention tests to simplify tensor initialization
e61ac582
qscqesze requested a review from DarkLight1337 317 days ago
Fix formatting in lightning attention test by removing unnecessary wh…
4d9b75da
Refactor ConstantSizeCache and MiniMaxText01 for improved clarity and…
56a9f5d7
tlrmchlsmth commented on 2025-03-31
Update tensor initialization in lightning attention tests to use rand…
f252f565
Update lightning attention test to initialize KV cache with zeros and…
73fd4245
Refactor tensor initialization in lightning attention tests to use sc…
1fb23369
Refactor formatting in lightning attention tests for improved readabi…
e5cec6fa
Merge branch 'vllm-project:main' into qinggangying/vllm
c7d93c1a
tlrmchlsmth approved these changes on 2025-04-01
tlrmchlsmth merged commit 9ef98d52 into main 315 days ago
Reviewers
tlrmchlsmth
qscqesze
WoosukKwon
robertgshaw2-redhat
njhill
ywang96
comaniac
alexm-redhat
DarkLight1337
Assignees
tlrmchlsmth
Labels
documentation
structured-output
frontend
speculative-decoding
ready
ci/build
v1
multi-modality
Milestone
No milestone