PR #13454 [Model][MiniMaxText01] Support MiniMaxText01 model inference

ZZBoom marked this pull request as draft 358 days ago

youkaichao requested a review from tlrmchlsmth 358 days ago

tlrmchlsmth commented on 2025-02-20

mergify added needs-rebase

tlrmchlsmth assigned tlrmchlsmth 350 days ago

mergify added documentation

mergify added ci/build

mergify added frontend

mergify added multi-modality

mergify added structured-output

mergify added speculative-decoding

mergify added v1

qscqesze force pushed to 1bd32bc8 335 days ago

mergify removed needs-rebase

qscqesze force pushed 335 days ago

[Config][HybridModel] Enhance layer determination logic for hybrid mo…

edddaf1c

[Refactor][MiniMaxText] Update cache mapping reference in MiniMaxText…

1719721b

[Refactor][AsyncLLM] Improve comments and clean up unused variables i…

d61b446e

[Refactor][MiniMaxText] Clean up imports and improve code formatting …

faa8c6c2

[Refactor][Config] Improve formatting and error handling in ModelConf…

7c65c038

[Refactor][Config] Enhance layer counting logic in ModelConfig and im…

43f0152b

[Refactor][Config] Improve formatting in VllmConfig for better readab…

d6e7798b

[Refactor][LightningAttn] Update grid configuration in _attention fun…

5504867d

[Refactor][LightningAttn] Simplify grid configuration in _attention f…

6c3f08b9

[Refactor][MiniMaxText] Enhance forward method in MiniMaxText01 model…

fad01e8c

[Refactor][MiniMaxText] Remove max_context_len parameter from MiniMax…

f6742797

[Refactor][MiniMaxText] Update forward method parameters in MiniMaxTe…

cb7c074f

[Refactor][MiniMaxText] Add context_lens_tensor and slot_mapping to A…

d48d3757

[Refactor][MiniMaxText] Remove unnecessary property methods from Atte…

989b4886

[Refactor][MiniMaxText] Simplify weight handling methods and improve …

0d4822d6

[Refactor][MiniMaxText] Clean up and optimize weight handling and par…

b682944a

[Refactor][MiniMaxText] Fix index handling in prefill loop of MiniMax…

9e9704af

[Refactor][MiniMaxText] Streamline handling of attn_metadata in forwa…

5de6b1be

[Refactor][MiniMaxText] Consolidate attn_metadata handling in MiniMax…

96c6dff5

[Refactor][MiniMaxText] Remove kv_caches parameter from multiple meth…

fc6ab05b

[Refactor][MiniMaxText] Enhance kv_cache handling in MiniMaxText01 mo…

a7f2e3a1

[Refactor][MiniMaxText] Remove unused kv_caches parameter from _clear…

bc17ba96

[Refactor][MiniMaxText] Initialize kv_cache in multiple classes of Mi…

2f873a95

[Refactor][MiniMaxText] Remove kv_cache initialization from MiniMaxTe…

152b430a

[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …

2e59aa7b

[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …

9ff34fc0

[Refactor][MiniMaxText] Remove redundant closing parenthesis in MiniM…

ed4ddcab

[Refactor][MiniMaxText] Set default number of hidden layers to 8 in M…

4bee45bb

[Refactor][MiniMaxText] Remove hardcoded number of hidden layers in M…

e4fd74e4

[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …

495a39a4

[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …

e0dec3ae

[Refactor][MiniMaxText] Initialize MinimaxCacheManager in MiniMaxText…

8f9891f1

[Refactor][MiniMaxText] Simplify kv_cache handling in MiniMaxText01 m…

1774c662

[Refactor][MiniMaxText] Reorder parameters in forward method of MiniM…

88ec7c6f

[Refactor][MiniMaxText] Add attn_metadata parameter to forward method…

2aa1c0de

[Refactor][MiniMaxText] Remove kv_cache initialization in MiniMaxText…

37e7fec4

[Refactor][MiniMaxText] Update forward method in MiniMaxText01 model …

f1c8fb63

[Refactor][MiniMaxText] Correctly define NUM_FBLOCK as a constexpr in…

fc361d81

[Refactor][LightningAttention] Improve code readability and consisten…

be625bf9

[Refactor][MiniMaxText] Simplify forward method in MiniMaxText01 mode…

95bdd4a3

[Refactor][MiniMaxText] Update kv_cache initialization in MiniMaxText…

5b619bbf

[Refactor][LightningAttention] Enhance code readability in lightning_…

f46e997d

[Refactor][LightningAttention] Optimize grid calculations in lightnin…

fce7caee

qscqesze force pushed to fce7caee 334 days ago

ZZBoom marked this pull request as ready for review 334 days ago

tlrmchlsmth commented on 2025-03-17

[Refactor][MiniMaxText] Remove unused weight2param_match and weight2p…

f16f818f

[Refactor][MiniMaxText] Refactor layer initialization in MiniMaxText0…

09c9ceac

qscqesze requested a review from WoosukKwon 330 days ago

[Update][SupportedModels] Add MiniMaxText01 model to the supported mo…

20d811a6

[Refactor][MiniMaxText] Clean up formatting and improve readability i…

aea72dc5

Merge remote-tracking branch 'origin/main' into qinggangying/vllm

925c02f4

[Model] Refactor layer block type handling in ModelConfig for improve…

65c8274d

Merge branch 'vllm-project:main' into qinggangying/vllm

01c5f9ea

Merge branch 'vllm-project:main' into qinggangying/vllm

5a02fdff

Refactor MiniMaxText01 model: import make_layers utility and initiali…

f0e54a76

Enhance error handling in model execution: return None for None hidde…

61b3820b

qscqesze requested a review from robertgshaw2-redhat 327 days ago

qscqesze requested a review from njhill 327 days ago

qscqesze requested a review from ywang96 327 days ago

qscqesze requested a review from comaniac 327 days ago

qscqesze requested a review from alexm-redhat 327 days ago

Refactor MiniMaxText01 model: move None check for attn_metadata to af…

727b5728

tlrmchlsmth commented on 2025-03-20

Refactor MiniMaxText01 model: replace direct access to attn_metadata.…

09d044bb

tlrmchlsmth commented on 2025-03-20

tlrmchlsmth commented on 2025-03-25

Merge branch 'vllm-project:main' into qinggangying/vllm

01c008a9

[Enhancement][Tests] Add comprehensive tests for lightning attention …

078a836d

[Refactor][GPU] Simplify dummy run and sampler execution in GPU model…

42dc9b8f

[Refactor][Tests] Clean up formatting and comments in lightning atten…

80052128

[Refactor][Attention] Enhance kernel functions and parameter handling…

d30be904

[Refactor][Attention] Improve clarity and structure in lightning atte…

c0581a3a

[Refactor][Tests] Update decay calculation in linear decode forward test

4036f881

[Refactor][Tests] Update decay handling in lightning attention tests

44d828b0

[Refactor][Tests] Update lightning attention tests to skip incompatib…

25353a6f

[Refactor][Tests] Enhance lightning attention tests to handle bfloat1…

61474928

[Refactor][Tests] Remove bfloat16 handling and clean up lightning att…

75fcabc3

[Refactor][Tests] Update decay tensor handling in lightning attention…

68d4549f

[Refactor][Tests] Remove deprecated lightning attention tests

1fdb4cc7

[Refactor][Tests] Expand data type support in lightning attention tests

358ba2de

Fix variable name in lightning attention layer to correct tensor load…

703af1dd

Refactor lightning attention integration in MiniMaxText01 model

e8d57248

Add assertion for dimension divisibility in lightning attention

0c6a9043

tlrmchlsmth commented on 2025-03-27

Add reference implementation for linear attention decoding in tests

8663e13a

tlrmchlsmth commented on 2025-03-28

Enhance lightning attention tests with reference implementation and a…

e61d6e32

Fix typos and enhance data type consistency in lightning attention im…

57471b8f

Enhance data type handling in linear decode function of lightning att…

ddabd28e

Refactor linear attention decoding kernel for improved clarity and pe…

7f329964

Refactor and enhance lightning attention tests for clarity and functi…

2ed7f2dc

Refactor linear attention decoding kernel to improve efficiency and c…

1107317c

Refactor linear attention decoding kernel and tests for improved clar…

19ae2513

Enhance linear decode tests by incorporating padding mask for accurat…

2f1bed06

Refactor linear attention decoding kernel to improve handling of padd…

7bffe30c

Add reference test for lightning attention consistency

e791c9fb

Fix typo in reference implementation comment and streamline tensor ha…

2bd8fcb9

Refactor lightning attention test for improved clarity and consistency

5483d26e

Update lightning attention tests to relax tolerance levels and addres…

19b1264f

Update tolerance levels in lightning attention tests for improved acc…

ea801551

Refactor reference implementation of lightning attention for clarity …

33eecfa4

Refactor lightning attention test to improve error handling and data …

c2abab42

Update data type handling in lightning attention test for consistency

2c04f99f

Update data type in lightning attention test to float32 for consistency

c134e79d

Refactor lightning attention implementation for improved efficiency a…

2850c682

Refactor lightning attention implementation for enhanced efficiency a…

2ac5d735

Enhance numerical stability and efficiency in lightning attention imp…

11c9b85c

Optimize lightning attention implementation for efficiency and clarity

0aaac31c

Refactor lightning attention test for improved resource management an…

637ff5e7

Enhance lightning attention implementation for improved numerical sta…

e4291f59

Refine lightning attention implementation to match output shape and e…

84ef836f

Update lightning attention test parameters for simplification

cdf7ae60

Refactor lightning attention test for improved readability

05b6ac60

tlrmchlsmth commented on 2025-03-30

tlrmchlsmth added ready

Refactor lightning attention tests to simplify tensor initialization

e61ac582

qscqesze requested a review from DarkLight1337 317 days ago

Fix formatting in lightning attention test by removing unnecessary wh…

4d9b75da

Refactor ConstantSizeCache and MiniMaxText01 for improved clarity and…

56a9f5d7

tlrmchlsmth commented on 2025-03-31

Update tensor initialization in lightning attention tests to use rand…

f252f565

Update lightning attention test to initialize KV cache with zeros and…

73fd4245

Refactor tensor initialization in lightning attention tests to use sc…

1fb23369

Refactor formatting in lightning attention tests for improved readabi…

e5cec6fa

Merge branch 'vllm-project:main' into qinggangying/vllm

c7d93c1a

tlrmchlsmth approved these changes on 2025-04-01

tlrmchlsmth merged 9ef98d52 into main 315 days ago

vllm
[Model][MiniMaxText01] Support MiniMaxText01 model inference
#13454

Merged
