Add jamba #29943

ArthurZucker merged 78 commits into huggingface:main from AI21Labs:add-jamba
tomeras91 Add jamba arch
16b561d6
tomeras91 Merge branch 'main' into add-jamba
2e7fbe41
tomeras91 apply "make fix-copies" changes
5b84cbe2
tomeras91 fix link to model in JambaConfig docstring
b2f12fcb
tomeras91 Add n_ctx in modeling file because repo-consistency wants that
5f48e7b5
tomeras91 Add jamba to flash attention and sdpa documentation
f2bbe6dc
tomeras91 mamba dt_proj quant fix now works for LoRA as well
5ec508ef
tomeras91 Merge branch 'main' into add-jamba
35caa4f7
tomeras91 override test_left_padding_compatibility and use a more permissive to…
240c5778
ArthurZucker added the New model label
ArthurZucker commented on 2024-03-30
tomeras91 add jamba to tokenization auto
783a1ac1
tomeras91 Merge branch 'main' into add-jamba
b0c9d7cc
tomeras91 fix comments of shape (PR #24 in the model page: https://huggingface.…
56183b49
tomeras91 simple PR fixes
59d832a5
tomeras91 remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMa…
ce8b476f
tomeras91 remove the LoRA hack for the mamba dt_proj bias. It was solved in hug…
810dfbf0
tomeras91 Add copied comment on JambaMLP (it's the same as MixtralMLP)
b03a83d0
tomeras91 remove padding_mask warnings. It's not supported anymore
9bd48efb
tomeras91 fix docstring. Float instead of int
9c164dcf
tomeras91 A few more minor PR fixes
3a1ef30b
tomeras91 (1) lowercase names for mamba layernorms (2) remove _apply_inner_laye…
16b397fb
tomeras91 Return None attention weights from mamba layers. Append to all attent…
a272515f
tomeras91 remove some leftover jamba archive lists
16cff223
tomeras91 Merge branch 'main' into add-jamba
4c044b23
tomeras91 Better separation between expert vs non-expert layers. non-expert lay…
f833e258
tomeras91 no need to take router_logits at config.expert_layer_offset anymore. …
f368f8d0
tomeras91 Add Jamba paper on READMEs
a9342a2d
ArthurZucker commented on 2024-04-02
tomeras91 (1) rename n_ctx -> max_position_embeddings (2) don't use it in the m…
40432cfc
tomeras91 Add copied from comment
c0ef620a
tomeras91 remove the code path for apply_inner_layernorms=False. Jamba always h…
f980caa7
tomeras91 clearer docstring for _convert_to_standard_cache
f9573b99
tomeras91 style fixes
21c43bde
tomeras91 Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (in…
a425233f
tomeras91 Merge branch 'main' into add-jamba
20901768
tomeras91 rename test so it still overrides what it's meant to override
12d9914b
ArthurZucker draft
c53439cb
ArthurZucker oups
e3801cb6
ArthurZucker nit
8f7f1ad2
ArthurZucker remove more complex logic
c9c254a9
ArthurZucker fix names used in config
574e68e3
ArthurZucker fix fix fix
b3d37a19
ArthurZucker style
5e9523c7
ArthurZucker fix some more failing tests
0898ddc8
ArthurZucker generate did not init the cache 🙃
65bfbeeb
ArthurZucker more small nits
3764de07
ArthurZucker typo
d7d64a7b
ArthurZucker config.mamba_expand * config.hidden_size for the intermediate size o…
e1ada1dd
ArthurZucker fix init of pkv with torch.tensor()
73603a29
ArthurZucker empty tensor
a0f92cbe
ArthurZucker fix some init issues
9cce32bf
ArthurZucker stupid changes required by generate because it does not even support …
61ab3bc3
ArthurZucker Merge branch 'main' of github.com:huggingface/transformers into updat…
6c01417d
ArthurZucker more fixes
a8982c5a
tomeras91 Merge branch 'main' into add-jamba
ebbace33
gante fix general assisted gen cache_position bug
82be569e
gante tests passing
d7594d6f
tomeras91 Merge branch 'update-jamba' into add-jamba
bb5266aa
tomeras91 Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_att…
7e8ac81f
ArthurZucker approved these changes on 2024-04-09
tomeras91 fix reorder_cache to reorder mamba states and override some more func…
997be2cc
tomeras91 no need to override test_past_key_values_format() and _check_past_key…
1f475b27
tomeras91 fix docstrings and typehints for past_key_values
a252fe03
tomeras91 style fixes
c9f094a7
tomeras91 fix docs
5aace7c1
tomeras91 change typehint due to copy from Mixtral
1b3f2240
tomeras91 forgot import
1e87c88c
tomeras91 import order
ae7f7fbd
tomeras91 Merge branch 'main' into add-jamba
e71421cd
tomeras91 Add configuration_jamba and modeling_jamba to not_doctested because t…
5e0244d6
tomeras91 Add integration test with tiny random Jamba model on hub
5c031639
ArthurZucker commented on 2024-04-17
tomeras91 fix flash attention cache shapes
7b15866d
tomeras91 bring back forgotten hidden states
e9d227b9
tomeras91 marked this pull request as draft 2 years ago
tomeras91 rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous…
d1ae4fd7
tomeras91 align integration test after modeling fixes
a3e8094a
tomeras91 bugfix - mamba can use precomputed states only if forward pass is on …
a0a8d8cd
tomeras91 bugfix - mamba can use precomputed states only if they match the batc…
122c696a
tomeras91 typo
ab2a0d3b
tomeras91 marked this pull request as ready for review 2 years ago
tomeras91 Merge branch 'main' into add-jamba
62526032
ArthurZucker commented on 2024-04-17
tomeras91 remove making _prepare_4d_causal_attention_mask a leaf function
aabe99d0
tomeras91 stop using past_seq_len.get_seq_length(). Use cache positions instead…
886e8c81
ArthurZucker approved these changes on 2024-04-18
ArthurZucker merged 3f20877d into main 2 years ago
