Add jamba arch
16b561d6
Merge branch 'main' into add-jamba
2e7fbe41
apply "make fix-copies" changes
5b84cbe2
fix link to model in JambaConfig docstring
b2f12fcb
Add n_ctx in modeling file because repo-consistency wants that
5f48e7b5
Add jamba to flash attention and sdpa documentation
f2bbe6dc
mamba dt_proj quant fix now works for LoRA as well
5ec508ef
Merge branch 'main' into add-jamba
35caa4f7
override test_left_padding_compatibility and use a more permissive to…
240c5778
add jamba to tokenization auto
783a1ac1
Merge branch 'main' into add-jamba
b0c9d7cc
fix comments of shape (PR #24 in the model page: https://huggingface.…
56183b49
simple PR fixes
59d832a5
remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMa…
ce8b476f
remove the LoRA hack for the mamba dt_proj bias. It was solved in hug…
810dfbf0
Add copied comment on JambaMLP (it's the same as MixtralMLP)
b03a83d0
remove padding_mask warnings. It's not supported anymore
9bd48efb
fix docstring. Float instead of int
9c164dcf
A few more minor PR fixes
3a1ef30b
(1) lowercase names for mamba layernorms (2) remove _apply_inner_laye…
16b397fb
Return None attention weights from mamba layers. Append to all attent…
a272515f
remove some leftover jamba archive lists
16cff223
Merge branch 'main' into add-jamba
4c044b23
Better separation between expert vs non-expert layers. non-expert lay…
f833e258
no need to take router_logits at config.expert_layer_offset anymore. …
f368f8d0
Add Jamba paper on READMEs
a9342a2d
(1) rename n_ctx -> max_position_embeddings (2) don't use it in the m…
40432cfc
Add copied from comment
c0ef620a
remove the code path for apply_inner_layernorms=False. Jamba always h…
f980caa7
clearer docstring for _convert_to_standard_cache
f9573b99
style fixes
21c43bde
Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (in…
a425233f
Merge branch 'main' into add-jamba
20901768
rename test so it still overrides what it's meant to override
12d9914b
draft
c53439cb
oops
e3801cb6
nit
8f7f1ad2
remove more complex logic
c9c254a9
fix names used in config
574e68e3
fix fix fix
b3d37a19
style
5e9523c7
fix some more failing tests
0898ddc8
generate did not init the cache 🙃
65bfbeeb
more small nits
3764de07
typo
d7d64a7b
config.mamba_expand * config.hidden_size for the intermediate size o…
e1ada1dd
fix init of pkv with torch.tensor()
73603a29
empty tensor
a0f92cbe
fix some init issues
9cce32bf
stupid changes required by generate because it does not even support …
61ab3bc3
Merge branch 'main' of github.com:huggingface/transformers into updat…
6c01417d
more fixes
a8982c5a
Merge branch 'main' into add-jamba
ebbace33
fix general assisted gen cache_position bug
82be569e
tests passing
d7594d6f
Merge branch 'update-jamba' into add-jamba
bb5266aa
Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_att…
7e8ac81f
fix reorder_cache to reorder mamba states and override some more func…
997be2cc
no need to override test_past_key_values_format() and _check_past_key…
1f475b27
fix docstrings and typehints for past_key_values
a252fe03
style fixes
c9f094a7
fix docs
5aace7c1
change typehint due to copy from Mixtral
1b3f2240
forgot import
1e87c88c
import order
ae7f7fbd
Merge branch 'main' into add-jamba
e71421cd
Add configuration_jamba and modeling_jamba to not_doctested because t…
5e0244d6
Add integration test with tiny random Jamba model on hub
5c031639
fix flash attention cache shapes
7b15866d
bring back forgotten hidden states
e9d227b9
tomeras91
marked this pull request as draft 2 years ago
rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous…
d1ae4fd7
align integration test after modeling fixes
a3e8094a
bugfix - mamba can use precomputed states only if forward pass is on …
a0a8d8cd
bugfix - mamba can use precomputed states only if they match the batc…
122c696a
typo
ab2a0d3b
tomeras91
marked this pull request as ready for review 2 years ago
Merge branch 'main' into add-jamba
62526032
remove making _prepare_4d_causal_attention_mask a leaf function
aabe99d0
stop using past_seq_len.get_seq_length(). Use cache positions instead…
886e8c81