PR #29943 Add jamba - SemanticDiff

Add jamba arch

16b561d6

Merge branch 'main' into add-jamba

2e7fbe41

apply "make fix-copies" changes

5b84cbe2

fix link to model in JambaConfig docstring

b2f12fcb

Add n_ctx in modeling file because repo-consistency wants that

5f48e7b5

Add jamba to flash attention and sdpa documentation

f2bbe6dc

mamba dt_proj quant fix now works for LoRA as well

5ec508ef

Merge branch 'main' into add-jamba

35caa4f7

override test_left_padding_compatibility and use a more permissive to…

240c5778

ArthurZucker added New model

ArthurZucker commented on 2024-03-30

add jamba to tokenization auto

783a1ac1

Merge branch 'main' into add-jamba

b0c9d7cc

fix comments of shape (PR #24 in the model page: https://huggingface.…

56183b49

simple PR fixes

59d832a5

remove unnecessary kwargs from JambaAttentionDecoderLayer and JambaMa…

ce8b476f

remove the LoRA hack for the mamba dt_proj bias. It was solved in hug…

810dfbf0

Add copied comment on JambaMLP (it's the same as MixtralMLP)

b03a83d0

remove padding_mask warnings. It's not supported anymore

9bd48efb

fix docstring. Float instead of int

9c164dcf

A few more minor PR fixes

3a1ef30b

(1) lowercase names for mamba layernorms (2) remove _apply_inner_laye…

16b397fb

Return None attention weights from mamba layers. Append to all attent…

a272515f

remove some leftover jamba archive lists

16cff223

Merge branch 'main' into add-jamba

4c044b23

Better separation between expert vs non-expert layers. non-expert lay…

f833e258

no need to take router_logits at config.expert_layer_offset anymore. …

f368f8d0

Add Jamba paper on READMEs

a9342a2d

ArthurZucker commented on 2024-04-02

(1) rename n_ctx -> max_position_embeddings (2) don't use it in the m…

40432cfc

Add copied from comment

c0ef620a

remove the code path for apply_inner_layernorms=False. Jamba always h…

f980caa7

clearer docstring for _convert_to_standard_cache

f9573b99

style fixes

21c43bde

Change calc_logits_for_entire_prompt (bool) to num_logits_to_keep (in…

a425233f
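
A minimal sketch of what moving from a boolean flag to an integer count can look like, for readers unfamiliar with the idea. It is not taken from the PR diff. The function name `select_positions_for_logits`, the list-based inputs, and the convention that `None` keeps every position are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def select_positions_for_logits(
    hidden_states: Sequence[List[float]],
    num_logits_to_keep: Optional[int] = None,
) -> List[List[float]]:
    """Return the trailing positions whose logits would be computed.

    hidden_states: one vector per token of a single sequence.
    num_logits_to_keep: None keeps every position (assumed convention);
    a positive integer k keeps only the last k positions.
    """
    if num_logits_to_keep is None:
        return list(hidden_states)
    if num_logits_to_keep <= 0:
        raise ValueError("num_logits_to_keep must be a positive integer or None")
    # Negative slicing keeps the last k entries; if k exceeds the length,
    # Python returns the whole sequence.
    return list(hidden_states[-num_logits_to_keep:])
```

With an integer, a caller that only needs the next-token prediction can pass `1` and skip projecting the rest of the prompt, while a caller that needs every position can still ask for all of them.
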

Merge branch 'main' into add-jamba

20901768

rename test so it still overrides what it's meant to override

12d9914b

draft

c53439cb

oups

e3801cb6

nit

8f7f1ad2

remove more complex logic

c9c254a9

fix names used in config

574e68e3

fix fix fix

b3d37a19

style

5e9523c7

fix some more failing tests

0898ddc8

generate did not init the cache 🙃

65bfbeeb

more small nits

3764de07

typo

d7d64a7b

config.mamba_expand * config.hidden_size for the intermediate size o…

e1ada1dd
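
The commit title above names the product `config.mamba_expand * config.hidden_size`. A small sketch of that arithmetic follows. `TinyConfig` and its default values are hypothetical stand-ins, not the real `JambaConfig` or its defaults.

```python
from dataclasses import dataclass


@dataclass
class TinyConfig:
    """Hypothetical stand-in for a model config; values are illustrative only."""

    hidden_size: int = 1024
    mamba_expand: int = 2


def mamba_intermediate_size(config: TinyConfig) -> int:
    # The intermediate width is derived from two config fields instead of
    # being stored as a separate, possibly inconsistent, value.
    return config.mamba_expand * config.hidden_size
```

Deriving the size from the two fields means a change to either one is reflected automatically.
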

fix init of pkv with torch.tensor()

73603a29

empty tensor

a0f92cbe

fix some init issues

9cce32bf

stupid changes required by generate because it does not even support …

61ab3bc3

Merge branch 'main' of github.com:huggingface/transformers into updat…

6c01417d

more fixes

a8982c5a

Merge branch 'main' into add-jamba

ebbace33

fix general assisted gen cache_position bug

82be569e

tests passing

d7594d6f

Merge branch 'update-jamba' into add-jamba

bb5266aa

Add offsets and periods as SPECIAL_CASES_TO_ALLOW in check_config_att…

7e8ac81f

ArthurZucker approved these changes on 2024-04-09

fix reorder_cache to reorder mamba states and override some more func…

997be2cc
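
For context on what reordering cached states means during beam search, here is a minimal sketch. It is not the PR's implementation. It assumes a cache stored as one list per layer with one entry per batch row, and the name `reorder_states` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_states(
    per_layer_states: Sequence[Sequence[T]],
    beam_idx: Sequence[int],
) -> List[List[T]]:
    """Select, for every layer, the batch rows named by beam_idx.

    per_layer_states: outer index is the layer, inner index is the batch row.
    beam_idx: for each output row, the index of the input row it continues.
    """
    return [[layer[i] for i in beam_idx] for layer in per_layer_states]
```

The same index selection has to be applied to every kind of per-row state a model keeps. Otherwise a beam would continue from another beam's state.
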

no need to override test_past_key_values_format() and _check_past_key…

1f475b27

fix docstrings and typehints for past_key_values

a252fe03

style fixes

c9f094a7

fix docs

5aace7c1

change typehint due to copy from Mixtral

1b3f2240

forgot import

1e87c88c

import order

ae7f7fbd

Merge branch 'main' into add-jamba

e71421cd

Add configuration_jamba and modeling_jamba to not_doctested because t…

5e0244d6

Add integration test with tiny random Jamba model on hub

5c031639

ArthurZucker commented on 2024-04-17

fix flash attention cache shapes

7b15866d

bring back forgotten hidden states

e9d227b9

tomeras91 marked this pull request as draft 2 years ago

rename HybridMambaAttentionDynamicCache.seqlen_offset to has_previous…

d1ae4fd7

align integration test after modeling fixes

a3e8094a

bugfix - mamba can use precomputed states only if forward pass is on …

a0a8d8cd

bugfix - mamba can use precomputed states only if they match the batc…

122c696a

typo

ab2a0d3b

tomeras91 marked this pull request as ready for review 2 years ago

Merge branch 'main' into add-jamba

62526032

ArthurZucker commented on 2024-04-17

remove making _prepare_4d_causal_attention_mask a leaf function

aabe99d0

stop using past_seq_len.get_seq_length(). Use cache positions instead…

886e8c81

ArthurZucker approved these changes on 2024-04-18

ArthurZucker merged 3f20877d into main 2 years ago

transformers
Add jamba
#29943

Merged
