Gemma3 #36658

LysandreJik merged 126 commits into huggingface:main from RyanMullins:gemma3
xenova Fix converter
5faeae3b
RyanMullins [Broken] Adds Gemma 3 to Hugging Face Transformers
b21634b1
RyanMullins Consolidating Config and Processor params across impls
8e27cd7c
RyanMullins Sorting out configuration parameters. Adds qk_norm before RoPE. Still…
72b60b03
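
The qk-norm referenced here normalizes the query and key projections before the rotary embedding is applied. A minimal sketch of the idea (module names and shapes are illustrative, not the PR's code):

```python
import torch
import torch.nn as nn


class RMSNorm(nn.Module):
    """Simple RMSNorm, used here to normalize per-head queries and keys."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * self.weight


# Hypothetical attention snippet: apply qk-norm first, then RoPE.
head_dim = 256
q_norm, k_norm = RMSNorm(head_dim), RMSNorm(head_dim)
q = torch.randn(1, 8, 16, head_dim)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 8, 16, head_dim)
q, k = q_norm(q), k_norm(k)          # qk-norm before ...
# ... rotary position embeddings, which would rotate q and k here.
```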
RyanMullins Additional plumbing for CausalLM and ConditionalGeneration variants
e01056e6
RyanMullins incomplete draft of Orbax conversion script
9a084507
RyanMullins More complete checkpoint conversion
a7d7fb2c
RyanMullins Supporting Gemma 3 1B checkpoints
6d5b6372
RyanMullins Updating RoPE for multiple frequencies
2699ec62
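
"Multiple frequencies" means local (sliding-window) and global attention layers use different RoPE bases, so two angle tables are needed. A small sketch, with the base values as assumptions:

```python
import torch


def rope_inv_freq(head_dim: int, base: float) -> torch.Tensor:
    """Inverse frequencies for rotary embeddings with a given base (theta)."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))


# Assumed values: sliding-window (local) layers use a smaller base than
# full-attention (global) layers, giving two sets of RoPE frequencies.
inv_freq_local = rope_inv_freq(256, base=10_000.0)
inv_freq_global = rope_inv_freq(256, base=1_000_000.0)

positions = torch.arange(16, dtype=torch.float32)
angles_local = torch.outer(positions, inv_freq_local)    # (seq_len, head_dim // 2)
angles_global = torch.outer(positions, inv_freq_global)
cos_local, sin_local = angles_local.cos(), angles_local.sin()
cos_global, sin_global = angles_global.cos(), angles_global.sin()
```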
RyanMullins Adjustments to rotary embedder
49c86587
RyanMullins Proof of life for text-only operation
146822a0
RyanMullins Updating the conversion script to handle multimodal projection weights
74f4acbb
RyanMullins Fixing text-only conversions
bfcc3039
RyanMullins Cleaner conversion script with multimodal support and a simpler proce…
88897b29
RyanMullins Additional refactors to the Gemma3Processor
0548c26a
RyanMullins Simplified Processor to work over text representations
1a860c71
RyanMullins Updated conversion script to join text and vision embeddings at conve…
f9036cd2
RyanMullins Logging for debugging
61f0b582
RyanMullins Update src/transformers/models/gemma2/modeling_gemma2.py
3f282c94
RyanMullins Removed extraneous Config params
8b41347b
RyanMullins Switching to fast tokenizer for checkpoint conversions
daacc1d3
RyanMullins isolating siglip for performance testing
4338957c
RyanMullins Minor changes for debugging tests against baselines
14c443cd
RyanMullins Adding average pooling for soft tokens
d45be318
RyanMullins Updating processor code to enable simpler embedding interleaving for …
cdbd03f3
RyanMullins Updating conversion script for ShieldGemma 2 conversion compatibility
ec2a7df8
pcuenca Allow disable_compile to be provided as a kwarg
85d11816
pcuenca Refresh from modular
6922438e
RyanMullins Updated conversion script and corrected sliding window
f47afe2a
pcuenca Fix type mismatch in cache_position (#4)
c40f6e21
pcuenca Fix dtype (#5)
5ebdcb82
RyanMullins fixes for embedding table overflow and missing image_soft_token_mask …
432c645e
MayankChaturvedi Adding 2D pooling for image embeddings
65350cf5
MayankChaturvedi Revert "Adding 2D pooling for image embeddings"
00af9a72
MayankChaturvedi Gemma3 average pooling changed from 1D to 2D
1a361871
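
The change from 1D to 2D pooling treats the vision patch sequence as a square grid and averages over spatial windows. A sketch under assumed SigLIP-like shapes (patch count, hidden size, and pooling factor are placeholders):

```python
import torch
import torch.nn.functional as F

# Assumed SigLIP-like output: a square grid of patch embeddings, pooled in
# two dimensions so each image yields a fixed number of soft tokens.
batch, num_patches, hidden = 2, 4096, 1152
side = int(num_patches ** 0.5)                           # 64 x 64 grid

patches = torch.randn(batch, num_patches, hidden)
grid = patches.transpose(1, 2).reshape(batch, hidden, side, side)

pooled = F.avg_pool2d(grid, kernel_size=4, stride=4)     # 2D pooling, not 1D
soft_tokens = pooled.flatten(2).transpose(1, 2)          # (batch, 256, hidden)
print(soft_tokens.shape)                                 # torch.Size([2, 256, 1152])
```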
RyanMullins Merge pull request #8 from RyanMullins/gemma3pooling
88030d16
RyanMullins Major refactor to Gemma3MultimodalInputProjection
e23b2ba6
RyanMullins Updating Gemma 3 Auto* registrations
6670e1b5
RyanMullins Add option to save Gemma 3 chat template with tokenizer during weight…
7907bf07
RyanMullins Removing unused imports
6d0dd5a6
RyanMullins Moving out-of-vocab handling from Gemma3Processor to Gemma3ForConditi…
c042cd08
RyanMullins Removing duplicate config property
10a61859
RyanMullins Removing final logit softcapping and 1-indexing of position ids
21fc6827
RyanMullins Fixing image processor config and none --> None typo
fa28a8ca
RyanMullins Fixing sliding window size for 1B
0f148d1c
RyanMullins Updating image_mean and image_std in Image Processor
48bca47e
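
For context, `image_mean` and `image_std` enter a standard normalization step after pixels are rescaled to [0, 1]; the values below are placeholders for whatever the conversion script mirrors into the processor config:

```python
import torch

# Placeholder mean/std; normalization happens after rescaling pixels to [0, 1].
image_mean, image_std = 0.5, 0.5
pixels = torch.rand(3, 896, 896)                 # channels-first, values in [0, 1]
normalized = (pixels - image_mean) / image_std   # roughly in [-1, 1]
```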
MayankChaturvedi Attention masking changed to lower triangular
576f065c
RyanMullins Merge pull request #9 from RyanMullins/gemma3attention
f137065c
RyanMullins Moving image special tokens to conversion script
e9e41bb1
RyanMullins Mirror image processor defaults from conversion script into Gemma3Pro…
ed3813d3
RyanMullins Remove special token variables from symbol space
f25309c2
RyanMullins Moving image soft token mask computation from Gemma3Processor to Gemm…
2ad61ba7
RyanMullins tie lm_head and embedding weights
a45b01cb
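
Weight tying means the LM head reuses the input embedding matrix rather than keeping its own copy. A toy sketch (sizes are illustrative):

```python
import torch.nn as nn

# Toy sizes; the point is that lm_head and the embedding table share one tensor.
vocab_size, hidden_size = 1_000, 64
embed_tokens = nn.Embedding(vocab_size, hidden_size)
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)
lm_head.weight = embed_tokens.weight             # tied: one parameter, two uses
assert lm_head.weight.data_ptr() == embed_tokens.weight.data_ptr()
```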
RyanMullins Correct tied weights in Gemma3CausalLM
dae9277e
MayankChaturvedi iterative bidirectional attention
c5f84468
MayankChaturvedi resolving merge conflicts
c7c8468f
RyanMullins Reverting to Gemma 2 HybridCache with sliding window support and a sl…
a7cb4af1
RyanMullins Correcting RoPE scaling
9bb66a27
zucchini-nlp clean up first pass, dummy model generation works
bd5b5e5b
zucchini-nlp final clean up before fixing tests
e1d448c0
zucchini-nlp causal lm test works, so fine
4b9e8b45
pcuenca Fix conversion
ee837ca2
pcuenca Update src/transformers/models/gemma3/processing_gemma3.py
875c1047
pcuenca Merge remote-tracking branch 'origin/gemma3' into gemma3-convert
536d5b8c
zucchini-nlp model tests are happy
ae6f71db
zucchini-nlp processor tests are happy
de52bb59
zucchini-nlp image processing tests added
d0e0b00e
zucchini-nlp fixup
240c6958
pcuenca Fix pre-processing in conversion
42693328
pcuenca Inputs merging
b89faaf2
pcuenca Do not normalize vision embeddings
21f15c17
pcuenca Apply Ryan's (and team) changes to attention
abde03a8
zucchini-nlp token type ids + mask
613ccb3d
zucchini-nlp Merge branch 'gemma3-convert' into gemma3
0c5f50cd
zucchini-nlp template
f6f07d78
zucchini-nlp Merge remote-tracking branch 'upstream/main' into gemma3
50e17993
zucchini-nlp move embed scale, add rope scale, fix tests
0d914582
pcuenca Add chat template to tokenizer
f19907c5
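
Shipping the chat template with the tokenizer lets users format conversations via `apply_chat_template`. A usage sketch (the checkpoint name is illustrative):

```python
from transformers import AutoTokenizer

# Checkpoint name is illustrative; the template ships with the tokenizer files.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")
messages = [
    {"role": "user", "content": "Summarize rotary position embeddings in one sentence."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```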
pcuenca Use prefix for causal model loading
daf6feac
zucchini-nlp use existing code for sliding mask from gemma2
b03ef675
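
The Gemma 2-style sliding mask combines the causal constraint with a limit on how far back each query may look. An illustrative construction (window size and length are placeholders):

```python
import torch

# Placeholder lengths: causal attention further limited to the last `window` keys.
seq_len, window = 8, 4
i = torch.arange(seq_len).unsqueeze(1)   # query positions, shape (seq_len, 1)
j = torch.arange(seq_len).unsqueeze(0)   # key positions, shape (1, seq_len)
causal = j <= i                          # lower-triangular constraint
in_window = (i - j) < window             # sliding-window constraint
mask = causal & in_window                # True where attention is allowed
print(mask.int())
```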
pcuenca Merge remote-tracking branch 'origin/gemma3' into multimodals-are-causal
402c7af6
pcuenca self.embed_tokens already normalizes
d36921d8
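
The normalization referred to here is the Gemma-style scaling of token embeddings by sqrt(hidden_size) inside the embedding layer, so callers should not apply the scale a second time. A toy sketch:

```python
import torch
import torch.nn as nn


class ScaledEmbedding(nn.Embedding):
    """Embedding that applies the sqrt(hidden_size) scale itself."""

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return super().forward(input_ids) * (self.embedding_dim ** 0.5)


embed_tokens = ScaledEmbedding(1_000, 64)        # toy sizes
hidden_states = embed_tokens(torch.tensor([[1, 2, 3]]))
```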
RyanMullins Correcting Gemma3TextConfig parameters in conversion script
b089958a
zucchini-nlp typo, modular overwrites my fixes
54ebbb74
pcuenca Merge branch 'gemma3' into multimodals-are-causal
50492bac
zucchini-nlp enable device map for text model
a99de0c5
pcuenca Conversion updates
f71762f6
pcuenca Merge pull request #7 from huggingface/multimodals-are-causal
e2c50bcb
zucchini-nlp ultra nit: no einsums
e9f46fd0
zucchini-nlp update image token
42b7a0a7
zucchini-nlp copy deepcopy config + some docs
d542591b
zucchini-nlp add some test, still WIP
faecbac7
RyanMullins Refactoring --include_chat_template logic in converter
de4ae310
zucchini-nlp Update src/transformers/models/gemma3/modular_gemma3.py
03ea3327
pcuenca Add eos tokens for instruct models
6ed3b7dd
pcuenca Merge pull request #8 from huggingface/convert-with-eos
d9b65411
zucchini-nlp dump so i can work on dgx
a4078295
RyanMullins Removing add_bos by default
1436ae84
zucchini-nlp dump
69f3748e
zucchini-nlp add fast im proc
fbd8a270
zucchini-nlp docs for PaS + fixup
af8081bf
zucchini-nlp another fixup
21904843
zucchini-nlp one more fixup
49524d2f
zucchini-nlp fix tests
1c57c1ed
RyanMullins Inverting prior BOS change
8ab84bb4
zucchini-nlp ultra nit
6dd1aef1
RyanMullins Reverting to Tokenizer saved with add_bos_token=True and chat templat…
ae806859
zucchini-nlp resize embeds, remove sqrt, add slow test outputs
ba77bc56
zucchini-nlp FA2 but quality is meh
aa9d1410
zucchini-nlp Merge pull request #9 from huggingface/raushan-working
35ff071c
zucchini-nlp nit
ca82ebc7
zucchini-nlp skip FA2, no idea what happened
74da7218
zucchini-nlp last bit for green CI
123402a2
zucchini-nlp please, green CI for docs
d541fe4d
zucchini-nlp T_T
12807145
RyanMullins Fix for Gemma3 logits
49141331
xenova Support both options for system prompt
4c48f139
xenova Add support for both forms of system prompts
17119428
RyanMullins Update src/transformers/models/gemma3/image_processing_gemma3_fast.py
5ad5b27e
RyanMullins Update docs/source/en/model_doc/gemma3.md
c3b02132
RyanMullins Update docs/source/en/model_doc/gemma3.md
2dd948b0
RyanMullins Update docs/source/en/model_doc/gemma3.md
5f8f8a6c
RyanMullins Update docs/source/en/model_doc/gemma3.md
cd14f3fd
RyanMullins Update docs/source/en/model_doc/gemma3.md
782bb92f
RyanMullins Docs updates now that assets are live
a3341214
github-actions marked this pull request as draft 280 days ago
ArthurZucker approved these changes on 2025-03-12
LysandreJik marked this pull request as ready for review 280 days ago
LysandreJik Style fixes
95435e91
LysandreJik merged 50d3530a into main 280 days ago
DarkLight1337 commented on 2025-03-12
