transformers
155db714 - [Gemma4] Add docstrings for Per-Layer Embeddings (PLE) pipeline (#45207)

[Gemma4] Add docstrings for Per-Layer Embeddings (PLE) pipeline (#45207)

* [Gemma4] Add docstrings for Per-Layer Embeddings (PLE) pipeline

The PLE system is complex and underdocumented, which makes it hard for third-party implementations (llama.cpp, candle, mlx, etc.) to get right.

This adds:
- Config docstring for hidden_size_per_layer_input explaining that the actual embedding dim is num_hidden_layers * hidden_size_per_layer_input, that the embedding is scaled by sqrt(hidden_size_per_layer_input), and describing the full two-component pipeline
- Docstring for get_per_layer_inputs() explaining the token-identity component and the packed-to-4D reshape
- Docstring for project_per_layer_inputs() explaining the context-aware projection, normalization, and combination with scale factors
- Comment on the PLE init block pointing to the pipeline methods

Fixes huggingface#45206

* Address review: move PLE details to model doc, shorten config docstring

Move the detailed PLE pipeline description from the config docstring to the Gemma4 model documentation page. The config docstring now just describes the parameter shape and links to the full docs.

* Address review nits: move edits to modular_gemma4.py, simplify gemma4.md

- Remove bold formatting and config params section from gemma4.md per review
- Move docstrings and PLE comment from modeling_gemma4.py to modular_gemma4.py
- Revert modeling_gemma4.py (CI regenerates it from modular)

* fix: run make fix-repo to align modeling_gemma4.py with modular_gemma4.py
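For readers who only have the commit message, a minimal sketch of the shape arithmetic it describes may help. This is an illustration, not the modeling code from the repository: the class name `PerLayerInputsSketch`, the `per_layer_projection` / `per_layer_norm` attributes, and the 1/sqrt(2) combination factor are assumptions chosen for clarity. Only the relationships stated above are taken from the source: a packed token-identity table of width num_hidden_layers * hidden_size_per_layer_input, scaling by sqrt(hidden_size_per_layer_input), a packed-to-4D reshape in get_per_layer_inputs(), and a context-aware projection plus normalization combined with the token-identity component in project_per_layer_inputs().

```python
import math
import torch
import torch.nn as nn


class PerLayerInputsSketch(nn.Module):
    """Illustrative sketch of the two-component PLE pipeline (not the repo code)."""

    def __init__(self, vocab_size, hidden_size, num_hidden_layers, hidden_size_per_layer_input):
        super().__init__()
        self.num_hidden_layers = num_hidden_layers
        self.hidden_size_per_layer_input = hidden_size_per_layer_input
        # Token-identity component: one packed embedding table whose width is
        # num_hidden_layers * hidden_size_per_layer_input.
        self.embed_tokens_per_layer = nn.Embedding(
            vocab_size, num_hidden_layers * hidden_size_per_layer_input
        )
        # Context-aware component: projects the model's hidden states into the
        # same packed per-layer space.
        self.per_layer_projection = nn.Linear(
            hidden_size, num_hidden_layers * hidden_size_per_layer_input, bias=False
        )
        # Placeholder normalization over the per-layer feature dimension.
        self.per_layer_norm = nn.LayerNorm(hidden_size_per_layer_input)

    def get_per_layer_inputs(self, input_ids):
        # (batch, seq, num_layers * H_ple), scaled by sqrt(hidden_size_per_layer_input).
        packed = self.embed_tokens_per_layer(input_ids) * math.sqrt(
            self.hidden_size_per_layer_input
        )
        # Packed-to-4D reshape: (batch, seq, num_layers, H_ple).
        return packed.view(
            *input_ids.shape, self.num_hidden_layers, self.hidden_size_per_layer_input
        )

    def project_per_layer_inputs(self, hidden_states, per_layer_inputs):
        # Context-aware projection, reshaped to the same 4D layout.
        projected = self.per_layer_projection(hidden_states)
        projected = projected.view(
            *hidden_states.shape[:-1], self.num_hidden_layers, self.hidden_size_per_layer_input
        )
        # Normalize, then combine with the token-identity component; the 1/sqrt(2)
        # factor is an assumed stand-in for the scale factors the commit mentions.
        return (self.per_layer_norm(projected) + per_layer_inputs) / math.sqrt(2)
```

With hypothetical sizes (say num_hidden_layers=30 and hidden_size_per_layer_input=256), get_per_layer_inputs() turns (batch, seq) token ids into a (batch, seq, 30, 256) tensor, and project_per_layer_inputs() returns a tensor of the same shape that each decoder layer can slice along the layer dimension. The point of the docstrings added in this commit is to make exactly this shape convention explicit for downstream reimplementations.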