[Gemma4] Add docstrings for Per-Layer Embeddings (PLE) pipeline (#45207)
* [Gemma4] Add docstrings for Per-Layer Embeddings (PLE) pipeline
The PLE system is complex and underdocumented, which makes it hard
for third-party implementations (llama.cpp, candle, mlx, etc.) to
get right. This adds:
- Config docstring for `hidden_size_per_layer_input` explaining that
the actual embedding dim is `num_hidden_layers * hidden_size_per_layer_input`,
that the embedding is scaled by `sqrt(hidden_size_per_layer_input)`, and
describing the full two-component pipeline
- Docstring for `get_per_layer_inputs()` explaining the token-identity
component and the packed-to-4D reshape
- Docstring for `project_per_layer_inputs()` explaining the context-aware
projection, normalization, and combination with scale factors
- Comment on the PLE init block pointing to the pipeline methods
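For reviewers, a rough numpy sketch of the two-component pipeline as described above. All names and shapes here are illustrative stand-ins, not the actual Gemma4 implementation (which uses `nn.Embedding`, an RMS-style norm module, etc.); the normalization and the `1/sqrt(2)` combine factor are assumptions based on this PR's description:

```python
import numpy as np

# Illustrative sizes; real configs are much larger.
num_hidden_layers = 4
hidden_size_per_layer_input = 8
hidden_size = 32
batch, seq_len, vocab = 2, 5, 100

rng = np.random.default_rng(0)

# Token-identity component (get_per_layer_inputs): one packed embedding
# table whose row dim is num_hidden_layers * hidden_size_per_layer_input.
embed_tokens_per_layer = rng.normal(
    size=(vocab, num_hidden_layers * hidden_size_per_layer_input))
input_ids = rng.integers(0, vocab, size=(batch, seq_len))

per_layer_inputs = embed_tokens_per_layer[input_ids]       # (B, T, L*D)
per_layer_inputs *= np.sqrt(hidden_size_per_layer_input)   # sqrt scaling
per_layer_inputs = per_layer_inputs.reshape(               # packed -> 4D
    batch, seq_len, num_hidden_layers, hidden_size_per_layer_input)

# Context-aware component (project_per_layer_inputs): project the input
# embeddings, normalize, and combine with the token-identity component.
inputs_embeds = rng.normal(size=(batch, seq_len, hidden_size))
W_proj = rng.normal(
    size=(hidden_size, num_hidden_layers * hidden_size_per_layer_input))

projection = (inputs_embeds @ W_proj).reshape(
    batch, seq_len, num_hidden_layers, hidden_size_per_layer_input)
# RMS-style normalization over the last dim (stand-in for the real norm).
projection /= np.sqrt((projection ** 2).mean(-1, keepdims=True) + 1e-6)

# Combine the two components; 1/sqrt(2) keeps the sum's variance stable.
combined = (projection + per_layer_inputs) * (1.0 / np.sqrt(2.0))
print(combined.shape)  # (2, 5, 4, 8)
```

Third-party ports mainly need to get the packed-to-4D reshape and the two scale factors right; the rest is ordinary matmul + norm.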
Fixes huggingface#45206
* Address review: move PLE details to model doc, shorten config docstring
Move the detailed PLE pipeline description from the config docstring
to the Gemma4 model documentation page. The config docstring now just
describes the parameter shape and links to the full docs.
* Address review nits: move edits to modular_gemma4.py, simplify gemma4.md
- Remove bold formatting and config params section from gemma4.md per review
- Move docstrings and PLE comment from modeling_gemma4.py to modular_gemma4.py
- Revert modeling_gemma4.py (CI regenerates it from modular)
* fix: run make fix-repo to align modeling_gemma4.py with modular_gemma4.py