Add LightOnOCR model implementation (#41621)
* Add LightOnOCR model implementation
* fix modular docstring error
* Improve LightOnOCR documentation and exports
* Rename LightOnOCR multi-modal projector to vision projection and add tests
* Fix loading when lm_head is missing from the safetensors checkpoint
* temp
* Refactor LightOnOCR config to use sub_configs pattern
* rename processor kwargs
* Refactor LightOnOCR processor to use effective patch size
Calculate effective_patch_size during initialization and use it throughout
the processor. Update ProcessorKwargs defaults to include patch_size in
images_kwargs. Remove redundant model_input_names property.
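A minimal sketch of the "compute once, reuse everywhere" idea described above. The class name, the `spatial_merge_size` parameter, and the formula (patch size times merge factor) are assumptions made for illustration, not taken from this commit:

```python
class PatchGridProcessor:
    """Hypothetical processor that caches its effective patch size."""

    def __init__(self, patch_size=14, spatial_merge_size=2):
        # Assumption: the effective patch size is the raw patch size
        # multiplied by a spatial merge factor. Computed once here.
        self.effective_patch_size = patch_size * spatial_merge_size

    def grid_shape(self, height, width):
        # Reuse the cached value instead of recomputing it per call.
        size = self.effective_patch_size
        return height // size, width // size


processor = PatchGridProcessor(patch_size=14, spatial_merge_size=2)
rows, cols = processor.grid_shape(height=280, width=560)
```

Storing the value on the instance keeps every later computation consistent with the configuration the processor was built with.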
* Improve LightOnOCR generation support with proper KV cache handling
* add modeling tests and compile modular
* Clean up LightOnOCR code and remove unused variables
Remove unused image_features variable and model_input_names property
* Add LightOnOCR documentation and test improvements
Add model documentation page with config and class references. Update
toctree to include LightOnOCR entry. Clean up test formatting and add
vision/text models to private model exceptions.
* Refactor LightOnOCR to use standardized RopeParameters and consolidate shared components
* Rename LightOnOCR model classes and fix config parameter naming
- Rename LightOnOCRText -> LightOnOCRTextModel and LightOnOCRVision -> LightOnOCRVisionModel
- Fix parameter naming: image_token_index -> image_token_id
- Set tie_word_embeddings default to False
- Add special case for inherited Qwen3Config attributes in LightOnOCRTextConfig
* Add missing parameter documentation for LightOnOCR config
* Simplify LightOnOCR forward methods with decorators and fix loss function call
* Reorganize LightOnOCR components to place vision before text and remove debug print
* fixup
* Fix image token expansion logic in Processor
* Copy pixtral attention to have both pixtral and qwen eager attention forward
* remove LightOnOCRTextPreTrainedModel from modular to be able to return attention
* Support both tensor and list formats for image_sizes parameter
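One way to accept both formats is to normalize the input up front. The helper below is a hypothetical sketch, not the code from this commit; it assumes each entry is a (height, width) pair and that tensor-like inputs expose a `tolist()` method, as torch tensors and NumPy arrays do:

```python
def normalize_image_sizes(image_sizes):
    """Return image sizes as a list of (height, width) int tuples.

    Hypothetical helper: accepts either a nested list/tuple or any
    tensor-like object that provides ``tolist()``.
    """
    if hasattr(image_sizes, "tolist"):
        # Tensor-like input: convert to plain nested Python lists first.
        image_sizes = image_sizes.tolist()
    return [(int(height), int(width)) for height, width in image_sizes]


sizes = normalize_image_sizes([(224, 336), [448, 448]])
```

After normalization, downstream code only has to handle one shape of input.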
* Update tests/models/lightonocr/test_processor_lightonocr.py
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Update docs/source/en/model_doc/lightonocr.md
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* Move image_sizes tensor conversion from model to processor
* Simplify weight initialization to use uniform text_config initializer_range
* rename 1 letter vars
* Get image special tokens from tokenizer attributes in processor
* Return BaseModelOutputWithPast from LightOnOCRModel forward
* Add chat template to LightOnOCR processor test setup
* rm get_output_embeddings from LightOnOCRForConditionalGeneration (not needed)
* Add OCR integration test for LightOnOCR model
Tests model can perform OCR on real receipt image and extract expected text
* Fix device/dtype handling in LightOnOCR vision processing
* Add TransformersKwargs type hints to LightOnOCR forward methods
* Make torch imports conditional and use _from_config for LightOnOCR sub-models
* Set patch_size at runtime instead of modifying class defaults in LightOnOCR processor
* type kwargs
* Remove LightOnOCR forward comments
* Add vocab_size property and fix image_token_id in LightOnOCR
- Add vocab_size property to LightOnOCRConfig that delegates to text_config
- Fix test parameter name from image_token_index to image_token_id
- Add Unpack type hint to processor __call__ kwargs
- Remove unnecessary comments from modeling forward method
* Add vocab_size setter to LightOnOCR configuration
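The property and setter pair can be sketched as follows. Both classes are simplified, hypothetical stand-ins rather than the real configuration classes; only the delegation to `text_config` comes from the commit description:

```python
class TextConfig:
    """Hypothetical stand-in for the text sub-config."""

    def __init__(self, vocab_size=1000):
        self.vocab_size = vocab_size


class CompositeConfig:
    """Hypothetical composite config that owns a text sub-config."""

    def __init__(self, text_config=None):
        self.text_config = text_config if text_config is not None else TextConfig()

    @property
    def vocab_size(self):
        # Read through to the sub-config so there is a single source of truth.
        return self.text_config.vocab_size

    @vocab_size.setter
    def vocab_size(self, value):
        # Writes are forwarded as well, so the two can never disagree.
        self.text_config.vocab_size = value


config = CompositeConfig(TextConfig(vocab_size=32000))
config.vocab_size = 32064
```

Because nothing is stored on the composite config itself, resizing the vocabulary through either object is reflected in both.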
* Fix device mismatch in vision rotary embeddings and optimize test image sizes
* Improve LightOnOCR integration test with similarity-based output validation
* Enable flex attention
* LightOnOCR description with blog post
* redundant tie_word_embeddings
* remove architecture from default config
* vocab_size accessors
* remove useless tensor conversion
* remove useless conversion
* move dtype conversion to after image feature extraction
* remove useless stuff
* fixup
* export text and vision config classes
* refactor(lightonocr): remove unused weight initialization and fix tied weights mapping #0
- Remove custom _init_weights methods (handled by base class)
- Update _tied_weights_keys to dict format with explicit mapping
- Update documentation date
* fix(lightonocr): fix test failures for vocab_size access and device placement #0
- Use config.text_config.vocab_size instead of config.vocab_size for composite config
- Remove explicit device placement from attention_mask and image_sizes tensors
- Allow device_map='auto' to handle device placement in model parallelism tests
* ruff
* rebase 08/12/2025
* rebase 09/12/2025
* review zucchini
* rebase 10/12/2025
* refactor(lighton_ocr): fix naming conventions to use snake_case and proper CamelCase #0
- Rename model identifier from 'lightonocr' to 'lighton_ocr' (snake_case)
- Update class names from 'LightOnOCR*' to 'LightOnOcr*' (proper CamelCase)
- Update all auto mappings, tests, and documentation accordingly
* style(lighton_ocr): remove unnecessary import guards for torch and vision #0
* style(lighton_ocr): remove unnecessary pass statement from LightOnOcrVisionConfig #0
* refactor(lighton_ocr): consolidate RMSNorm classes and use PixtralRMSNorm base #0
* refactor(lighton_ocr): import rotary pos emb functions from pixtral instead of redefining #0
- Remove duplicate vision_rotate_half and vision_apply_rotary_pos_emb functions
- Import apply_rotary_pos_emb from pixtral modeling
- Consolidate rotate_half/apply_rotary_pos_emb in generated modeling file
* refactor(lighton_ocr): remove unused LightOnOcrVisionPreTrainedModel class #0
- Remove redundant VisionPreTrainedModel class that was not used
- Add LightOnOcrVisionAttentionLayer to _no_split_modules in main PreTrainedModel
* refactor(lighton_ocr): simplify LightOnOcrAttention and clarify docstring #0
- Remove redundant __init__ that only called super()
- Update docstring to explain why class exists (avoids eager_attention_forward collision with Qwen3)
* test(lighton_ocr): remove unnecessary skipped test methods #0
* refactor(lighton_ocr): remove use_sliding_window and max_window_layers from config #0
- Use del in __init__ to explicitly remove inherited attrs from Qwen3Config
- Remove LightOnOCRTextConfig from check_config_attributes.py exception list
- Fix rms_norm_eps type annotation from int to float
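The `del` approach mentioned above can be illustrated with plain Python classes. The base class here is a hypothetical stand-in for the inherited config, and its default values are placeholders chosen for the example:

```python
class BaseTextConfig:
    """Hypothetical parent config that defines attributes we do not want."""

    def __init__(self, hidden_size=64, use_sliding_window=False, max_window_layers=28):
        self.hidden_size = hidden_size
        self.use_sliding_window = use_sliding_window
        self.max_window_layers = max_window_layers


class TrimmedTextConfig(BaseTextConfig):
    """Hypothetical child config that drops the unused inherited attributes."""

    def __init__(self, hidden_size=64):
        super().__init__(hidden_size=hidden_size)
        # The parent set these on the instance; remove them explicitly so
        # they are not exposed or serialized as part of this config.
        del self.use_sliding_window
        del self.max_window_layers


config = TrimmedTextConfig()
```

Deleting the attributes in `__init__` makes the omission explicit in the class itself rather than relying on an external exception list.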
* fix make fixup
* docs(lighton_ocr): add docstring to LightOnOcrTextConfig and clean up check_repo #0
- Add configuration docstring with all parameters to LightOnOcrTextConfig
- Consolidate duplicate comments in PRIVATE_MODELS
- Remove redundant entries from IGNORE_NON_TESTED and IGNORE_NON_AUTO_CONFIGURED
* chore(lighton_ocr): update copyright headers to LightOn Team #0
* refactor(lighton_ocr): clean up model files and add license headers #0
- Add Apache 2.0 license headers to generated files
- Remove unused embedding getter/setter methods from ForConditionalGeneration
- Clean up LightOnOcrTextConfig docstring and remove Qwen references
* refactor(lighton_ocr): simplify processor token access and test setup #0
- Access special tokens directly from tokenizer attributes instead of getattr with defaults
- Simplify test setup to use model_id and inherited ProcessorTesterMixin methods
- Fix return types test to handle fast image processor limitations
* refactor(lighton_ocr): unify attention functions and fix buffer registration #0
- Remove duplicate vision_eager_attention_forward, reuse eager_attention_forward from Qwen3
- Add num_key_value_groups attribute for GQA compatibility
- Register original_inv_freq as buffer instead of plain attribute
* refactor(lighton_ocr): remove vision_model property alias #0
* docs(lighton_ocr): add usage example and update release date #0
* rebase 12/01/2026
* Update docs/source/en/model_doc/lighton_ocr.md
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* review cyril
* Remove test.py from version control
* apply modular
* update years everywhere it was not updated
* fix date
* remove Attention forward implementation
* Fix all Vision prefixes instead of no prefix
* move tying to main config
* fix
* add to all
* immensely simplify
* fix test
* revert check_repo
---------
Co-authored-by: Said Taghadouini <taghadouinisaid@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>