transformers
3d276453 - Add LightOnOCR model implementation (#41621)

Commit · 8 days ago
Add LightOnOCR model implementation (#41621)

* Add LightOnOCR model implementation
* fix modular docstring error
* Improve LightOnOCR documentation and exports
* Rename LightOnOCR multi-modal projector to vision projection and add tests
* fix loading without lm_head in safetensors
* temp
* Refactor LightOnOCR config to use sub_configs pattern
* rename processor kwargs
* Refactor LightOnOCR processor to use effective patch size

  Calculate effective_patch_size during initialization and use it throughout the processor. Update ProcessorKwargs defaults to include patch_size in images_kwargs. Remove the redundant model_input_names property.

* Improve LightOnOCR generation support with proper KV cache handling
* add modeling tests and compile modular
* Clean up LightOnOCR code and remove unused variables

  Remove the unused image_features variable and the model_input_names property.

* Add LightOnOCR documentation and test improvements

  Add a model documentation page with config and class references. Update the toctree to include a LightOnOCR entry. Clean up test formatting and add the vision/text models to the private model exceptions.

* Refactor LightOnOCR to use standardized RopeParameters and consolidate shared components
* Rename LightOnOCR model classes and fix config parameter naming

  - Rename LightOnOCRText -> LightOnOCRTextModel and LightOnOCRVision -> LightOnOCRVisionModel
  - Fix parameter naming: image_token_index -> image_token_id
  - Set the tie_word_embeddings default to False
  - Add a special case for inherited Qwen3Config attributes in LightOnOCRTextConfig

* Add missing parameter documentation for LightOnOCR config
* Simplify LightOnOCR forward methods with decorators and fix the loss function call
* Reorganize LightOnOCR components to place vision before text and remove a debug print
* fixup
* Fix image token expansion logic in the processor
* Copy the Pixtral attention so both the Pixtral and Qwen eager attention forwards are available
* Remove LightOnOCRTextPreTrainedModel from modular to be able to return attentions
* Support both tensor and list formats for the image_sizes parameter
* Update tests/models/lightonocr/test_processor_lightonocr.py

  Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Update docs/source/en/model_doc/lightonocr.md

  Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Move image_sizes tensor conversion from the model to the processor
* Simplify weight initialization to use a uniform text_config initializer_range
* rename one-letter vars
* Get image special tokens from tokenizer attributes in the processor
* Return BaseModelOutputWithPast from LightOnOCRModel forward
* Add a chat template to the LightOnOCR processor test setup
* Remove get_output_embeddings from LightOnOCRForConditionalGeneration (not needed)
* Add an OCR integration test for the LightOnOCR model

  Tests that the model can perform OCR on a real receipt image and extract the expected text.

* Fix device/dtype handling in LightOnOCR vision processing
* Add TransformersKwargs type hints to LightOnOCR forward methods
* Make torch imports conditional and use _from_config for LightOnOCR sub-models
* Set patch_size at runtime instead of modifying class defaults in the LightOnOCR processor
* type kwargs
* Remove LightOnOCR forward comments
* Add vocab_size property and fix image_token_id in LightOnOCR

  - Add a vocab_size property to LightOnOCRConfig that delegates to text_config
  - Fix the test parameter name from image_token_index to image_token_id
  - Add an Unpack type hint to the processor __call__ kwargs
  - Remove unnecessary comments from the modeling forward method

* Add vocab_size setter to LightOnOCR configuration
* Fix device mismatch in vision rotary embeddings and optimize test image sizes
* Improve the LightOnOCR integration test with similarity-based output validation
* Enable flex attention
* Enable flex attention
* LightOnOCR description with blog post
* Remove redundant tie_word_embeddings
* Remove architecture from the default config
* vocab_size accessors
* Remove a useless tensor conversion
* Remove a useless conversion
* Move dtype conversion to after image feature extraction
* Remove useless stuff
* fixup
* Export the text and vision config classes
* refactor(lightonocr): remove unused weight initialization and fix tied weights mapping #0

  - Remove custom _init_weights methods (handled by the base class)
  - Update _tied_weights_keys to dict format with an explicit mapping
  - Update the documentation date

* fix(lightonocr): fix test failures for vocab_size access and device placement #0

  - Use config.text_config.vocab_size instead of config.vocab_size for the composite config
  - Remove explicit device placement from the attention_mask and image_sizes tensors
  - Allow device_map='auto' to handle device placement in model parallelism tests

* ruff
* rebase 8/12/2025
* rebase 09/12/2025
* review zucchini
* review zucchini
* rebase 10/12/2025
* refactor(lighton_ocr): fix naming conventions to use snake_case and proper CamelCase #0

  - Rename the model identifier from 'lightonocr' to 'lighton_ocr' (snake_case)
  - Update class names from 'LightOnOCR*' to 'LightOnOcr*' (proper CamelCase)
  - Update all auto mappings, tests, and documentation accordingly

* style(lighton_ocr): remove unnecessary import guards for torch and vision #0
* style(lighton_ocr): remove unnecessary pass statement from LightOnOcrVisionConfig #0
* refactor(lighton_ocr): consolidate RMSNorm classes and use PixtralRMSNorm base #0
* refactor(lighton_ocr): import rotary pos emb functions from pixtral instead of redefining them #0

  - Remove the duplicate vision_rotate_half and vision_apply_rotary_pos_emb functions
  - Import apply_rotary_pos_emb from the Pixtral modeling file
  - Consolidate rotate_half/apply_rotary_pos_emb in the generated modeling file

* refactor(lighton_ocr): remove unused LightOnOcrVisionPreTrainedModel class #0

  - Remove the redundant VisionPreTrainedModel class that was not used
  - Add LightOnOcrVisionAttentionLayer to _no_split_modules in the main PreTrainedModel

* refactor(lighton_ocr): simplify LightOnOcrAttention and clarify its docstring #0

  - Remove the redundant __init__ that only called super()
  - Update the docstring to explain why the class exists (it avoids an eager_attention_forward collision with Qwen3)

* test(lighton_ocr): remove unnecessary skipped test methods #0
* refactor(lighton_ocr): remove use_sliding_window and max_window_layers from config #0

  - Use del in __init__ to explicitly remove inherited attributes from Qwen3Config
  - Remove LightOnOCRTextConfig from the check_config_attributes.py exception list
  - Fix the rms_norm_eps type annotation from int to float

* fix make fixup
* docs(lighton_ocr): add docstring to LightOnOcrTextConfig and clean up check_repo #0

  - Add a configuration docstring with all parameters to LightOnOcrTextConfig
  - Consolidate duplicate comments in PRIVATE_MODELS
  - Remove redundant entries from IGNORE_NON_TESTED and IGNORE_NON_AUTO_CONFIGURED

* chore(lighton_ocr): update copyright headers to LightOn Team #0
* refactor(lighton_ocr): clean up model files and add license headers #0

  - Add Apache 2.0 license headers to the generated files
  - Remove unused embedding getter/setter methods from ForConditionalGeneration
  - Clean up the LightOnOcrTextConfig docstring and remove Qwen references

* refactor(lighton_ocr): simplify processor token access and test setup #0

  - Access special tokens directly from tokenizer attributes instead of getattr with defaults
  - Simplify the test setup to use model_id and the inherited ProcessorTesterMixin methods
  - Fix the return-types test to handle fast image processor limitations

* refactor(lighton_ocr): unify attention functions and fix buffer registration #0

  - Remove the duplicate vision_eager_attention_forward and reuse eager_attention_forward from Qwen3
  - Add a num_key_value_groups attribute for GQA compatibility
  - Register original_inv_freq as a buffer instead of a plain attribute

* refactor(lighton_ocr): remove the vision_model property alias #0
* docs(lighton_ocr): add usage example and update release date #0
* rebase 12/01/26
* Update docs/source/en/model_doc/lighton_ocr.md

  Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* review cyril
* review cyril
* review cyril
* Remove test.py from version control
* apply modular
* Update years everywhere they were not updated
* fix date
* Remove the Attention forward implementation
* Use the Vision prefix everywhere instead of no prefix
* Move tying to the main config
* fix
* add to all
* immensely simplify
* fix test
* revert check_repo

---------

Co-authored-by: Said Taghadouini <taghadouinisaid@gmail.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
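Several commits above revolve around the composite config: the sub_configs pattern, a vocab_size property that delegates to text_config, and a matching setter. A minimal sketch of how that delegation behaves, assuming LightOnOcrConfig is exported under that name as the renaming commits suggest:

```python
from transformers import LightOnOcrConfig  # export name inferred from the renaming commits

config = LightOnOcrConfig()

# The composite config exposes vocab_size as a property that delegates to the
# text sub-config, so both accessors read the same underlying value.
assert config.vocab_size == config.text_config.vocab_size

# The vocab_size setter forwards writes to text_config as well.
config.vocab_size = 32000
assert config.text_config.vocab_size == 32000
```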
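The processor refactor computes an effective_patch_size once at initialization and reuses it wherever a patch grid is needed, for example when expanding image placeholder tokens. A rough sketch of the idea, assuming (as in similar Pixtral-style processors) that the effective patch size is the model patch size scaled by a spatial merge factor; the names and defaults here are illustrative, not the shipped values:

```python
import math

def num_image_tokens(height: int, width: int, patch_size: int = 16, spatial_merge_size: int = 2) -> int:
    # effective_patch_size is computed once, at processor init, and reused
    # everywhere a patch grid is needed.
    effective_patch_size = patch_size * spatial_merge_size
    n_rows = math.ceil(height / effective_patch_size)
    n_cols = math.ceil(width / effective_patch_size)
    return n_rows * n_cols

# One 1024x768 image -> a 32x24 grid, i.e. 768 placeholder tokens, under these defaults.
print(num_image_tokens(1024, 768))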
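The tied-weights commits switch _tied_weights_keys from the older list format to a dict with an explicit mapping. Purely as illustration of what that format looks like; the key paths below are hypothetical, not the values shipped for LightOnOCR:

```python
# Dict-format _tied_weights_keys: each tied parameter maps to the parameter it
# shares storage with. Both paths here are hypothetical examples.
_tied_weights_keys = {
    "lm_head.weight": "model.language_model.embed_tokens.weight",
}
```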
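One commit adds a docs usage example and an OCR integration test on a receipt image. A minimal end-to-end sketch along those lines; the checkpoint id and instruction text are hypothetical, and the class name follows the commit messages rather than a verified export:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LightOnOcrForConditionalGeneration

model_id = "lightonai/LightOnOCR"  # hypothetical checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id, dtype=torch.bfloat16, device_map="auto"
)

# Build a chat-style prompt containing the image, using the chat template the
# commits add to the processor; the instruction text is illustrative.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Transcribe the text in this image."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("receipt.png")  # any local document image
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```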