LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. (#14514)
* Added the lang argument to apply_tesseract in feature_extraction_layoutlmv2.py, which is used in pytesseract.image_to_data.
* Added ocr_lang argument to LayoutLMv2FeatureExtractor.__init__, which is used when calling apply_tesseract
* Updated the documentation of the LayoutLMv2FeatureExtractor
* Specified in the documentation of the LayoutLMv2FeatureExtractor that the ocr_lang argument should be a language code.
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Split comment into two lines to adhere to the max line size limit.
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>