transformers
4ee0b755 - LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. (#14514)

Commit

4 years ago

LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. (#14514)

* Added the lang argument to apply_tesseract in feature_extraction_layoutlmv2.py, which is used in pytesseract.image_to_data.
* Added ocr_lang argument to LayoutLMv2FeatureExtractor.__init__, which is used when calling apply_tesseract
* Updated the documentation of the LayoutLMv2FeatureExtractor
* Specified in the documentation of the LayoutLMv2FeatureExtractor that the ocr_lang argument should be a language code.
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py
* Split comment into two lines to adhere to the max line size limit.
* Update src/transformers/models/layoutlmv2/feature_extraction_layoutlmv2.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
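The change described above amounts to threading a language code through to pytesseract.image_to_data. The following is a minimal sketch of that idea, not the code from the commit: the function name apply_tesseract_sketch and the ocr_fn parameter are hypothetical and exist only so the example can run without Tesseract installed. It assumes the image object exposes a PIL-style .size of (width, height) and that boxes are scaled to a 0-1000 range.

```python
from typing import Callable, List, Optional, Tuple


def normalize_box(box: List[int], width: int, height: int) -> List[int]:
    """Scale a pixel box [x0, y0, x1, y1] to a 0-1000 coordinate range."""
    return [
        int(1000 * box[0] / width),
        int(1000 * box[1] / height),
        int(1000 * box[2] / width),
        int(1000 * box[3] / height),
    ]


def apply_tesseract_sketch(
    image,
    lang: Optional[str] = None,
    ocr_fn: Optional[Callable] = None,  # hypothetical hook, not part of the library
) -> Tuple[List[str], List[List[int]]]:
    """Run OCR on `image` and return (words, normalized boxes).

    `lang` is a Tesseract language code such as "eng" or "fra"; None lets
    Tesseract fall back to its default language.
    """
    if ocr_fn is None:
        # Imported lazily so the sketch can be exercised with a stub.
        import pytesseract

        ocr_fn = pytesseract.image_to_data

    # The language code is forwarded unchanged to the OCR call.
    data = ocr_fn(image, lang=lang, output_type="dict")

    image_width, image_height = image.size
    words: List[str] = []
    boxes: List[List[int]] = []
    for word, x, y, w, h in zip(
        data["text"], data["left"], data["top"], data["width"], data["height"]
    ):
        if not word.strip():
            continue  # skip empty OCR tokens
        words.append(word)
        # Convert (left, top, width, height) to corners, then normalize.
        boxes.append(normalize_box([x, y, x + w, y + h], image_width, image_height))
    return words, boxes
```

With Tesseract and the relevant language data installed, calling apply_tesseract_sketch(image, lang="fra") on a PIL image would request French OCR. Per the commit message, the library exposes the same choice through the ocr_lang argument of LayoutLMv2FeatureExtractor.__init__.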

References

#14514 - LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR.

Author

Xargonus

Parents

ebbe8cc3
