unstructured
afbda958 - feat: custom fallback for language detection (#4238)

Commit
21 days ago
Closes #4091

Implements custom fallback for language detection so short text is not forced to English and callers can control or disable detection.

## Changes:

- `language_fallback`: Optional callable used when text is short (<5 words) and ASCII. It receives the text and can return a list of ISO 639-3 codes, or `None` to leave the language unspecified. If not provided, short text still defaults to `["eng"]` (backward compatible).
- `detect_languages()` / `apply_lang_metadata()`: New parameter `language_fallback`; applied in the short-text path only.
- `partition()` (auto): New parameter `language_fallback`; passed through to all partitioners via the metadata decorator.
- `partition_md()`: New parameter `languages`, so callers can pass `languages=[""]` to disable language detection (aligned with other partitioners).

## Usage:

- Return `None` for short text: `partition(..., language_fallback=lambda text: None)`
- Custom short-text language: `partition(..., language_fallback=my_detector)`
- Disable detection: `partition_md(..., languages=[""])` or `partition(..., languages=[""])`
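
The fallback contract described in this commit can be illustrated with a small standalone sketch. It does not call into the library: `is_short_ascii` and `apply_fallback` are hypothetical stand-ins that mimic the short-text path as the message describes it (fewer than 5 words, ASCII only), and `my_detector` is an invented example callable. The real `detect_languages()` implementation may differ in detail.

```python
from typing import Callable, List, Optional

# Shape of the callable described in the commit message: it receives the text
# and returns a list of ISO 639-3 codes, or None to leave language unspecified.
LanguageFallback = Callable[[str], Optional[List[str]]]


def is_short_ascii(text: str) -> bool:
    """Hypothetical helper: the short-text condition (<5 words and ASCII)."""
    return len(text.split()) < 5 and text.isascii()


def apply_fallback(
    text: str, language_fallback: Optional[LanguageFallback] = None
) -> Optional[List[str]]:
    """Hypothetical stand-in for the short-text path only.

    Assumes the caller has already checked is_short_ascii(text).
    """
    if language_fallback is None:
        # Backward-compatible default named in the commit message.
        return ["eng"]
    return language_fallback(text)


def my_detector(text: str) -> Optional[List[str]]:
    """Invented example fallback: tag a few German greetings, else abstain."""
    german_words = {"guten", "tag", "danke", "hallo"}
    if any(word.lower() in german_words for word in text.split()):
        return ["deu"]
    return None
```

With these stand-ins, `apply_fallback("Hello there")` keeps the `["eng"]` default, `apply_fallback("Hello there", lambda text: None)` leaves the language unspecified, and `apply_fallback("Guten Tag", my_detector)` yields `["deu"]`.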