langchain
95ee69a3 - langchain[patch]: In HTMLHeaderTextSplitter set default encoding to utf-8 (#16372)

Commit
2 years ago
- **Description:** The HTMLHeaderTextSplitter class now explicitly specifies utf-8 encoding in the part of the split_text_from_file method that calls the HTMLParser.
- **Issue:** Prevents garbled characters caused by differences in the encoding of HTML files. The problem affects non-English text; I noticed it with Japanese in particular.
- **Dependencies:** None.
- **Twitter handle:** @i_w__a
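
The garbling described above can be reproduced with the standard library alone. The sketch below is not the code from this commit: `decode_html` is a hypothetical helper, the sample header text is invented for illustration, and `latin-1` stands in for whatever non-UTF-8 default a parser might fall back to.

```python
# Japanese text inside an HTML header, stored as UTF-8 bytes
# (sample content invented for this illustration).
html_bytes = "<h1>こんにちは</h1>".encode("utf-8")


def decode_html(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode raw HTML bytes with an explicit encoding."""
    return raw.decode(encoding)


# Decoding with the wrong encoding does not raise an error here;
# it silently produces mojibake instead of the original characters.
garbled = decode_html(html_bytes, "latin-1")

# Stating utf-8 explicitly recovers the original text.
correct = decode_html(html_bytes, "utf-8")

print(repr(garbled))
print(correct)
```

ASCII-only content decodes identically under both encodings, which would explain why the problem goes unnoticed with English documents.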