langchain
1250cd46 - fix: use model token limit not tokenizer ditto (#5939)

fix: use model token limit not tokenizer ditto (#5939)

This fixes a token-limit bug in `SentenceTransformersTokenTextSplitter`. Previously, the token limit was taken from the tokenizer used by the model. However, for some models the token limit reported by the tokenizer (from `AutoTokenizer.from_pretrained`) does not equal the token limit of the model itself, so that assumption does not hold. The text splitter therefore now takes its token limit from the sentence-transformers model.

Twitter: @plasmajens

#### Who can review?
@hwchase17 and/or @dev2049

---------
Co-authored-by: Harrison Chase <hw.chase.17@gmail.com>
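The discrepancy the commit describes can be observed directly: the Hugging Face tokenizer's `model_max_length` and the sentence-transformers model's `max_seq_length` need not agree. A minimal sketch, assuming `transformers` and `sentence-transformers` are installed and using `all-mpnet-base-v2` purely as an illustrative model (the specific values are model-dependent):

```python
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer

# Illustrative model choice; not tied to any particular default in the splitter.
model_name = "sentence-transformers/all-mpnet-base-v2"

# Limit advertised by the tokenizer -- the value the splitter relied on before the fix.
tokenizer = AutoTokenizer.from_pretrained(model_name)
print("tokenizer model_max_length:", tokenizer.model_max_length)

# Limit the sentence-transformers model actually encodes with -- the value used after
# the fix. For some models these two numbers differ.
model = SentenceTransformer(model_name)
print("model max_seq_length:", model.max_seq_length)
```

With the fix, the splitter's maximum chunk size is derived from the model's own limit rather than the tokenizer's, so the chunks it produces stay within what the encoder can actually process.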