langchain
dc9b080b - Chroma: Catch and handle `NotEnoughElementsException` (#3368)

2 years ago
## Problem and Solution

This PR solves #1793, which is more of a convenience for users than anything else. When using Chroma as a vectorstore, if you run a similarity search with a `k` value larger than the number of documents stored in the vectorstore, Chroma raises a `chromadb.errors.NotEnoughElementsException`.

The fix adds a new parameter, `find_highest_possible_k`, to every similarity search method of the `Chroma` class. It is an optional boolean that defaults to `True`, which changes the default behavior. If the parameter is set to `False`, the methods behave exactly as they did before this PR. If it is `True`, the method first runs the similarity search with the given `k`. If `chromadb.errors.NotEnoughElementsException` is raised, it then lowers `k` one step at a time (down to `k=1`) until the error is no longer raised.

The following is an example of how this is implemented in the `Chroma.similarity_search` method.

https://github.com/preritdas/langchain/blob/e0846c2bcaafa4f54a193a6a7dfa8ed46480c326/langchain/vectorstores/chroma.py#L127-L159

We add the `find_highest_possible_k` parameter as `Optional`, defaulting to `True`, and explain it briefly in the docstring. We wrap the previous similarity search logic in a private local function that takes `k`. If `find_highest_possible_k` is `False`, we return the result of calling that function with the given `k`, retaining the previous behavior. If it is `True` (the default), we iteratively lower `k` (down to 1) until `k` documents can be retrieved from the Chroma vectorstore.

## Example

You create a `Chroma` object from one document. You then run `.similarity_search()`, `.similarity_search_by_vector()`, or `.similarity_search_with_score()`. If you pass only a query (or, for `.similarity_search_by_vector()`, only an embedding), `k` takes its default value of `4`. Previously, all three methods would raise a `chromadb.errors.NotEnoughElementsException`.
Now, however, all three methods return one document, the one stored in the vectorstore (unless you are filtering, setting a maximum distance, etc.).

## Note

I didn't find any place in the documentation to mention this change other than the example Jupyter notebook for the Chroma vectorstore, and that notebook never runs a similarity search with explicit parameters. If it's important to document how to change the `find_highest_possible_k` parameter, I'll happily add it wherever appropriate.
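The retry logic described under Problem and Solution can be sketched in plain Python. This is a minimal, self-contained sketch, not the code from the PR. `NotEnoughElementsException` here is a local stand-in for `chromadb.errors.NotEnoughElementsException`. `search_with_highest_possible_k` and `make_fake_search` are hypothetical helpers standing in for the private local function and for a Chroma vectorstore. The usage at the bottom mirrors the Example section: one stored document, a requested `k` of 4, and a single document returned.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


class NotEnoughElementsException(Exception):
    """Hypothetical stand-in for chromadb.errors.NotEnoughElementsException."""


def search_with_highest_possible_k(
    search: Callable[[int], List[T]],
    k: int = 4,
    find_highest_possible_k: bool = True,
) -> List[T]:
    """Run search(k); on NotEnoughElementsException, retry with k-1, down to k=1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not find_highest_possible_k:
        # Previous behavior: the exception propagates to the caller.
        return search(k)
    candidate_k = k
    while True:
        try:
            return search(candidate_k)
        except NotEnoughElementsException:
            if candidate_k == 1:
                # Even k=1 failed (e.g. an empty store), so re-raise.
                raise
            candidate_k -= 1


def make_fake_search(documents: List[str]) -> Callable[[int], List[str]]:
    """Hypothetical store: raises when k exceeds the number of stored documents."""

    def search(k: int) -> List[str]:
        if k > len(documents):
            raise NotEnoughElementsException(
                f"requested {k} results but only {len(documents)} stored"
            )
        return documents[:k]

    return search


if __name__ == "__main__":
    search = make_fake_search(["the only document"])
    # Default k=4 is lowered to 1, so the single stored document comes back.
    print(search_with_highest_possible_k(search, k=4))
    try:
        search_with_highest_possible_k(search, k=4, find_highest_possible_k=False)
    except NotEnoughElementsException as exc:
        print("raised:", exc)
```

Each failed attempt is a full query, so lowering `k` one step at a time can take up to `k` calls in the worst case. One alternative, not what this PR does, would be to clamp `k` to the number of stored documents before querying, at the cost of a separate count lookup.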