@lorenzofavaro - We are planning to use https://python.langchain.com/docs/modules/data_connection/indexing to do this type of updating. Would that satisfy your requirements?
Thanks for the contribution! See if the indexing code shared by @hwchase17 will work for your use case!
        texts=texts, embeddings=embeddings, metadatas=metadatas, ids=ids, **kwargs
    )

def update_documents(
@lorenzofavaro take a look at the indexing code that @hwchase17 referenced. It should be able to solve this use case.
I'm OK adding an upsert_documents functionality, but it would need to be added on the base class as well, and would require the user to provide ids as part of the interface, and would need to implement upsert semantics.
The code should not assume that two documents are identical just because their page_content matches, as there are use cases where the relevant content lives in the metadata (e.g., metadata about two different products that share the same basic description). In this case, we'd want both documents to be indexed.
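As a rough illustration of the upsert semantics described above, here is a minimal sketch using a plain dict in place of a vector store. The function name `upsert_documents` comes from the discussion, but the signature, the dict-based `store`, and the `fake_embed` stand-in are assumptions made for the example, not the PGVector or VectorStore API.

```python
def fake_embed(texts):
    """Stand-in for an embedding model (assumption): one tiny vector per text."""
    return [[float(len(t))] for t in texts]


def upsert_documents(store, docs, ids, embed):
    """Insert each document under its caller-supplied id, overwriting any existing row.

    `store` is a plain dict (id -> record) standing in for a collection.
    Identity is decided by `ids` only, never by page_content, so two documents
    with the same text but different metadata are both kept.
    Returns (inserted_ids, updated_ids).
    """
    if len(docs) != len(ids):
        raise ValueError("docs and ids must have the same length")

    vectors = embed([d["page_content"] for d in docs])
    inserted, updated = [], []
    for doc_id, doc, vector in zip(ids, docs, vectors):
        # Record whether this id was new before overwriting it.
        (updated if doc_id in store else inserted).append(doc_id)
        store[doc_id] = {"doc": doc, "embedding": vector}
    return inserted, updated
```

With this shape, two products that share a description but differ in metadata can be indexed side by side as long as the caller gives them distinct ids.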
Thanks @hwchase17, I've seen the indexing and it actually solves my problem.
I must say that I wouldn't mind working on the solution proposed by @eyurtsev. I implemented the upsert semantics in the last commit.
Actually, if upsert_documents is also included in the base class (VectorStore), it must also be implemented in all the other vector stores (apart from PGVector).
If it's okay, I can start by adding this functionality for PGVector, add the abstract method upsert_documents to the base class (in another commit), and work on the other vector stores in separate PRs.
@eyurtsev Could you, please, review it?
Hey @lorenzofavaro ! Looks like this draft hasn't been worked on or marked as ready to review in a while. Is this something you'd still like to work on, or can I close it?
now a separate package
Description
Enhancement to the PGVector functionality: the addition of an update function, update_documents(...).
Currently, updating the documents of a collection requires emptying the collection and filling it again. This can cause more calls to the embedding model than are actually needed: if a text chunk (and therefore its embedding) is already present in the current collection of the vector store, it is deleted and then inserted again, which means calling the embedding model to recompute the same embedding.
The new feature identifies differences between input and existing documents. It requests new embeddings only for different documents, inserting them into the DB, and deletes missing ones.
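The diffing step can be sketched as follows. This is not the PR's implementation: `sync_documents`, `doc_key`, and the dict-based `store` are hypothetical names used only to show the idea, and the fingerprint covers both content and metadata (an assumption that follows the review feedback rather than the original code).

```python
import hashlib
import json


def doc_key(page_content, metadata):
    """Stable fingerprint over content AND metadata (hypothetical helper)."""
    payload = json.dumps(
        {"content": page_content, "metadata": metadata}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def sync_documents(store, docs, embed):
    """Make `store` (dict: key -> record) match `docs`, embedding only new entries.

    `embed` is any callable mapping a list of texts to a list of vectors.
    Returns (added_keys, deleted_keys).
    """
    incoming = {
        doc_key(d["page_content"], d.get("metadata", {})): d for d in docs
    }
    to_add = [k for k in incoming if k not in store]
    to_delete = [k for k in store if k not in incoming]

    if to_add:
        # The embedding model is called once, and only for unseen documents.
        vectors = embed([incoming[k]["page_content"] for k in to_add])
        for key, vector in zip(to_add, vectors):
            store[key] = {"doc": incoming[key], "embedding": vector}

    # Documents no longer present in the input are dropped from the store.
    for key in to_delete:
        del store[key]
    return to_add, to_delete
```

Documents that are unchanged between two calls are left alone, so they cost no embedding calls on the second run.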
Issue
#9461 (Add Functionality to Update Embeddings in pgvector)
Using Sample