dataset-viewer
b988e218 - feat: Index the (text) datasets contents to enable full-text search - DuckDB (#1296)

Commit
2 years ago
feat: Index the (text) datasets contents to enable full-text search - DuckDB (#1296)

* Draft files
* Adding duckdb index job runner
* Fix style
* WIP adding fts on API
* Remove non used code
* Fix style
* Adding chart objects
* Rollback dependency in API
* Depend on parquet and split
* Fix libcommon test
* Send index file to dedicated branch
* Fix test in first parquet
* Fix merge changes
* Fix poetry files
* Adding happy path test
* Adding other test scenarios
* Adding chart configuration
* Apply suggestions from code review
* Change ParquetFileItem to SplitHubFile
* Inherit from SplitCachedJobRunner
* Fix style
* Depends on info features instead of parquet schema
* Fix libcommon test
* Apply code review suggestions
* Some details
* Fix style
* Fix test
* Apply code review suggestions
* Update chart/values.yaml
* Apply suggestions from code review
* Apply code review suggestions
* [docs] Improvements (#1376)
* add end-to-end example
* apply feedback
* Fix closing brackets and GH action link (#1389)
* Fix typo in error message (#1391)
* Add docker internal to extra_hosts (#1390)
* fix: 🐛 support bigger images (#1387)
* fix: 🐛 support bigger images fixes https://github.com/huggingface/datasets-server/issues/1361
* style: 💄 fix style
* style: 💄 add types for Pillow
* Rename dev to staging, and use staging mongodb cluster (#1383)
* chore: 🤖 remove makefile targets since we use ArgoCD now
* feat: 🎸 align dev on prod, and use secret for mongo url
* feat: 🎸 rename dev to staging
* ci: 🎡 change dev to staging in ci
* feat: 🎸 10x the size of supported images (#1392)
* Fix exception
* Fix test in libcommon
* Apply some code review suggestions
* Apply code review suggestions
* Adding close connection
* Upgrade duckdb version
* Apply code review suggestions
* Fix style
* Adding some test cases
* Remove duplicate code by merge
* Fix imports
* Apply code review suggestions
* Apply suggestions from code review
* Add test

---------

Co-authored-by: Sylvain Lesage <sylvain.lesage@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Bas Krahmer <baskrahmer@gmail.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>