dataset-viewer
Store transformed values in duckdb index file
#2737
Merged
polinaeterna merged 62 commits into main from duckdb-index-transformed-columns
index string lengths
96ec6c5d
Merge branch 'main' into duckdb-index-transformed-columns
ff7e1e34
test
bb45169c
fix column name
6bfb620e
fix
a3638480
add image widths&lengths + refactor a bit
a02fe628
Merge branch 'main' into duckdb-index-transformed-columns
74f85253
Merge branch 'main' into duckdb-index-transformed-columns
16dd9068
compute for lists + do not compute for strings-classes
08d992b2
rename
2bcb1cf4
update test with the check for columns
73d6d2ad
add '__hf_index_id' column only if there is search index (=at least o…
7310bbed
Merge branch 'main' into duckdb-index-transformed-columns
8faae37f
add check for actual values in result (for string type)
315b2b74
add list and sequence types to tests
4483b6f8
Merge branch 'main' into duckdb-index-transformed-columns
23037625
fix lists + keep None values in audio and image
6172091a
fix processing None values in audio and image
430ee87d
add audio and image data to tests
57b6a6a8
style
28cc9029
treat cases when all values in column are None
7b4da3f7
__hf_len -> __hf_length
190cda08
remove unused code
24651c54
refactor a bit
664fed95
move function to check pa list type to libcommon
d6bf9952
update features with transformed columns + take features directly fro…
9507dde0
get arrow schema from polars, not duckdb
c3035e59
Merge branch 'main' into duckdb-index-transformed-columns
2e562f3a
fix test
3192f927
hopefully fix disk space duplication issue
41acba78
move sql command to variable
507ccf59
Merge branch 'main' into duckdb-index-transformed-columns
deba6abc
fix keys in sql command fstring
1ae87810
style
c9fbd9be
polinaeterna marked this pull request as ready for review 1 year ago
polinaeterna changed the title from "WIP: store transformed values in duckdb index file" to "Store transformed values in duckdb index file" 1 year ago
polinaeterna marked this pull request as draft 1 year ago
polinaeterna marked this pull request as ready for review 1 year ago
fix
9c5f8f17
compute str lengths for class-strings too
9ba33a0d
polinaeterna marked this pull request as draft 1 year ago
Merge branch 'main' into duckdb-index-transformed-columns
5e5345ef
o
99e55333
do not update features with transformed columns
9a905526
always create __hf_index_id column
9859e459
Merge branch 'main' into duckdb-index-transformed-columns
c3b0b5d2
polinaeterna marked this pull request as ready for review 1 year ago
polinaeterna requested a review from AndreaFrancis 1 year ago
polinaeterna requested a review from severo 1 year ago
polinaeterna requested a review from lhoestq 1 year ago
polinaeterna requested a review from albertvillanova 1 year ago
polinaeterna commented on 2024-05-03
remove commented lines
5dff86ad
AndreaFrancis approved these changes on 2024-05-03
remove duplicated features update
16c9490c
Merge branch 'main' into duckdb-index-transformed-columns
37fc3e2d
query specific columns in search (instead of data.*)
18abcefc
get unsupported columns once
e607e771
rename columns __hf_length -> .length, __hf_duration -> .duration and…
e40cffa3
Merge branch 'main' into duckdb-index-transformed-columns
b7b197f9
polinaeterna commented on 2024-05-06
polinaeterna requested a review from AndreaFrancis 1 year ago
AndreaFrancis approved these changes on 2024-05-09
lhoestq commented on 2024-05-10
lhoestq commented on 2024-05-10
severo approved these changes on 2024-05-13
Merge branch 'main' into duckdb-index-transformed-columns
e07b3afb
refactor: move statistics utils to libcommon
018b72d3
add polars to libcommon dependencies
e0812b98
Merge branch 'main' into duckdb-index-transformed-columns
75cb9892
update poetry lock
a67f8e55
Merge branch 'main' into duckdb-index-transformed-columns
e45aecdb
refactor tests - move to libcommon too
3e77a92b
revert refactor
2a834d28
fix poetry lock
3eb576bf
refactor: move stats utils back to worker but outside of job runners
ebd8e619
fix worker test and style
d0cdd92d
polinaeterna requested a review from severo 1 year ago
severo approved these changes on 2024-05-24
Update services/worker/src/worker/job_runners/split/duckdb_index.py
a83c082e
rename length -> width
97cb88ab
Merge branch 'main' into duckdb-index-transformed-columns
5203d9b4
polinaeterna merged d1c56d3d into main 1 year ago
polinaeterna deleted the duckdb-index-transformed-columns branch 1 year ago
Reviewers
severo
AndreaFrancis
lhoestq
albertvillanova
Assignees
No one assigned
Labels
None yet
Milestone
No milestone