iceberg
Core: Track duplicate DVs for data file and merge them before committing
#15006
Open

github-actions added the core label
github-actions added the spark label
amogh-jahagirdar commented on 2026-01-09
amogh-jahagirdar commented on 2026-01-09
singhpk234 commented on 2026-01-10
amogh-jahagirdar commented on 2026-01-10
geruh commented on 2026-01-11
amogh-jahagirdar Core: Merge DVs referencing the same data files as a safeguard
82cced93
amogh-jahagirdar Fix dangling delete tests
e41943d2
amogh-jahagirdar Simplification in OutputFileFactory
76e24e40
amogh-jahagirdar minor optimization
a740ff91
amogh-jahagirdar cleanup, make outputfilefactory take in more fields so that we don't …
11ffc2f4
amogh-jahagirdar change the duplicate tracking algorithm, fix spark tests
772e3c20
amogh-jahagirdar Add more tests for multiple DVs and w equality deletes
3404a860
amogh-jahagirdar Rebase and fix spark 4.1 tests
c04d0e0e
amogh-jahagirdar force pushed from 374b5675 to c04d0e0e 4 days ago
amogh-jahagirdar commented on 2026-01-11
amogh-jahagirdar more cleanup, put dvfilewriter in try w resources
a39b0737
geruh commented on 2026-01-11
amogh-jahagirdar Add logging, some more cleanup
a079d223
amogh-jahagirdar more cleanup
d7eadb00
amogh-jahagirdar commented on 2026-01-12
amogh-jahagirdar commented on 2026-01-12
amogh-jahagirdar commented on 2026-01-12
amogh-jahagirdar commented on 2026-01-12
amogh-jahagirdar marked this pull request as ready for review 3 days ago
amogh-jahagirdar changed the title from "Core: Merge DVs referencing the same data files as a safeguard" to "Core: Track duplicate DVs for data file and merge them before committing" 3 days ago
amogh-jahagirdar requested a review from rdblue 3 days ago
amogh-jahagirdar requested a review from nastra 3 days ago
amogh-jahagirdar requested a review from aokolnychyi 3 days ago
amogh-jahagirdar requested a review from RussellSpitzer 3 days ago
RussellSpitzer commented on 2026-01-13
RussellSpitzer commented on 2026-01-13
RussellSpitzer commented on 2026-01-13
RussellSpitzer commented on 2026-01-13
RussellSpitzer commented on 2026-01-13
RussellSpitzer commented on 2026-01-13
rdblue commented on 2026-01-13
rdblue commented on 2026-01-13
rdblue commented on 2026-01-13
rdblue commented on 2026-01-13
amogh-jahagirdar Make dv refs a multimap, group by partition to write single puffin fo…
0a053a6c
amogh-jahagirdar force pushed from 0f9002c7 to 0a053a6c 14 hours ago
amogh-jahagirdar commented on 2026-01-15
RussellSpitzer commented on 2026-01-15
RussellSpitzer commented on 2026-01-15
RussellSpitzer commented on 2026-01-15
RussellSpitzer commented on 2026-01-15
RussellSpitzer commented on 2026-01-15
amogh-jahagirdar Filter files with duplicates before sifting through them and merging
6b04dd98
amogh-jahagirdar update old comment
a50fb321
