unstructured
enhancement: clean pdf elements (bump unstructured-inference)
#790
Merged

cragwolfe merged 60 commits into main from enhancement/clean-pdf-elements
cragwolfe changed the title from "enhancement: clean pdf elements" to "enhancement: clean pdf elements (bump unstructured-inference)" 2 years ago
cragwolfe commented on 2023-06-25
cragwolfe commented on 2023-06-25
rbiseck3 force pushed from 2a6282ef to bd57ccce 2 years ago
qued clean pdf elements
0f004df1
qued Formatting for regex pattern
669356f3
qued Add type guard
8c83c574
qued Bump versions
d4db5628
qued Update changelog
d312ed88
Unstructured-DevOps Update ingest test fixtures (#795)
dfb1baa5
qued Try again to fix ingest fixtures
05c2b166
Unstructured-DevOps Update ingest test fixtures (#798)
8fb1994e
qued Sort elements regardless of strategy.
0eca6694
qued revert change sorting outside of pages
b30dae43
Unstructured-DevOps Update ingest test fixtures (#807)
5e02bd29
rbiseck3 Set sorting key for elements without any coordinates
5aa2c762
rbiseck3 Make sure content of _partition_pdf_or_image_local also comes out sorted
579ce947
rbiseck3 force pushed from 1128affb to 579ce947 2 years ago
rbiseck3 don't sort elements again here for hi_res pdfs
06eee790
ingest updates using python 3.8.17
772ea559
rbiseck3 commented on 2023-06-28
Merge branch '3.8changes' into enhancement/clean-pdf-elements
1e5ef1cd
qued Merge branch 'main' into enhancement/clean-pdf-elements
79b6f7b5
qued Sort before adding page break and don't force page break inclusion.
bcf78b4f
qued Merge branch 'main' into enhancement/clean-pdf-elements
4e7b98f3
qued Address PR feedback
ab85d200
qued Fix re.sub statement
b6253b0f
cragwolfe to resolve later
e339abbd
cragwolfe fix changelog
0addda53
cragwolfe remove old ingest test scripts
e5dc778f
cragwolfe it is always the PageBreaks.
93aad0b1
cragwolfe stray newline
759a40b9
cragwolfe hintgd! type hint
e8fd8032
Unstructured-DevOps Update ingest test fixtures (#846)
84218acf
cragwolfe commented on 2023-06-29
cragwolfe exclude Image elements since they are usually junk
ca4276af
cragwolfe lint
0780ec89
cragwolfe image tweak
d76a7512
cragwolfe reversed logic
26091469
cragwolfe tidy
ae963304
cragwolfe refactor logic again
2520a86f
cragwolfe tidy, type stuff
5c4a8dce
cragwolfe revert ingest-test script change
7909433a
cragwolfe none pagebreak is ok
45f2e169
Unstructured-DevOps Update ingest test fixtures (#851)
05676422
cragwolfe diffs
0cbef9e8
cragwolfe Merge branch 'main' into enhancement/clean-pdf-elements
29adfa8e
cragwolfe marked this pull request as ready for review 2 years ago
cragwolfe Merge branch 'main' into enhancement/clean-pdf-elements
cb240f3a
cragwolfe Merge branch 'main' into enhancement/clean-pdf-elements
ee03bfd0
cragwolfe version bump
c2e49cb6
cragwolfe Merge branch 'main' into enhancement/clean-pdf-elements
19ebffd8
cragwolfe Revert "diffs"
a8a8ac8f
rbiseck3 approved these changes on 2023-06-29
cragwolfe enabled auto-merge (squash) 2 years ago
cragwolfe comment out s3 for now, can resolve ordering in future bump
c9d0b24d
cragwolfe bump version
9e8e7bdc
cragwolfe Merge branch 'main' into enhancement/clean-pdf-elements
737e8a7b
cragwolfe remove fast s3 outputs for now
f5659710
cragwolfe Revert "remove fast s3 outputs for now"
d1d68c20
cragwolfe Revert "comment out s3 for now, can resolve ordering in future bump"
ad893c8f
cragwolfe bump to unstructured==0.5.4
319743a9
cragwolfe resolve conflict
4c7acffb
cragwolfe fix version
5dfa1eda
qued bump inference to 0.5.4 (#863)
8a22bf2c
disabled auto-merge 2 years ago
Manually disabled by user
qued Correct changelog
ba45c5d9
qued Update to release version
96e58d52
cragwolfe enabled auto-merge (squash) 2 years ago
disabled auto-merge 2 years ago
Manually disabled by user
qued Merge branch 'main' into enhancement/clean-pdf-elements
b7ac80d9
qued Remove accidentally added files
e670f95c
qued one more
8812bcfa
cragwolfe merged 350bb1da into main 2 years ago
cragwolfe deleted the enhancement/clean-pdf-elements branch 2 years ago
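Several commit messages in this PR describe the element-cleanup steps: setting a sorting key for elements without coordinates, excluding Image elements because they are usually junk, and sorting before adding page breaks. The following is a minimal sketch of that ordering idea, not the PR's actual code. The `Element` dataclass, its `coords` field, and the helpers `sort_key` and `clean_and_order` are hypothetical names introduced for illustration and are not part of the unstructured API. It also assumes a naive reading order (page, then top-to-bottom, then left-to-right) and that coordinate-less elements sort last on their page.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned element (not unstructured's class)."""
    text: str
    category: str  # e.g. "Title", "NarrativeText", "Image", "PageBreak"
    page: int
    coords: Optional[Tuple[float, float]] = None  # assumed (x, y) of top-left corner


def sort_key(el: Element) -> Tuple[int, float, float]:
    """Order by page, then top-to-bottom, then left-to-right."""
    if el.coords is None:
        # Assumption: elements without coordinates sort last on their page.
        return (el.page, float("inf"), float("inf"))
    x, y = el.coords
    return (el.page, y, x)


def clean_and_order(elements: List[Element]) -> List[Element]:
    # Drop Image elements, treating them as junk.
    kept = [el for el in elements if el.category != "Image"]
    # Sort first (list.sort is stable, so ties keep their original order) ...
    kept.sort(key=sort_key)
    # ... then insert a PageBreak wherever the page number changes.
    result: List[Element] = []
    for el in kept:
        if result and el.page != result[-1].page:
            result.append(Element(text="", category="PageBreak", page=result[-1].page))
        result.append(el)
    return result


elements = [
    Element("b", "NarrativeText", 1, (10.0, 50.0)),
    Element("pic", "Image", 1, (0.0, 0.0)),
    Element("a", "Title", 1, (10.0, 10.0)),
    Element("c", "NarrativeText", 2),  # no coordinates
    Element("d", "NarrativeText", 2, (5.0, 5.0)),
]
print([e.text or e.category for e in clean_and_order(elements)])
# → ['a', 'b', 'PageBreak', 'd', 'c']
```

Sorting before inserting page breaks means a break is only emitted at a real page boundary in the final order, rather than being shuffled among content elements by a later sort.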
