unstructured
enhancement: clean pdf elements (bump unstructured-inference)
#790
Merged
cragwolfe
merged 60 commits into
main
from
enhancement/clean-pdf-elements
cragwolfe
changed the title
enhancement: clean pdf elements
enhancement: clean pdf elements (bump unstructured-inference)
2 years ago
cragwolfe
commented on 2023-06-25
cragwolfe
commented on 2023-06-25
rbiseck3
force pushed
from
2a6282ef
to
bd57ccce
2 years ago
clean pdf elements
0f004df1
Formatting for regex pattern
669356f3
Add type guard
8c83c574
Bump versions
d4db5628
Update changelog
d312ed88
Update ingest test fixtures (#795)
dfb1baa5
Try again to fix ingest fixtures
05c2b166
Update ingest test fixtures (#798)
8fb1994e
Sort elements regardless of strategy.
0eca6694
revert change sorting outside of pages
b30dae43
Update ingest test fixtures (#807)
5e02bd29
Set sorting key for elements without any coordinates
5aa2c762
Make sure content of _partition_pdf_or_image_local also comes out sorted
579ce947
rbiseck3
force pushed
from
1128affb
to
579ce947
2 years ago
don't sort elements again here for hi_res pdfs
06eee790
ingest updates using python 3.8.17
772ea559
rbiseck3
commented on 2023-06-28
Merge branch '3.8changes' into enhancement/clean-pdf-elements
1e5ef1cd
Merge branch 'main' into enhancement/clean-pdf-elements
79b6f7b5
Sort before adding page break and don't force page break inclusion.
bcf78b4f
Merge branch 'main' into enhancement/clean-pdf-elements
4e7b98f3
Address PR feedback
ab85d200
Fix re.sub statement
b6253b0f
to resolve later
e339abbd
fix changelog
0addda53
remove old ingest test scripts
e5dc778f
it is always the PageBreaks.
93aad0b1
stray newline
759a40b9
hintgd! type hint
e8fd8032
Update ingest test fixtures (#846)
84218acf
cragwolfe
commented on 2023-06-29
exclude Image elements since they are usually junk
ca4276af
lint
0780ec89
image tweak
d76a7512
reversed logic
26091469
tidy
ae963304
refactor logic again
2520a86f
tidy, type stuff
5c4a8dce
revert ingest-test script change
7909433a
none pagebreak is ok
45f2e169
Update ingest test fixtures (#851)
05676422
diffs
0cbef9e8
Merge branch 'main' into enhancement/clean-pdf-elements
29adfa8e
cragwolfe
marked this pull request as ready for review
2 years ago
Merge branch 'main' into enhancement/clean-pdf-elements
cb240f3a
Merge branch 'main' into enhancement/clean-pdf-elements
ee03bfd0
version bump
c2e49cb6
Merge branch 'main' into enhancement/clean-pdf-elements
19ebffd8
Revert "diffs"
a8a8ac8f
rbiseck3
approved these changes on 2023-06-29
cragwolfe
enabled auto-merge (squash)
2 years ago
comment out s3 for now, can resolve ordering in future bump
c9d0b24d
bump version
9e8e7bdc
Merge branch 'main' into enhancement/clean-pdf-elements
737e8a7b
remove fast s3 outputs for now
f5659710
Revert "remove fast s3 outputs for now"
d1d68c20
Revert "comment out s3 for now, can resolve ordering in future bump"
ad893c8f
bump to unstructured==0.5.4
319743a9
resolve conflict
4c7acffb
fix version
5dfa1eda
bump inference to 0.5.4 (#863)
8a22bf2c
disabled auto-merge
2 years ago
Manually disabled by user
Correct changelog
ba45c5d9
Update to release version
96e58d52
cragwolfe
enabled auto-merge (squash)
2 years ago
disabled auto-merge
2 years ago
Manually disabled by user
Merge branch 'main' into enhancement/clean-pdf-elements
b7ac80d9
Remove accidentally added files
e670f95c
one more
8812bcfa
cragwolfe
merged
350bb1da
into main
2 years ago
cragwolfe
deleted the enhancement/clean-pdf-elements branch
2 years ago
Reviewers
rbiseck3
cragwolfe