unstructured
feat: clean pdfminer elements inside tables
#1808
Merged
Commits
68
feat: clean pdf miner elements inside tables
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
feat: generate_extra_info param added to partition_pdf
Benjamin Torres
committed
2 years ago
linting
Benjamin Torres
committed
2 years ago
Changelog update
Benjamin Torres
committed
2 years ago
refactor: changes location of clean_pdfminer_inner_elements
Benjamin Torres
committed
2 years ago
Add local connector metadata and fix deserialization
rbiseck3
committed
2 years ago
update changelog
rbiseck3
committed
2 years ago
move custom logic to from_dict rather than from_json
rbiseck3
committed
2 years ago
Add test cases to unit test
rbiseck3
committed
2 years ago
Refactor unit test to assert entire doc equality
rbiseck3
committed
2 years ago
Don't make call to get metadata if it doesn't exist at the time a doc is serialized
rbiseck3
committed
2 years ago
Add unit test to validate the lack of meta on serialized content if it doesn't exist
rbiseck3
committed
2 years ago
Move custom serialization down a level to to_dict()
rbiseck3
committed
2 years ago
Debug ingest update CI job
rbiseck3
committed
2 years ago
Debug ingest update CI job
rbiseck3
committed
2 years ago
local connector metadata and deserialization fix <- Ingest test fixtures update (#1818)
rbiseck3
committed
2 years ago
bugfix/mapping source connectors in destination cli commands (#1788)
rbiseck3
committed
2 years ago
update changelog
rbiseck3
committed
2 years ago
Add new metadata to ignore in local ingest tests
rbiseck3
committed
2 years ago
local connector metadata and deserialization fix <- Ingest test fixtures update (#1826)
ryannikolaidis
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
Merge remote-tracking branch 'origin/roman/local-connector-metadata' into benjamin/feat/clean-pdfminer-inner-elements
Benjamin Torres
committed
2 years ago
feat: clean pdfminer elements inside tables <- Ingest test fixtures update (#1830)
ryannikolaidis
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
refactor: changes way elements are removed from pages
Benjamin Torres
committed
2 years ago
test: add test for clean_pdfminer_inner_elements
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
linting
Benjamin Torres
committed
2 years ago
style: add typing to clean_pdfminer_inner_elements
Benjamin Torres
committed
2 years ago
fix: add generate_extra_info=False to several partition_pdf calls
Benjamin Torres
committed
2 years ago
test: refactor way of instantiate MockPageLayout
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
fix: add missing argument generate_extra_info
Benjamin Torres
committed
2 years ago
feat: clean pdfminer elements inside tables <- Ingest test fixtures update (#1852)
ryannikolaidis
committed
2 years ago
fix: minor issues in changelog
Benjamin Torres
committed
2 years ago
refactor: renaming variable for creating dictionary of inner elements
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
Linting
Benjamin Torres
committed
2 years ago
lint: delete unused imports
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
refactor: clean_pdfminer_inner_elements just removes elements
Benjamin Torres
committed
2 years ago
test: update output of test
Benjamin Torres
committed
2 years ago
refactor: removes unused argument
Benjamin Torres
committed
2 years ago
fix: add error margin to is_in operation
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
fix: Misspelled variable name
Benjamin Torres
committed
2 years ago
refactor: improvement on readability
Benjamin Torres
committed
2 years ago
Linting
Benjamin Torres
committed
2 years ago
refactor: improvement on readability
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
Changelog fixes and expected version update
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
fix: deletes incorrect detection_origin
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
chore: removing changelog entries
Benjamin Torres
committed
2 years ago
chore: recover changelog from main and add info
Benjamin Torres
committed
2 years ago
chore: remove duplicated entry in this branch
Benjamin Torres
committed
2 years ago
chore: remove duplicated entry in this branch
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
tests: added temporary fix to source checking
Benjamin Torres
committed
2 years ago
test: update origins for tests
Benjamin Torres
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago
test: update origins for tests
Benjamin Torres
committed
2 years ago
Linting
Benjamin Torres
committed
2 years ago
feat: clean pdfminer elements inside tables <- Ingest test fixtures update (#1935)
ryannikolaidis
committed
2 years ago
Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
benjats07
committed
2 years ago