unstructured
feat: clean pdfminer elements inside tables
#1808
Merged

Commits
  • feat: clean pdfminer elements inside tables
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • feat: generate_extra_info param added to partition_pdf
    Benjamin Torres committed 2 years ago
  • linting
    Benjamin Torres committed 2 years ago
  • Changelog update
    Benjamin Torres committed 2 years ago
  • refactor: changes location of clean_pdfminer_inner_elements
    Benjamin Torres committed 2 years ago
  • Add local connector metadata and fix deserialization
    rbiseck3 committed 2 years ago
  • update changelog
    rbiseck3 committed 2 years ago
  • move custom logic to from_dict rather than from_json
    rbiseck3 committed 2 years ago
  • Add test cases to unit test
    rbiseck3 committed 2 years ago
  • Refactor unit test to assert entire doc equality
    rbiseck3 committed 2 years ago
  • Don't make call to get metadata if it doesn't exist at the time a doc is serialized
    rbiseck3 committed 2 years ago
  • Add unit test to validate the lack of meta on serialized content if it doesn't exist
    rbiseck3 committed 2 years ago
  • Move custom serialization down a level to to_dict()
    rbiseck3 committed 2 years ago
  • Debug ingest update CI job
    rbiseck3 committed 2 years ago
  • Debug ingest update CI job
    rbiseck3 committed 2 years ago
  • local connector metadata and deserialization fix <- Ingest test fixtures update (#1818)
    rbiseck3 committed 2 years ago
  • bugfix/mapping source connectors in destination cli commands (#1788)
    rbiseck3 committed 2 years ago
  • update changelog
    rbiseck3 committed 2 years ago
  • Add new metadata to ignore in local ingest tests
    rbiseck3 committed 2 years ago
  • local connector metadata and deserialization fix <- Ingest test fixtures update (#1826)
    ryannikolaidis committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • Merge remote-tracking branch 'origin/roman/local-connector-metadata' into benjamin/feat/clean-pdfminer-inner-elements
    Benjamin Torres committed 2 years ago
  • feat: clean pdfminer elements inside tables <- Ingest test fixtures update (#1830)
    ryannikolaidis committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • refactor: changes way elements are removed from pages
    Benjamin Torres committed 2 years ago
  • test: add test for clean_pdfminer_inner_elements
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • linting
    Benjamin Torres committed 2 years ago
  • style: add typing to clean_pdfminer_inner_elements
    Benjamin Torres committed 2 years ago
  • fix: add generate_extra_info=False to several partition_pdf calls
    Benjamin Torres committed 2 years ago
  • test: refactor how MockPageLayout is instantiated
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • fix: add missing argument generate_extra_info
    Benjamin Torres committed 2 years ago
  • feat: clean pdfminer elements inside tables <- Ingest test fixtures update (#1852)
    ryannikolaidis committed 2 years ago
  • fix: minor issues in changelog
    Benjamin Torres committed 2 years ago
  • refactor: renaming variable for creating dictionary of inner elements
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • Linting
    Benjamin Torres committed 2 years ago
  • lint: delete unused imports
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • refactor: clean_pdfminer_inner_elements just removes elements
    Benjamin Torres committed 2 years ago
  • test: update output of test
    Benjamin Torres committed 2 years ago
  • refactor: removes unused argument
    Benjamin Torres committed 2 years ago
  • fix: add error margin to is_in operation
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • fix: Misspelled variable name
    Benjamin Torres committed 2 years ago
  • refactor: improvement on readability
    Benjamin Torres committed 2 years ago
  • Linting
    Benjamin Torres committed 2 years ago
  • refactor: improvement on readability
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • Changelog fixes and expected version update
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • fix: deletes incorrect detection_origin
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • chore: removing changelog entries
    Benjamin Torres committed 2 years ago
  • chore: recover changelog from main and add info
    Benjamin Torres committed 2 years ago
  • chore: remove duplicated entry in this branch
    Benjamin Torres committed 2 years ago
  • chore: remove duplicated entry in this branch
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • tests: added temporary fix to source checking
    Benjamin Torres committed 2 years ago
  • test: update origins for tests
    Benjamin Torres committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago
  • test: update origins for tests
    Benjamin Torres committed 2 years ago
  • Linting
    Benjamin Torres committed 2 years ago
  • feat: clean pdfminer elements inside tables <- Ingest test fixtures update (#1935)
    ryannikolaidis committed 2 years ago
  • Merge branch 'main' into benjamin/feat/clean-pdfminer-inner-elements
    benjats07 committed 2 years ago