unstructured
Issue/unicode error - html, xml, and auto
#660
Merged

cragwolfe merged 33 commits into main from issue/encoding-error-html-xml-auto
christinestraub
christinestraub feat: apply read_txt_file pattern to filename and file-like object fo…
bc2494f1
christinestraub test: add test functions to test default encoding & example files
d20723f0
christinestraub chore: re-add example files
aaa42c24
christinestraub test: add test functions for encoding errors
1081f40c
christinestraub Merge branch 'main' into issue/encoding-error-html
a74cd726
christinestraub chore: update changelog and version
749025b4
christinestraub chore: change auto.py to have a `None` default for encoding
8c7da361
christinestraub feat: apply read_txt_file pattern to filename and file-like object fo…
7059c580
christinestraub test: add test functions & example files for xml
390fe6e1
christinestraub deploy: fix lint error
67d11770
christinestraub feat: remove unused parameter `encoding` for pdf
1cadf4be
christinestraub Merge branch 'main' into issue/encoding-error-html
883e5ac2
christinestraub chore: update changelog
c5891999
christinestraub Merge branch 'main' into issue/encoding-error-html-xml-auto
12467356
christinestraub test: fix partition_pdf test issue
06f3bfd6
christinestraub test: fix partition_xml test issue
9a4b26c0
christinestraub feat: add functionality to handle file-like object from url to the re…
667420c8
christinestraub chore: update changelog
67ec849c
christinestraub test: fix lint issues
bdf64893
christinestraub Merge branch 'main' into issue/encoding-error-html-xml-auto
e6cb6d55
christinestraub Merge branch 'main' into issue/encoding-error-html-xml-auto
1627ec05
christinestraub chore: update changelog and version
6ad4bcb6
christinestraub marked this pull request as ready for review 2 years ago
christinestraub Merge branch 'main' into issue/encoding-error-html-xml-auto
b8d3ffd6
christinestraub feat: parse XML documents by applying the `read_txt_file` pattern to …
27b46cf2
christinestraub test: update test functions for partition_xml
2f82439c
christinestraub test: fix lint issues
fef995b3
christinestraub test: fix test_ingest issues
ae5d06f5
christinestraub Merge branch 'main' into issue/encoding-error-html-xml-auto
6470cea5
christinestraub requested a review from cragwolfe 2 years ago
cragwolfe commented on 2023-06-02
christinestraub test: fix local test ingest script
99687319
christinestraub feat: update `read_txt_file` utility function to keep using `spooled…
a731c546
christinestraub Merge branch 'main' into issue/encoding-error-html-xml-auto
212b1d6b
christinestraub chore: update changelog & version
e7165d0e
cragwolfe approved these changes on 2023-06-05
christinestraub test: fix local test ingest script
722a4be1
cragwolfe merged 547bb38d into main 2 years ago
cragwolfe deleted the issue/encoding-error-html-xml-auto branch 2 years ago
christinestraub changed the title from "Issue: fix encoding/decoding error with default utf-8 encoding for html, xml, and auto" to "Issue/unicode error - html, xml, and auto" 2 years ago
