unstructured
CORE-5030 gpt-4o ocr adam
#3098
Open

amaciaszek-dsai wants to merge 34 commits into feat/gpt4o-ocr from CORE-5030/gpt4o_ocr_adam_v2
MthwRobinson improve: add Python 3.12 support (#3033) (#3047)
d7608014
Goldziher chore: add py.typed (#3043)
8802535e
tomheaton fix: Add `pip` as explicit dep in `environment.yml` to prevent warnin…
84cec1f4
weaviate-git-bot Updated Weaviate Docker image url (auto PR by bot) (#2659)
60f10fe6
ossner fix: update container link in README.md (#2889)
6066a264
MthwRobinson fix: set `skip_infer_tables` explicitly in `test_partition_via_api_wi…
acda4d07
MthwRobinson docs: redirect to docs.unstructured.io on github pages (#3054)
73739b38
rbiseck3 feat: refactor ingest (#3009)
3eaf65a8
MthwRobinson build: apk add libreoffice24 (#3065)
059fc64b
christinestraub feat: `partition_pdf()` set inferred elements text (#3061)
b0d8a779
MthwRobinson feat: add attribution for pinecone (#3067)
7832dfc7
scanny rfctr(docx): organize docx tests (#3070)
30e5a0cd
christinestraub chore: bump unstructured-inference 0.7.33 (#3074)
18428f24
scanny rfctr: flatten test_unstructured/partition (#3073)
b4ee0191
MthwRobinson fix: revert back to old requirements file for sphinx docs (#3077)
c9976760
hubert-rutkowski85 feat/Move the category field to Element (#3056)
b8d894f9
MillCheck fix: added the missing function argument (#3085)
9b83330b
MthwRobinson fix: set `resolve_entities=False` in `partition_xml` (#3088)
171b5df0
scanny feat(docx): add pluggable picture sub-partitioner (#3081)
47d28612
potter-potter Fix: Chroma Upsert instead of Add (#3086)
31a53c8a
christinestraub fix: decide table extraction (#3090)
35ec21ec
scanny fix: add missing params to ElementMetadata (#3092)
26d403d7
badGarnet chore: reduce excessive logging (#3095)
809c7e51
amaciaszek-dsai changed the title from "Core 5030/gpt4o ocr adam v2" to "CORE-5030 gpt-4o ocr adam" 1 year ago
badGarnet fix: disable table_as_cells output by default (#3093)
32df4ee1
ajjimeno Adding gpt4o as ocr for ocr_only mode
9c937d6a
amaciaszek-dsai logs & basic error handling for openai
2aa5f9d8
amaciaszek-dsai openai in requirements
ab3ac239
amaciaszek-dsai change assert
8af7b98c
amaciaszek-dsai modify prompt & log img size
00246db4
amaciaszek-dsai resize too large imgs
13b474d7
amaciaszek-dsai increase max_tokens
5ef0dfe5
amaciaszek-dsai max_tokens for gpt-4o is 4k
06fdf939
amaciaszek-dsai pdf_text_extractable always as False
cedbd0d7
amaciaszek-dsai base.txt update after rebase
be9ad3d7
amaciaszek-dsai force pushed from cc5f9597 to be9ad3d7 1 year ago

Reviewers
No reviews
Assignees
No one assigned