optimum
Fix batched inference/generation, position_ids creation, falcon alibi, gpt_bigcode multi-query,..
#2326
Merged


IlyasMoutawwakil merged 44 commits into main from fix-ort-batched-generation
IlyasMoutawwakil
IlyasMoutawwakil test left-padded batched inference
63a6efe6
IlyasMoutawwakil demonstrate batched text generation failure
39496d8a
IlyasMoutawwakil fix remote code
2ccc1503
IlyasMoutawwakil fix
ecf65d55
IlyasMoutawwakil fix position_ids generation inside ORTModelForCausalLM class
9f3eedc1
HuggingFaceDocBuilderDev
IlyasMoutawwakil it works until transformers 4.52 -_-
b7bec5e4
IlyasMoutawwakil now run with latest transformers
0df42e5b
IlyasMoutawwakil boolean 4D mask is actually not supported by torch onnx exporter
999a145a
IlyasMoutawwakil only test generation with batched inputs, for logits are a bit off be…
638856e8
IlyasMoutawwakil marked this pull request as ready for review 246 days ago
IlyasMoutawwakil requested a review from echarlaix 246 days ago
IlyasMoutawwakil boolean mask safe softmax batched inference
3d405020
IlyasMoutawwakil style
023d2ac9
IlyasMoutawwakil use old typing
accf8522
IlyasMoutawwakil don't do unnecessary patching
0965ea93
IlyasMoutawwakil try to avoid spamming the hub for an image
d1f9bbd2
IlyasMoutawwakil added onnxruntime-slow
IlyasMoutawwakil update min transformers version
01c40843
IlyasMoutawwakil removed review request from echarlaix 243 days ago
IlyasMoutawwakil better and direct torch patching
aeeecb2e
IlyasMoutawwakil more batched generation special cases
fc62f420
IlyasMoutawwakil style
ba994fbe
IlyasMoutawwakil initialize the pil image instead of downloading it
de6a798d
IlyasMoutawwakil use random pil image
cf164b31
IlyasMoutawwakil test different versions of transformers in fast tests
5934bf9f
IlyasMoutawwakil removed onnxruntime-slow
IlyasMoutawwakil fix
4b76f5e7
IlyasMoutawwakil revert diffusers changes for now
e171196f
IlyasMoutawwakil mask padding kv cache as well
5ab88b6a
IlyasMoutawwakil commented on 2025-07-28
IlyasMoutawwakil commented on 2025-07-28
IlyasMoutawwakil fix masking for old bloom
4d35600b
IlyasMoutawwakil use constant image to avoid image loading errors
b2a5f411
IlyasMoutawwakil style
3f58892a
IlyasMoutawwakil test diffusers in series to avoid runner dying
b9d2e03d
IlyasMoutawwakil fix
bdcc4252
IlyasMoutawwakil cleanup and some comments
a3dc4e82
IlyasMoutawwakil fix and test falcon alibi
a1ff2f2c
IlyasMoutawwakil style
603f62c2
IlyasMoutawwakil fix, support and test multi_query=False as well
cf5b562c
IlyasMoutawwakil only apply masked testing for transformers version previous to 4.39
3a29549b
IlyasMoutawwakil requested a review from echarlaix 243 days ago
IlyasMoutawwakil commented on 2025-07-28
IlyasMoutawwakil commented on 2025-07-28
IlyasMoutawwakil commented on 2025-07-28
IlyasMoutawwakil commented on 2025-07-28
IlyasMoutawwakil commented on 2025-07-28
IlyasMoutawwakil commented on 2025-07-28
IlyasMoutawwakil added onnxruntime-slow
IlyasMoutawwakil Update optimum/onnxruntime/modeling_decoder.py
af5fa34a
IlyasMoutawwakil use text decoder position ids onnx config but test its sync with list
59c0c141
IlyasMoutawwakil Merge branch 'fix-ort-batched-generation' of https://github.com/huggi…
b5d92e51
IlyasMoutawwakil commented on 2025-07-29
IlyasMoutawwakil fix opt
9db07bf8
IlyasMoutawwakil style
98123d49
echarlaix approved these changes on 2025-07-29
IlyasMoutawwakil changed the title from "Fix ORTModelForCausalLM batched generation" to "Fix batched inference/generation, position_ids creation, falcon alibi, gpt_bigcode multi-query,.." 242 days ago
IlyasMoutawwakil fix sdpa without overriding torch onnx exporter
411df8f7
IlyasMoutawwakil use inplace op ;-;
133f3409
IlyasMoutawwakil Merge branch 'main' into fix-ort-batched-generation
90449484
IlyasMoutawwakil fix st test
c98ab28a
IlyasMoutawwakil patch directly in onnx because patch needs to happen after softmax
e787b921
IlyasMoutawwakil merged 31d4ea9c into main 241 days ago
IlyasMoutawwakil deleted the fix-ort-batched-generation branch 241 days ago