optimum
31d4ea9c - Fix batched inference/generation, position_ids creation, falcon alibi, gpt_bigcode multi-query,.. (#2326)

Commit
188 days ago
Fix batched inference/generation, position_ids creation, falcon alibi, gpt_bigcode multi-query,.. (#2326)

* test left-padded batched inference
* demonstrate batched text generation failure
* fix remote code
* fix
* fix position_ids generation inside ORTModelForCausalLM class
* it works until transformers 4.52 -_-
* now run with latest transformers
* boolean 4D mask is actually not supported by torch onnx exporter
* only test generation with batched inputs, since logits are a bit off because transformers uses a boolean mask
* boolean mask safe softmax for batched inference
* style
* use old typing
* don't do unnecessary patching
* try to avoid spamming the hub for an image
* update min transformers version
* better and direct torch patching
* more batched generation special cases
* style
* initialize the pil image instead of downloading it
* use random pil image
* test different versions of transformers in fast tests
* fix
* revert diffusers changes for now
* mask padding kv cache as well
* fix masking for old bloom
* use constant image to avoid image loading errors
* style
* test diffusers in series to avoid runner dying
* fix
* cleanup and some comments
* fix and test falcon alibi
* style
* fix, support and test multi_query=False as well
* only apply masked testing for transformers versions prior to 4.39
* Update optimum/onnxruntime/modeling_decoder.py
* use text decoder position ids onnx config but test its sync with list
* fix opt
* style
* fix sdpa without overwriting torch onnx exporter
* use inplace op ;-;
* fix st test
* patch directly in onnx because patch needs to happen after softmax
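Several bullets above revolve around position_ids creation for left-padded batches: with left padding, positions must be derived from the attention mask rather than a plain arange, or the first generated tokens get wrong positions. Below is a minimal sketch of that standard technique; the function name and the `past_length` handling are illustrative, not the exact code landed in ORTModelForCausalLM:

```python
import torch

def make_position_ids(attention_mask: torch.Tensor, past_length: int = 0) -> torch.Tensor:
    # cumsum counts real tokens, so positions start at the first
    # non-padding token; -1 makes them 0-based
    position_ids = attention_mask.long().cumsum(-1) - 1
    # padding slots get a dummy value (they are masked out anyway)
    position_ids.masked_fill_(attention_mask == 0, 1)
    if past_length > 0:
        # during generation, only positions for the new tokens are fed
        position_ids = position_ids[:, past_length:]
    return position_ids

# left-padded batch: first sequence has two padding tokens
mask = torch.tensor([[0, 0, 1, 1, 1],
                     [1, 1, 1, 1, 1]])
print(make_position_ids(mask))
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```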
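The "boolean 4D mask" bullet refers to the torch ONNX exporter not handling boolean attention masks passed to scaled_dot_product_attention. A common workaround, sketched here under that assumption (the helper name is hypothetical), is converting the boolean mask (True = attend) into an additive float mask before export:

```python
import torch
import torch.nn.functional as F

def to_additive_mask(bool_mask: torch.Tensor, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    # True -> 0.0 (attend), False -> large negative (ignore);
    # an all-float mask exports where a boolean 4D mask does not
    return (~bool_mask).to(dtype) * torch.finfo(dtype).min

# (batch, heads, q_len, kv_len) boolean mask
bool_mask = torch.tensor([[[[True, False],
                            [True, True]]]])
q = k = v = torch.randn(1, 1, 2, 8)
out = F.scaled_dot_product_attention(q, k, v, attn_mask=to_additive_mask(bool_mask))
```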
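"boolean mask safe softmax" points at the NaN problem behind several of these fixes: in a left-padded batch, a row whose positions are all masked becomes all -inf logits, and softmax over -inf yields NaN. A hedged sketch of a safe softmax that zeroes fully masked rows instead (again an illustrative helper, not the patch itself):

```python
import torch

def masked_safe_softmax(scores: torch.Tensor, bool_mask: torch.Tensor) -> torch.Tensor:
    # standard masked softmax...
    scores = scores.masked_fill(~bool_mask, float("-inf"))
    probs = torch.softmax(scores, dim=-1)
    # ...except rows with no attendable position (pure padding) would
    # come out as NaN; force them to zero instead
    all_masked = (~bool_mask).all(dim=-1, keepdim=True)
    return probs.masked_fill(all_masked, 0.0)

scores = torch.randn(1, 1, 2, 4)
bool_mask = torch.tensor([[[[False, False, False, False],   # fully padded row
                            [True,  True,  True,  False]]]])
print(masked_safe_softmax(scores, bool_mask))  # first row is all zeros, no NaN
```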
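"mask padding kv cache as well" suggests that cache entries at padding positions also had to be neutralized so stale values cannot leak into later attention steps. One plausible way to do that, with an assumed (batch, num_heads, kv_len, head_dim) cache layout:

```python
import torch

def mask_padded_kv_cache(key: torch.Tensor, value: torch.Tensor,
                         attention_mask: torch.Tensor):
    # key/value: (batch, num_heads, kv_len, head_dim)  [assumed layout]
    # attention_mask: (batch, kv_len), 1 = real token, 0 = padding
    keep = attention_mask[:, None, :, None].to(key.dtype)
    # zero the cache at padding slots so stale values stay inert even
    # if a later step's mask is imperfect
    return key * keep, value * keep

key = torch.randn(2, 4, 5, 8)
value = torch.randn(2, 4, 5, 8)
mask = torch.tensor([[0, 0, 1, 1, 1],
                     [1, 1, 1, 1, 1]])
key, value = mask_padded_kv_cache(key, value, mask)
```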
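For "fix and test falcon alibi": ALiBi adds a per-head linear bias over key positions, and with left padding those positions must again be counted from the first real token. A sketch roughly following the ALiBi construction used for Bloom/Falcon-style models in transformers (the slopes are the standard geometric sequence derived from 2^(-8/n); the helper itself is illustrative):

```python
import math
import torch

def build_alibi_bias(attention_mask: torch.Tensor, num_heads: int,
                     dtype: torch.dtype = torch.float32) -> torch.Tensor:
    batch_size, seq_length = attention_mask.shape
    # per-head slopes, with a second interleaved sequence when
    # num_heads is not a power of 2 (e.g. falcon-7b's 71 heads)
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_pow2) - 3)))
    slopes = torch.pow(base, torch.arange(1, closest_pow2 + 1, dtype=torch.float32))
    if closest_pow2 != num_heads:
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_pow2) - 3)))
        extra_powers = torch.arange(1, 2 * (num_heads - closest_pow2) + 1, 2, dtype=torch.float32)
        slopes = torch.cat([slopes, torch.pow(extra_base, extra_powers)])
    # positions counted from the first *real* token, so left padding
    # does not shift the linear bias
    arange = ((attention_mask.cumsum(-1) - 1) * attention_mask)[:, None, :]
    alibi = slopes[..., None] * arange
    return alibi.reshape(batch_size * num_heads, 1, seq_length).to(dtype)

mask = torch.tensor([[0, 1, 1, 1]])
print(build_alibi_bias(mask, num_heads=2).shape)  # torch.Size([2, 1, 4])
```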
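"fix, support and test multi_query=False" concerns gpt_bigcode, where multi-query attention shares one KV head across all query heads while multi_query=False keeps per-head KV, so cache handling must branch on the flag. The shapes below are assumptions for illustration only:

```python
import torch

def expand_kv(key: torch.Tensor, value: torch.Tensor,
              num_heads: int, multi_query: bool):
    # assumed layouts for illustration:
    #   multi_query=True:  key/value are (batch, kv_len, head_dim),
    #                      one KV head shared by every query head
    #   multi_query=False: key/value are (batch, num_heads, kv_len, head_dim)
    if multi_query:
        # broadcast the shared KV head across all query heads
        key = key[:, None].expand(-1, num_heads, -1, -1)
        value = value[:, None].expand(-1, num_heads, -1, -1)
    return key, value

# MQA cache: no head dimension
k_mqa = torch.randn(2, 5, 8)
v_mqa = torch.randn(2, 5, 8)
k, v = expand_kv(k_mqa, v_mqa, num_heads=4, multi_query=True)
print(k.shape)  # torch.Size([2, 4, 5, 8])
```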