Add script for question answering (SQuAD) accuracy evaluation of BERT model (#12947)
Add a script to evaluate the accuracy of BERT/DistilBERT/RoBERTa models on the question-answering task.
If no model name is specified, the pretrained model
`bert-large-uncased-whole-word-masking-finetuned-squad` is used by
default. If no ONNX path is specified, Optimum is used to export an
ONNX model for testing.
Example usage:
* Evaluate with CPU execution provider:
`python eval_squad.py`
* Evaluate with CUDA execution provider:
`python eval_squad.py --use_gpu`
* Evaluate an optimized ONNX model for
`distilbert-base-cased-distilled-squad` with sequence lengths
128/192/256/384 on the first 100 samples:
`python eval_squad.py -m distilbert-base-cased-distilled-squad --use_gpu
-s 128 192 256 384 --onnx_path ./optimized_fp16.onnx -t 100`
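For reference, SQuAD accuracy is reported via the exact-match metric: a prediction counts as correct if it equals any reference answer after standard normalization (lowercasing, stripping punctuation and articles, collapsing whitespace). A minimal illustrative sketch of that metric (the helper names here are hypothetical, not part of the script):

```python
import string


def normalize_answer(text: str) -> str:
    """Apply standard SQuAD normalization: lowercase, drop punctuation
    and the articles a/an/the, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    words = [w for w in text.split() if w not in ("a", "an", "the")]
    return " ".join(words)


def exact_match(prediction: str, references: list[str]) -> bool:
    """A prediction is correct if it matches any reference answer
    after normalization."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(ref) for ref in references)


print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))  # True
```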