support unbacked-batch-only in torchbench (#172719)
Summary:
Support unbacked batch sizes in torchbench via the `unbacked-batch-only` flag. It behaves the same as `dynamic-batch-only`, but uses unbacked rather than backed dynamic shapes.

Summary of results:

### Inference: Unbacked vs Backed - Compile & Runtime Comparison
Model | Runtime Speedup (Unbacked vs Backed)
-- | --
DistillGPT2 | 0.98x
GPT2ForSequenceClassification | 0.97x
AlbertForMaskedLM | 0.93x
AllenaiLongformerBase | 0.93x
OPTForCausalLM | 0.93x
BlenderbotForCausalLM | 0.89x
MegatronBertForCausalLM | 0.86x
MT5ForConditionalGeneration | 0.83x
M2M100ForConditionalGeneration | 0.81x
BartForCausalLM | 0.80x
GoogleFnet | 0.78x
ElectraForCausalLM | 0.77x
MBartForCausalLM | 0.77x
DebertaV2ForMaskedLM | 0.75x
BertForMaskedLM | 0.75x
XGLMForCausalLM | 0.75x
YituTechConvBert | 0.73x
T5ForConditionalGeneration | 0.73x
T5Small | 0.72x
PLBartForCausalLM | 0.71x
RobertaForCausalLM | 0.71x
LayoutLMForMaskedLM | 0.66x
XLNetLMHeadModel | 0.63x
TrOCRForCausalLM | 0.61x
PegasusForCausalLM | 0.60x
DistilBertForMaskedLM | 0.58x
MobileBertForMaskedLM | 0.53x
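
The inference table reports a runtime ratio, while the training table below reports a percentage. A minimal sketch for putting them on the same scale, assuming the ratio is defined as backed runtime divided by unbacked runtime (the PR does not state the exact definition); `speedup_to_pct_slower` is a hypothetical helper:

```python
def speedup_to_pct_slower(speedup: float) -> float:
    """Convert a runtime speedup ratio into 'percent slower'.

    Assumes speedup = backed_time / unbacked_time, so a value below 1.0
    means unbacked is slower. Positive results mean slower, negative
    results mean faster.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # unbacked_time / backed_time = 1 / speedup
    return (1.0 / speedup - 1.0) * 100.0


if __name__ == "__main__":
    # Example: the 0.75x entries in the inference table.
    print(f"{speedup_to_pct_slower(0.75):.1f}% slower")  # → 33.3% slower
```

Under this assumed definition, a 0.75x ratio corresponds to roughly 33% slower rather than 25%, which matters when comparing rows across the two tables.
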

### Training: Unbacked vs Backed - Compile & Runtime Comparison
Model | Unbacked vs Backed
-- | --
GoogleFnet | ❌ FAILED
M2M100ForConditionalGeneration | ❌ FAILED
TrOCRForCausalLM | ❌ FAILED
XGLMForCausalLM | ❌ FAILED
XLNetLMHeadModel | ❌ FAILED
MobileBertForMaskedLM | 29.5% slower
DistilBertForMaskedLM | 27.9% slower
ElectraForCausalLM | 23.2% slower
LayoutLMForMaskedLM | 22.7% slower
T5ForConditionalGeneration | 20.4% slower
T5Small | 20.2% slower
PegasusForCausalLM | 19.8% slower
RobertaForCausalLM | 15.3% slower
BertForMaskedLM | 12.8% slower
MT5ForConditionalGeneration | 10.8% slower
YituTechConvBert | 10.6% slower
DebertaV2ForMaskedLM | 9.8% slower
MBartForCausalLM | 8.9% slower
PLBartForCausalLM | 8.7% slower
BartForCausalLM | 8.6% slower
AllenaiLongformerBase | 5.5% slower
MegatronBertForCausalLM | 3.6% slower
AlbertForMaskedLM | 2.9% slower
DistillGPT2 | 1.3% slower
BlenderbotForCausalLM | 0.5% slower
OPTForCausalLM | 3.6% faster

The compile speedup is probably due to fewer recompilations.

X-link: https://github.com/pytorch/pytorch/pull/172719
Approved by: https://github.com/aorenste
Reviewed By: izaitsevfb
Differential Revision: D92304453
fbshipit-source-id: 000a02e4ad7d2c2ca64097a44ca177932f719014