Enable fx2trt on Huggingface models (#778)
Summary:
This PR enables a subset of the HuggingFace models to run with fx2trt.
It also adds `fp16` (half-precision) support for the HF models, but leaves it off by default because `hf_BigBird` does not support half precision yet.
Supported: hf_Bert, hf_Albert, hf_GPT2, hf_DistilBert
Not supported: hf_Bart, hf_BigBird, hf_Longformer, hf_Reformer, hf_T5
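For context, the flow this PR wires up is sketched below, assuming the `transformers.utils.fx.symbolic_trace` API that appears in the traceback; `lower_to_trt` is a hypothetical stand-in for the TensorRT lowering implemented in `torchbenchmark/util/backends/fx2trt.py`.
```
import torch
from transformers import BertConfig, BertForMaskedLM
from transformers.utils.fx import symbolic_trace

# Randomly initialized hf_Bert-style model; the benchmark uses real weights.
model = BertForMaskedLM(BertConfig()).eval()
example_inputs = torch.randint(0, model.config.vocab_size, (1, 128))

# transformers' FX tracer produces a torch.fx.GraphModule, but only for
# allow-listed architectures; anything else raises NotImplementedError
# (see the example log below).
traced = symbolic_trace(model, input_names=["input_ids"])

# The actual TensorRT lowering happens in the fx2trt backend; this call is
# a hypothetical stand-in. fp16 stays off by default because hf_BigBird
# cannot run in half precision yet.
# trt_model = lower_to_trt(traced, [example_inputs], fp16=False)
```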
An example error log from an unsupported model:
```
Traceback (most recent call last):
File "run.py", line 177, in <module>
m = Model(device=args.device, test=args.test, jit=(args.mode == "jit"), batch_size=args.bs, extra_args=extra_args)
File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/model.py", line 13, in __call__
obj.__post__init__()
File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/model.py", line 81, in __post__init__
apply_args(self, self.extra_args)
File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/extra_args.py", line 108, in apply_args
model.set_module(enable_fx2trt(args.batch_size, fp16=args.fp16, model=module, example_inputs=exmaple_inputs,
File "/fsx/users/xzhao9/benchmark/torchbenchmark/util/backends/fx2trt.py", line 63, in enable_fx2trt
traced_model = hf_symbolic_trace(
File "/data/home/xzhao9/cluster/miniconda3/envs/py38/lib/python3.8/site-packages/transformers/utils/fx.py", line 565, in symbolic_trace
raise NotImplementedError(
NotImplementedError: Model LongformerForMaskedLM is not supported yet, supported models: AlbertModel, AlbertForPreTraining, AlbertForMaskedLM, AlbertForMultipleChoice, AlbertForQuestionAnswering, AlbertForSequenceClassification, AlbertForTokenClassification, BertModel, BertForPreTraining, BertForNextSentencePrediction, BertForMaskedLM, BertLMHeadModel, BertForMultipleChoice, BertForQuestionAnswering, BertForSequenceClassification, BertForTokenClassification, DistilBertModel, DistilBertForMaskedLM, DistilBertForMaskedLM, DistilBertForMultipleChoice, DistilBertForQuestionAnswering, DistilBertForSequenceClassification, DistilBertForTokenClassification, MobileBertModel, MobileBertForPreTraining, MobileBertForNextSentencePrediction, MobileBertForMaskedLM, MobileBertForMultipleChoice, MobileBertForQuestionAnswering, MobileBertForSequenceClassification, MobileBertForTokenClassification, ElectraModel, ElectraForPreTraining, ElectraForMaskedLM, ElectraForMultipleChoice, ElectraForQuestionAnswering, ElectraForSequenceClassification, ElectraForTokenClassification, MegatronBertModel, MegatronBertForPreTraining, MegatronBertForNextSentencePrediction, MegatronBertForMaskedLM, MegatronBertForCausalLM, MegatronBertForMultipleChoice, MegatronBertForQuestionAnswering, MegatronBertForSequenceClassification, MegatronBertForTokenClassification, GPT2Model, GPT2LMHeadModel, GPT2LMHeadModel, GPT2ForSequenceClassification, GPT2ForTokenClassification, GPTJModel, GPTJForCausalLM, GPTJForSequenceClassification, GPTNeoModel, GPTNeoForCausalLM, GPTNeoForSequenceClassification, T5Model, T5ForConditionalGeneration, T5ForConditionalGeneration, GPT2DoubleHeadsModel
```
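The supported/unsupported split above falls directly out of this allow-list; a probe along the following lines (a sketch, not part of this PR) reproduces the classification:
```
from transformers.utils.fx import symbolic_trace

def fx_traceable(model, input_names=("input_ids",)):
    """True if transformers' FX tracer accepts this architecture."""
    try:
        symbolic_trace(model, input_names=list(input_names))
        return True
    except NotImplementedError:
        return False
```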
Pull Request resolved: https://github.com/pytorch/benchmark/pull/778
Reviewed By: frank-wei
Differential Revision: D34757194
Pulled By: xuzhao9
fbshipit-source-id: 017bb2f8050cb28c7e9de3ab77fd2107cbbe10e1