Fix fastNLP GPU utilization issue (#499)
Summary:
Fixed the GPU utilization issue of the old fastNLP model.
- Use the official fastNLP package
- Add a script that generates a simulated CMRC2018 dataset with tunable parameters (a sketch of such a generator follows this list)
- Set num_workers=0 to avoid DataLoader worker-process spawning overhead (see the measurement sketch after this list)
- Run experiments to determine the best batch size for both training and inference (* marks the best batch size in the tables below)
- JIT tests are disabled because the JIT compiler cannot compile this model (the failure mode is illustrated after this list).
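For reference, a minimal sketch of what such a dataset generator can look like, assuming the SQuAD-style JSON layout that CMRC2018 uses. The function and parameter names (make_cmrc2018, n_paragraphs, ctx_len, qas_per_ctx) are hypothetical, not the actual script added in this PR:
```python
# Hypothetical sketch of a simulated CMRC2018-format generator; random
# lowercase "tokens" stand in for real Chinese text.
import json
import random
import string

def rand_text(n):
    # n pseudo-words of 5 random letters each.
    return " ".join("".join(random.choices(string.ascii_lowercase, k=5))
                    for _ in range(n))

def make_cmrc2018(path, n_paragraphs=8, ctx_len=100, qas_per_ctx=4):
    data = []
    for i in range(n_paragraphs):
        context = rand_text(ctx_len)
        qas = []
        for j in range(qas_per_ctx):
            answer = context.split()[0]  # trivially use the first token
            qas.append({
                "id": f"SIM_{i}_{j}",
                "question": rand_text(10),
                "answers": [{"text": answer,
                             "answer_start": context.find(answer)}],
            })
        data.append({"title": f"sim_{i}",
                     "paragraphs": [{"context": context, "qas": qas}]})
    with open(path, "w") as f:
        json.dump({"version": "simulated", "data": data}, f)

make_cmrc2018("cmrc2018_sim.json", n_paragraphs=16, ctx_len=200)
```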
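A hedged sketch of the batch-size measurement loop, using a stand-in linear model and random tensors rather than the actual fastNLP model. It shows the num_workers=0 DataLoader setting and one way to collect GPU times like those reported below (requires a CUDA device):
```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda"
model = torch.nn.Linear(128, 2).to(device)  # stand-in for the fastNLP model
dataset = TensorDataset(torch.randn(64, 128))

for batch_size in (1, 2, 4):
    # num_workers=0 keeps data loading in the main process and avoids
    # the worker-spawning overhead noted above.
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=0)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    with torch.no_grad():
        for (x,) in loader:
            model(x.to(device))
    end.record()
    torch.cuda.synchronize()
    print(f"batch_size={batch_size}: GPU time {start.elapsed_time(end):.3f} ms")
```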
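To illustrate the JIT failure mode, here is a toy module that TorchScript rejects (**kwargs in forward is unsupported). This is not the fastNLP model's actual error, and the real benchmark harness disables JIT tests through its own plumbing; the snippet only shows the kind of failure involved:
```python
import torch

class Unscriptable(torch.nn.Module):
    def forward(self, x, **kwargs):  # TorchScript rejects **kwargs
        return x

try:
    torch.jit.script(Unscriptable())
except Exception as e:
    # A harness can catch this kind of error and skip its JIT tests.
    print(f"JIT compilation failed, skipping JIT tests: {e}")
```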
Eval batch size experiment:
```
+------------+---------------+--------------------------------+
| Batch Size | GPU Time (ms) | GPU Time Increase vs. prev row |
+------------+---------------+--------------------------------+
| 1*         | 149.808       | -                              |
| 2          | 282.654       | 89%                            |
| 4          | 553.860       | 96%                            |
+------------+---------------+--------------------------------+
```
Train batch size experiment:
```
+------------+---------------+--------------------------------+
| Batch Size | GPU Time (ms) | GPU Time Increase vs. prev row |
+------------+---------------+--------------------------------+
| 1*         | 542.538       | -                              |
| 2          | 1007.098      | 85.63%                         |
| 4          | 1983.496      | 96.95%                         |
+------------+---------------+--------------------------------+
```
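For clarity, each increase entry compares a row against the previous (halved) batch size; since doubling the batch size nearly doubles GPU time, larger batches bring no throughput gain here and batch size 1 gives the lowest latency. A quick check against the eval numbers:
```python
# Sanity-check of the "GPU Time Increase" column: each entry is the
# percentage increase over the previous (halved) batch size.
times = {1: 149.808, 2: 282.654, 4: 553.860}  # eval GPU times, ms
print(f"{(times[2] / times[1] - 1):.0%}")  # -> 89%
print(f"{(times[4] / times[2] - 1):.0%}")  # -> 96%
```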
This should also fix https://github.com/pytorch/benchmark/issues/316
Pull Request resolved: https://github.com/pytorch/benchmark/pull/499
Reviewed By: aaronenyeshi
Differential Revision: D31848500
Pulled By: xuzhao9
fbshipit-source-id: 2b8dde27308a05e5d943d8c36d23669687bdabf0