Add FAMBench dlrm model (#878)
Summary:
This PR adds the FAMBench DLRM model to TorchBench as `fambench_dlrm`. Users can control how many batches the train and eval tests run with the `DEFAULT_TRAIN_NUM_BATCHES` and `DEFAULT_EVAL_NUM_BATCHES` variables, respectively.
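Below is a minimal sketch of how two class-level constants like these can cap the number of batches a benchmark processes. It does not use TorchBench or FAMBench code. The `BatchLimitedModel` class, its `train`/`eval` methods, the `step` callback and the `FiveTrainTwoEval` subclass are hypothetical. Only the two constant names come from this PR.

```python
import itertools
from typing import Any, Callable, Sequence


class BatchLimitedModel:
    """Hypothetical benchmark wrapper; only the constant names come from the PR."""

    # Number of batches each test processes; subclasses may override these.
    DEFAULT_TRAIN_NUM_BATCHES = 1
    DEFAULT_EVAL_NUM_BATCHES = 1

    def __init__(self, batches: Sequence[Any], step: Callable[[Any], None]) -> None:
        self.batches = batches  # re-iterable data source, e.g. a list or range
        self.step = step        # stand-in for one forward (and backward) pass

    def _run(self, num_batches: int) -> int:
        # Take at most num_batches items from a fresh iterator over the data.
        count = 0
        for batch in itertools.islice(iter(self.batches), num_batches):
            self.step(batch)
            count += 1
        return count

    def train(self) -> int:
        return self._run(self.DEFAULT_TRAIN_NUM_BATCHES)

    def eval(self) -> int:
        return self._run(self.DEFAULT_EVAL_NUM_BATCHES)


# Usage: override the defaults in a subclass to change how many batches run.
class FiveTrainTwoEval(BatchLimitedModel):
    DEFAULT_TRAIN_NUM_BATCHES = 5
    DEFAULT_EVAL_NUM_BATCHES = 2


seen = []
model = FiveTrainTwoEval(range(10), seen.append)
train_steps = model.train()
eval_steps = model.eval()
print(train_steps, eval_steps)  # 5 2
print(seen)  # [0, 1, 2, 3, 4, 0, 1]
```

Class attributes keep the defaults in one place while still letting a subclass override them. Because the data source is re-iterated on each call, every run starts from the first batch.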
Results on an A100 GPU:
```
$ python run.py fambench_dlrm -d cuda -t eval
Running eval method from fambench_dlrm on cuda in eager mode.
GPU Time: 18.289 milliseconds
CPU Dispatch Time: 18.271 milliseconds
CPU Total Wall Time: 18.314 milliseconds
$ python run.py fambench_dlrm -d cuda -t train
Running train method from fambench_dlrm on cuda in eager mode.
GPU Time: 17.304 milliseconds
CPU Dispatch Time: 17.286 milliseconds
CPU Total Wall Time: 17.330 milliseconds
```
Pull Request resolved: https://github.com/pytorch/benchmark/pull/878
Reviewed By: erichan1
Differential Revision: D35816188
Pulled By: xuzhao9
fbshipit-source-id: c571c3680b68bc5f725ea76fa88f7688531f5b1b