Add the actual TensorRT correctness testing code (#763)
Summary:
This PR adds correctness testing code for TensorRT that uses cosine similarity.
Example commands and outputs on an A100:
```
$ python run.py resnet18 -d cuda -t eval --fx2trt
GPU Time: 0.613 milliseconds
CPU Dispatch Time: 2.319 milliseconds
CPU Total Wall Time: 2.647 milliseconds
Correctness: 0.999990403652191
$ python run.py resnet18 -d cuda -t eval --fx2trt --no-fp16
GPU Time: 0.929 milliseconds
CPU Dispatch Time: 2.295 milliseconds
CPU Total Wall Time: 2.926 milliseconds
Correctness: 0.999999642372131
$ python run.py alexnet -d cuda -t eval --fx2trt
GPU Time: 0.582 milliseconds
CPU Dispatch Time: 2.338 milliseconds
CPU Total Wall Time: 2.646 milliseconds
Correctness: 1.000000000000000
$ python run.py alexnet -d cuda -t eval --fx2trt --no-fp16
GPU Time: 0.885 milliseconds
CPU Dispatch Time: 2.352 milliseconds
CPU Total Wall Time: 2.937 milliseconds
Correctness: 1.000000000000000
$ python run.py mobilenet_v3_large -d cuda -t eval --fx2trt
GPU Time: 1.695 milliseconds
CPU Dispatch Time: 4.424 milliseconds
CPU Total Wall Time: 5.561 milliseconds
Correctness: 0.999975979328156
$ python run.py mobilenet_v3_large -d cuda -t eval --fx2trt --no-fp16
GPU Time: 3.241 milliseconds
CPU Dispatch Time: 3.069 milliseconds
CPU Total Wall Time: 5.590 milliseconds
Correctness: 0.999904215335846
```
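The cosine similarity of two outputs a and b is a·b / (‖a‖‖b‖), so a score of 1.0 means the TensorRT output points in exactly the same direction as the reference output. The following is a minimal sketch that assumes the check flattens both outputs into vectors before comparing them. `cosine_similarity`, `check_correctness`, and the 0.99 threshold are hypothetical names and values chosen for illustration. They are not the benchmark's actual code.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Cosine similarity of two flattened output vectors (hypothetical helper)."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(r * c for r, c in zip(reference, candidate))
    norm_r = math.sqrt(sum(r * r for r in reference))
    norm_c = math.sqrt(sum(c * c for c in candidate))
    if norm_r == 0.0 or norm_c == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_r * norm_c)


def check_correctness(
    reference: Sequence[float],
    candidate: Sequence[float],
    threshold: float = 0.99,  # hypothetical pass/fail cutoff, not from the PR
) -> Tuple[float, bool]:
    """Return the similarity score and whether it clears the threshold."""
    score = cosine_similarity(reference, candidate)
    return score, score >= threshold


if __name__ == "__main__":
    # Toy stand-ins for an eager-mode output and a TensorRT output that
    # differ only by small rounding errors (e.g. from fp16 execution).
    eager_out = [0.5, -1.25, 2.0, 3.0]
    trt_out = [0.5001, -1.2498, 2.0003, 2.9995]
    score, ok = check_correctness(eager_out, trt_out)
    print(f"Correctness: {score:.15f} (pass={ok})")
```

Because cosine similarity ignores magnitude, an output that is a uniformly scaled copy of the reference would still score 1.0. A sketch like this therefore catches directional errors but not pure scaling errors.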
Pull Request resolved: https://github.com/pytorch/benchmark/pull/763
Reviewed By: frank-wei
Differential Revision: D34438175
Pulled By: xuzhao9
fbshipit-source-id: c309009d9676628aa693e0037ee5068ee1a15c76