Improve correctness algorithm (#886)
Summary:
This PR improves the correctness algorithm in three ways:
1. Run 10 iterations and pass the test only when all of them pass.
2. Add a stableness test: if the model does not produce stable results, skip the correctness test.
3. Use `torch.allclose()` for the correctness test, except for fx2trt+fp16, which keeps using cosine similarity.
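
The three steps above could be sketched roughly as follows. This is a minimal, framework-free illustration, not the code from this PR. The names `check_correctness`, `run_reference`, and `run_backend`, the `0.99` cosine threshold, the choice to run 10 iterations in both phases, and the definition of stableness (repeated reference runs agreeing with the first run) are all assumptions. The local `allclose` helper only mirrors the element-wise formula `|a - b| <= atol + rtol * |b|` documented for `torch.allclose()`.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def allclose(a: Vector, b: Vector, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Element-wise |a - b| <= atol + rtol * |b| (same formula as torch.allclose)."""
    return len(a) == len(b) and all(
        abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b)
    )


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two flat vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def check_correctness(
    run_reference: Callable[[], List[float]],  # hypothetical: runs the baseline model
    run_backend: Callable[[], List[float]],    # hypothetical: runs the model under test
    iterations: int = 10,
    use_cosine: bool = False,                  # assumed switch for the fx2trt+fp16 case
    cos_threshold: float = 0.99,               # assumed threshold, not from the PR
) -> str:
    baseline = run_reference()

    # Step 2: stableness test. If the reference itself is not reproducible,
    # skip the correctness test entirely.
    for _ in range(iterations):
        if not allclose(run_reference(), baseline):
            return "unstable"

    # Steps 1 and 3: every iteration must pass for the whole test to pass.
    for _ in range(iterations):
        out = run_backend()
        if use_cosine:
            ok = cosine_similarity(out, baseline) >= cos_threshold
        else:
            ok = allclose(out, baseline)
        if not ok:
            return "fail"
    return "pass"
```

With `use_cosine=True`, standing in for the fx2trt+fp16 case, small numeric drift that would fail the `allclose` check can still pass.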
Pull Request resolved: https://github.com/pytorch/benchmark/pull/886
Reviewed By: frank-wei, jansel
Differential Revision: D36011154
Pulled By: xuzhao9
fbshipit-source-id: 1109a43e4276a547872ba62e57bdc5eddafb2ad5