added use_cosine_similarity as an extra argument (#1110)
Summary:
The default allclose() correctness check does not work for bfloat16
kernels because of the precision difference between fp32 and bfloat16.
For TRT kernels, this is already handled by switching to a
cosine_similarity check. Instead of adding a flag for every such
runtime, this patch adds "use_cosine_similarity" as an extra argument
for benchmark execution, so that this correctness check can be
requested explicitly.
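A minimal sketch of why the two checks disagree on bfloat16 output. It is plain Python so it runs without torch. It assumes torch.allclose's element-wise rule (|a - b| <= atol + rtol * |b|, defaults rtol=1e-5, atol=1e-8) and emulates bfloat16 by rounding fp32 values to their upper 16 bits. The helper names (to_bfloat16, check_correctness) and the 0.99 threshold are hypothetical and are not taken from the benchmark code.

```python
import math
import struct


def to_bfloat16(x: float) -> float:
    """Round x (via fp32) to the nearest bfloat16 value, ties to even.

    Finite inputs only; NaN handling is omitted in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the upper 16 bits of the fp32 bit pattern.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def allclose(a, b, rtol=1e-5, atol=1e-8):
    # Same element-wise rule as torch.allclose: |a - b| <= atol + rtol * |b|.
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


def cosine_similarity(a, b, eps=1e-8):
    # Cosine of the angle between a and b, treating each as a flat vector.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def check_correctness(ref, out, use_cosine_similarity=False, threshold=0.99):
    # Hypothetical dispatch: the extra argument selects which check runs.
    if use_cosine_similarity:
        return cosine_similarity(ref, out) >= threshold
    return allclose(out, ref)


# fp32-style reference vs. the same values rounded to bfloat16.
reference = [1.5 + math.sin(0.1 * i) for i in range(64)]
bf16_output = [to_bfloat16(v) for v in reference]

print(check_correctness(reference, bf16_output))                              # False
print(check_correctness(reference, bf16_output, use_cosine_similarity=True))  # True
```

Cosine similarity measures agreement in direction over the whole tensor, so per-element rounding errors of a fraction of a percent barely move it. allclose, by contrast, fails as soon as any single element exceeds its tight fp32-oriented tolerance.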
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1110
Reviewed By: H-Huang
Differential Revision: D38719289
Pulled By: xuzhao9
fbshipit-source-id: 58bb720d50dd962e50674fdaafd6d39bd9721f9b