pytorch
cb37709b - [te] Create TargetMachine only once with correct options to fix perf (#50406)

Commit
3 years ago
[te] Create TargetMachine only once with correct options to fix perf (#50406)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50406

We were creating different TMs in PytorchLLVMJIT and LLVMCodeGen; the one in
LLVMCodeGen had the right target-specific options to generate fast AVX2 code
(with FMAs, vbroadcastss, etc.), and that's what was showing up in the debug
output, but the LLVMJIT TM was the one that actually generated runtime code,
and it was slow.
ghstack-source-id: 119700110

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/fb/tensorexpr:tensorexpr_bench
```
With this diff NNC is getting at least somewhat (5%) close to Pytorch with MKL,
for at least this one small-ish test case
```
Run on (24 X 2394.67 MHz CPU s)
2021-01-11 15:57:27
----------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------
Gemm/Torch/128/128/128                            65302 ns        65289 ns        10734 GFLOPS=64.2423G/s
Gemm/TensorExprTile4x16VecUnroll/128/128/128      68602 ns        68599 ns        10256 GFLOPS=61.1421G/s
```

Reviewed By: bwasti

Differential Revision: D25877605

fbshipit-source-id: cd293bac94d025511f348eab5c9b8b16bf6505ec
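The "5%" figure and the GFLOPS counters in the benchmark output above can be cross-checked with a few lines of arithmetic. The sketch below is not part of the commit. It assumes the benchmark counts a GEMM as `2 * M * N * K` floating-point operations and divides by the CPU time column; that convention is an assumption, but it reproduces the reported counters to within rounding. The helper name `gemm_gflops` is hypothetical.

```python
def gemm_gflops(m, n, k, time_ns):
    """GFLOP/s for an (m x k) @ (k x n) matmul, assuming 2*m*n*k FLOPs (assumed convention)."""
    flops = 2 * m * n * k
    # FLOPs per nanosecond is numerically equal to GFLOP/s.
    return flops / time_ns

# Values copied from the benchmark output in the commit message.
torch_time_ns, torch_cpu_ns = 65302, 65289
nnc_time_ns, nnc_cpu_ns = 68602, 68599

torch_gflops = gemm_gflops(128, 128, 128, torch_cpu_ns)
nnc_gflops = gemm_gflops(128, 128, 128, nnc_cpu_ns)

# Relative wall-time gap between the NNC kernel and the Torch (MKL) baseline.
slowdown = nnc_time_ns / torch_time_ns - 1.0

print(f"Torch: {torch_gflops:.2f} GFLOP/s")
print(f"NNC:   {nnc_gflops:.2f} GFLOP/s")
print(f"NNC is {slowdown:.1%} slower than Torch")
```

Under that assumption the computed rates agree with the reported `GFLOPS` counters, and the wall-time gap comes out at roughly 5%, matching the claim in the test plan.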