[PyTorch] Make TORCH_INTERNAL_ASSERT use torchCheckFail too (#52086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52086
I previously changed TORCH_CHECK to use torchCheckFail in D25481308 (https://github.com/pytorch/pytorch/commit/7d406b4a0751afdc2bd20d7be0920986178b41ae), but didn't cover TORCH_INTERNAL_ASSERT. No reason not to fix it too.
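For readers unfamiliar with the pattern, here is a minimal sketch of routing an assert macro's failure path through a single [[noreturn]] function, so each call site only pays for a compare and a call. The names `sketchAssertFail`, `SKETCH_INTERNAL_ASSERT`, and `checkedDivide` are hypothetical and used only for illustration; this is not the actual c10 implementation, and the message format is an assumption.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an outlined failure handler. All message
// formatting and the throw live here rather than at every call site.
// In a real codebase this definition would sit in a .cpp file so the
// compiler cannot inline it back into callers.
[[noreturn]] void sketchAssertFail(const char* func, const char* file,
                                   std::uint32_t line, const char* cond) {
  std::ostringstream oss;
  oss << cond << " INTERNAL ASSERT FAILED at " << file << ":" << line
      << " in " << func;
  throw std::runtime_error(oss.str());
}

// Hypothetical macro: the call site contains only the condition check and
// a call to the failure handler with a few constant arguments.
#define SKETCH_INTERNAL_ASSERT(cond)                                  \
  do {                                                                \
    if (!(cond)) {                                                    \
      sketchAssertFail(__func__, __FILE__,                            \
                       static_cast<std::uint32_t>(__LINE__), #cond);  \
    }                                                                 \
  } while (0)

// Example caller: throws before dividing if the denominator is zero.
int checkedDivide(int numerator, int denominator) {
  SKETCH_INTERNAL_ASSERT(denominator != 0);
  return numerator / denominator;
}
```

With this shape, `checkedDivide(6, 3)` returns normally, while `checkedDivide(1, 0)` throws a `std::runtime_error` whose message contains the stringified condition.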
ghstack-source-id: 121456574
Test Plan:
Run framework overhead benchmarks.
Run build size check for igios.
Adindexer benchmark looks encouraging.
Before:
```
I0210 11:10:59.974778 2570617 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0548625. Iters per second: 18227.4
I0210 11:11:07.591706 2570617 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0677804. Iters per second: 14753.5
I0210 11:11:07.637014 2570617 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.35653. Iters per second: 157.319
I0210 11:11:14.592409 2572700 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0543933. Iters per second: 18384.6
I0210 11:11:22.158799 2572700 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0673752. Iters per second: 14842.3
I0210 11:11:22.204160 2572700 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.37655. Iters per second: 156.825
I0210 11:11:29.233793 2573079 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0541586. Iters per second: 18464.3
I0210 11:11:36.726284 2573079 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0666658. Iters per second: 15000.2
I0210 11:11:36.774489 2573079 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.36777. Iters per second: 157.041
I0210 11:11:43.799113 2573238 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0535797. Iters per second: 18663.8
I0210 11:11:51.433924 2573238 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0679261. Iters per second: 14721.9
I0210 11:11:51.479207 2573238 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.34747. Iters per second: 157.543
I0210 11:11:58.492782 2573599 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0548257. Iters per second: 18239.6
I0210 11:12:06.072979 2573599 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0674848. Iters per second: 14818.2
I0210 11:12:06.118813 2573599 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.34473. Iters per second: 157.611
```
After:
```
I0210 11:13:00.267062 2577288 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0531031. Iters per second: 18831.3
I0210 11:13:07.591711 2577288 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0651389. Iters per second: 15351.8
I0210 11:13:07.636951 2577288 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.25168. Iters per second: 159.957
I0210 11:13:14.497283 2580005 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0524907. Iters per second: 19051
I0210 11:13:21.814965 2580005 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0650504. Iters per second: 15372.7
I0210 11:13:21.861150 2580005 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.32074. Iters per second: 158.209
I0210 11:13:28.775005 2580166 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0528345. Iters per second: 18927
I0210 11:13:36.041087 2580166 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0646226. Iters per second: 15474.5
I0210 11:13:36.087904 2580166 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.38721. Iters per second: 156.563
I0210 11:13:43.223469 2580706 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0534523. Iters per second: 18708.3
I0210 11:13:50.603958 2580706 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.065639. Iters per second: 15234.8
I0210 11:13:50.649281 2580706 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.24524. Iters per second: 160.122
I0210 11:13:57.490873 2580904 BlackBoxPredictorBenchLib.cpp:384] C2 run finished. Milliseconds per iter: 0.0529411. Iters per second: 18888.9
I0210 11:14:04.745435 2580904 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 0.0644963. Iters per second: 15504.8
I0210 11:14:04.790006 2580904 PyTorchPredictorBenchLib.cpp:209] PyTorch run finished. Milliseconds per iter: 6.22258. Iters per second: 160.705
```
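To put a number on the logs above, the sketch below averages the five "Milliseconds per iter" samples for the C2 series and for the faster of the two PyTorch series, then computes the relative reduction. The helper names are hypothetical; the sample values are copied from the logs. The slower (~6.3 ms) PyTorch series is left out for brevity. By this calculation the mean drops by roughly 3.6% for PyTorch and roughly 2.6% for C2.

```cpp
#include <array>
#include <numeric>

// Arithmetic mean of five benchmark samples.
double meanOf(const std::array<double, 5>& samples) {
  return std::accumulate(samples.begin(), samples.end(), 0.0) / samples.size();
}

// Reduction from `before` to `after`, as a percentage of `before`.
double percentReduction(double before, double after) {
  return (before - after) / before * 100.0;
}

// "Milliseconds per iter" values copied from the Before/After logs.
const std::array<double, 5> kC2Before = {
    0.0548625, 0.0543933, 0.0541586, 0.0535797, 0.0548257};
const std::array<double, 5> kC2After = {
    0.0531031, 0.0524907, 0.0528345, 0.0534523, 0.0529411};
const std::array<double, 5> kPyTorchBefore = {
    0.0677804, 0.0673752, 0.0666658, 0.0679261, 0.0674848};
const std::array<double, 5> kPyTorchAfter = {
    0.0651389, 0.0650504, 0.0646226, 0.065639, 0.0644963};

// Mean-over-mean improvement for each series.
double c2Improvement() {
  return percentReduction(meanOf(kC2Before), meanOf(kC2After));
}
double pyTorchImprovement() {
  return percentReduction(meanOf(kPyTorchBefore), meanOf(kPyTorchAfter));
}
```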
Looks like a pretty clear win (though it seems to have helped C2 as well). I also checked with perf stat, and it looks like a 1.9% win in CPU cycles:
Before:
```
35,313,858,645 cycles # 1.989 GHz ( +- 0.32% ) (99.98%)
17,750.69 msec task-clock # 0.999 CPUs utilized ( +- 0.33% )
70,524,321,763 instructions # 2.00 insn per cycle ( +- 0.52% ) (99.98%)
```
After:
```
34,628,390,377 cycles # 1.988 GHz ( +- 0.41% ) (99.98%)
17,416.59 msec task-clock # 0.999 CPUs utilized ( +- 0.41% )
70,800,211,396 instructions # 2.04 insn per cycle ( +- 0.11% ) (99.98%)
```
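The 1.9% figure can be reproduced from the perf stat counters above. This sketch uses a hypothetical helper and the values copied from the two runs. Cycles drop by about 1.94%, while the instruction count moves by about 0.4% in the other direction, which is within the reported +- 0.52% variance of the "before" run.

```cpp
// Reduction from `before` to `after`, as a percentage of `before`.
// A negative result means the counter went up.
double percentReduction(double before, double after) {
  return (before - after) / before * 100.0;
}

// Counter values copied from the perf stat output above.
constexpr double kCyclesBefore = 35313858645.0;
constexpr double kCyclesAfter = 34628390377.0;
constexpr double kInstructionsBefore = 70524321763.0;
constexpr double kInstructionsAfter = 70800211396.0;

// Relative change in cycles and in instructions between the two runs.
double cyclesImprovement() {
  return percentReduction(kCyclesBefore, kCyclesAfter);
}
double instructionsImprovement() {
  return percentReduction(kInstructionsBefore, kInstructionsAfter);
}
```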
Reviewed By: ezyang
Differential Revision: D26372806
fbshipit-source-id: 817c7e61741334bb3ac33b617f9628309959b9c3