Add TORCH_DCHECK macro that checks only in debug builds (#31240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31240
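Follow-up to the discoveries and discussions in https://github.com/pytorch/pytorch/pull/30810.
Mimics the `DCHECK` macro from https://github.com/pytorch/pytorch/blob/e5eb871/c10/util/logging_is_not_google_glog.h#L117-L125, which checks only in debug builds.
With this change, the performance gap between the `BM_IntrusivePtr*` and `BM_SharedPtr*` benchmarks is eliminated. Results from two machines: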
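The mechanism behind a debug-only check can be illustrated with a short C++ sketch. This is a minimal illustration under stated assumptions, not the actual `TORCH_DCHECK` definition: `MY_CHECK`, `MY_DCHECK`, and `get_at` are hypothetical names, and the sketch assumes, as is conventional, that release builds define `NDEBUG`. In debug builds the debug-only check behaves like an always-on check. In release builds its condition sits in a branch that is never taken, so it must still compile but is never evaluated.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical always-on check (a stand-in for an existing check macro):
// throws std::runtime_error when `cond` is false.
#define MY_CHECK(cond, msg)                                                    \
  do {                                                                         \
    if (!(cond)) {                                                             \
      throw std::runtime_error(std::string("Check failed: " #cond ": ") + (msg)); \
    }                                                                          \
  } while (false)

// Hypothetical debug-only check. NDEBUG is the standard switch that release
// builds define (it also disables assert() from <cassert>).
#ifdef NDEBUG
// Release: the check sits in a branch that is never taken, so `cond` is never
// evaluated at run time, but it must still compile and the compiler can
// discard it entirely.
#define MY_DCHECK(cond, msg) \
  do {                       \
    if (false) {             \
      MY_CHECK(cond, msg);   \
    }                        \
  } while (false)
#else
// Debug: behave exactly like the always-on check.
#define MY_DCHECK(cond, msg) MY_CHECK(cond, msg)
#endif

// Hypothetical hot-path accessor: the bounds check is only paid for in
// debug builds.
inline int get_at(const std::vector<int>& v, std::size_t i) {
  MY_DCHECK(i < v.size(), "index out of range");
  return v[i];
}
```

Keeping the condition in a dead branch, rather than expanding to nothing, means a release build still rejects typos in the checked expression while paying nothing for the check at run time.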
```
================================================================================
Program Output:
================================================================================
Run on (36 X 1601 MHz CPU s)
2019-12-12 20:12:13
-----------------------------------------------------------------
Benchmark                         Time         CPU Iterations
-----------------------------------------------------------------
BM_IntrusivePtrCtorDtor          23 ns       23 ns   30914703
BM_SharedPtrCtorDtor             27 ns       27 ns   25895944
BM_IntrusivePtrArray/16         503 ns      503 ns    1392139
BM_IntrusivePtrArray/32        1006 ns     1006 ns     695749
BM_IntrusivePtrArray/64        2013 ns     2013 ns     347714
BM_IntrusivePtrArray/128       4024 ns     4024 ns     173964
BM_IntrusivePtrArray/256       8047 ns     8047 ns      86994
BM_IntrusivePtrArray/512      16106 ns    16106 ns      43461
BM_IntrusivePtrArray/1024     32208 ns    32207 ns      21731
BM_IntrusivePtrArray/2048     64431 ns    64430 ns      10865
BM_IntrusivePtrArray/4096    128940 ns   128938 ns       5429
BM_SharedPtrArray/16            503 ns      503 ns    1392128
BM_SharedPtrArray/32           1006 ns     1006 ns     695940
BM_SharedPtrArray/64           2012 ns     2012 ns     347817
BM_SharedPtrArray/128          4024 ns     4023 ns     173927
BM_SharedPtrArray/256          8069 ns     8069 ns      86741
BM_SharedPtrArray/512         16143 ns    16142 ns      43357
BM_SharedPtrArray/1024        32283 ns    32283 ns      21685
BM_SharedPtrArray/2048        64718 ns    64717 ns      10817
BM_SharedPtrArray/4096       129469 ns   129466 ns       5407
================================================================================
```
```
================================================================================
Program Output:
================================================================================
Run on (80 X 2001 MHz CPU s)
2019-12-12 20:12:23
-----------------------------------------------------------------
Benchmark                         Time         CPU Iterations
-----------------------------------------------------------------
BM_IntrusivePtrCtorDtor          18 ns       18 ns   38630411
BM_SharedPtrCtorDtor             22 ns       22 ns   32356114
BM_IntrusivePtrArray/16         402 ns      402 ns    1739637
BM_IntrusivePtrArray/32         805 ns      805 ns     869818
BM_IntrusivePtrArray/64        1610 ns     1609 ns     434881
BM_IntrusivePtrArray/128       3218 ns     3218 ns     217437
BM_IntrusivePtrArray/256       6436 ns     6436 ns     108739
BM_IntrusivePtrArray/512      12882 ns    12882 ns      54356
BM_IntrusivePtrArray/1024     25763 ns    25763 ns      27177
BM_IntrusivePtrArray/2048     51532 ns    51531 ns      13590
BM_IntrusivePtrArray/4096    103091 ns   103091 ns       6778
BM_SharedPtrArray/16            402 ns      402 ns    1740165
BM_SharedPtrArray/32            804 ns      804 ns     869035
BM_SharedPtrArray/64           1610 ns     1610 ns     434975
BM_SharedPtrArray/128          3218 ns     3218 ns     217505
BM_SharedPtrArray/256          6457 ns     6457 ns     108510
BM_SharedPtrArray/512         12909 ns    12909 ns      54249
BM_SharedPtrArray/1024        25810 ns    25810 ns      27127
BM_SharedPtrArray/2048        51763 ns    51763 ns      13531
BM_SharedPtrArray/4096       103506 ns   103505 ns       6759
================================================================================
```
Test Plan:
buck test caffe2/c10/...
buck test mode/opt caffe2/c10/...
Differential Revision: D18998243
fbshipit-source-id: ddf0a118a80efe032b52d403867c1f416c721590