[static runtime] Initial memonger (#47759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47759
Parity reached :)
Benchmark name suffixes:
*/0 -> memonger off
*/1 -> memonger on
The impact is large when activations don't all fit in cache: the /512 case on this microbenchmark is roughly 5.5x faster with memonger on (6116629 ns -> 1101415 ns).
```
Benchmark                                       Time         CPU  Iterations
BM_long_static_memory_optimization/2/0       8563 ns     8559 ns       86370
BM_long_static_memory_optimization/8/0       8326 ns     8322 ns       84099
BM_long_static_memory_optimization/32/0     11446 ns    11440 ns       56107
BM_long_static_memory_optimization/512/0  6116629 ns  6113108 ns         128
BM_long_static_memory_optimization/2/1       8151 ns     8149 ns       87000
BM_long_static_memory_optimization/8/1       7905 ns     7902 ns       85124
BM_long_static_memory_optimization/32/1     10652 ns    10639 ns       66055
BM_long_static_memory_optimization/512/1  1101415 ns  1100673 ns         641
```
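For context, a minimal sketch of liveness-based buffer reuse, the general idea behind a memonger: activations whose lifetimes do not overlap can share one buffer, which shrinks the working set. This is not the implementation in this PR. `Activation`, `Plan`, `plan_reuse`, and `total_bytes` are hypothetical names, and the greedy first-fit policy is an assumption chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical description of one intermediate tensor (activation).
struct Activation {
  std::size_t bytes;  // storage the activation needs
  int first_use;      // index of the op that produces it
  int last_use;       // index of the last op that reads it
};

// Hypothetical result of planning: which shared buffer each activation uses.
struct Plan {
  std::vector<int> buffer_of;             // buffer index per activation
  std::vector<std::size_t> buffer_bytes;  // size of each shared buffer
};

// Greedy first-fit reuse (an assumed policy): visit activations in creation
// order and place each one in the first buffer whose previous occupant was
// last read strictly before this activation is produced.
Plan plan_reuse(const std::vector<Activation>& acts) {
  std::vector<std::size_t> order(acts.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return acts[a].first_use < acts[b].first_use;
                   });

  Plan plan;
  plan.buffer_of.assign(acts.size(), -1);
  std::vector<int> busy_until;  // last_use of the current occupant, per buffer

  for (std::size_t idx : order) {
    const Activation& a = acts[idx];
    int chosen = -1;
    for (std::size_t b = 0; b < busy_until.size(); ++b) {
      if (busy_until[b] < a.first_use) {
        chosen = static_cast<int>(b);
        break;
      }
    }
    if (chosen < 0) {
      // No dead buffer available: allocate a new one.
      chosen = static_cast<int>(busy_until.size());
      busy_until.push_back(a.last_use);
      plan.buffer_bytes.push_back(a.bytes);
    } else {
      // Reuse a dead buffer, growing it if this activation is larger.
      busy_until[chosen] = a.last_use;
      plan.buffer_bytes[chosen] = std::max(plan.buffer_bytes[chosen], a.bytes);
    }
    plan.buffer_of[idx] = chosen;
  }
  return plan;
}

// Total bytes needed once buffers are shared.
std::size_t total_bytes(const Plan& p) {
  std::size_t sum = 0;
  for (std::size_t b : p.buffer_bytes) sum += b;
  return sum;
}
```

In this sketch, a chain of four 100-byte activations where each is read only by the next op needs two shared buffers (200 bytes) instead of four (400 bytes).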
TODO:
[x] implementation
[x] enable/disable flag
[x] statistics about memory saved
[x] additional models
Test Plan:
```
buck test //caffe2/test:static_runtime
buck test //caffe2/benchmarks/static_runtime:static_runtime_cpptest
buck test //caffe2/caffe2/fb/predictor:pytorch_predictor_test
```
Reviewed By: yinghai
Differential Revision: D24824445
fbshipit-source-id: db1f5239f72cbd1a9444017e20d5a107c3b3f043