[static runtime] Add Internal Ops to the registry (#48616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48616
This adds a couple of `_out` variants and registers them in the op registry.
I also added the concept of "canReuse{Input,Output}" so that we can annotate tensors that are not optimizable (specifically, non-float tensors).
In the future we can change this (see D25062301).
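To illustrate the idea, here is a minimal sketch of a registry whose entries carry reuse annotations. It is written in Python for brevity and is not the code in this PR: `OpInfo`, `OP_REGISTRY`, `register_op`, `is_output_reusable`, and the two toy ops are hypothetical names, and plain lists stand in for tensors. It assumes the annotation is a per-op flag that a memory planner could consult before recycling a buffer.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class OpInfo:
    """Registry entry for one op (hypothetical layout, not the PR's)."""
    # Out variant: writes its result into a caller-provided buffer.
    out_variant: Callable[[list, list], None]
    # Reuse annotations; False marks the op's tensors as not optimizable.
    can_reuse_input: bool = True
    can_reuse_output: bool = True


# Hypothetical registry mapping op names to their entries.
OP_REGISTRY: Dict[str, OpInfo] = {}


def register_op(name: str, info: OpInfo) -> None:
    """Add an op to the registry, rejecting duplicate names."""
    if name in OP_REGISTRY:
        raise ValueError(f"op already registered: {name}")
    OP_REGISTRY[name] = info


def relu_out(src: list, out: list) -> None:
    """Toy float op: fills the preallocated `out` in place."""
    for i, value in enumerate(src):
        out[i] = value if value > 0.0 else 0.0


def to_int_out(src: list, out: list) -> None:
    """Toy op with a non-float result, so its output is not reusable."""
    for i, value in enumerate(src):
        out[i] = int(value)


register_op("relu", OpInfo(relu_out))
register_op("to_int", OpInfo(to_int_out, can_reuse_output=False))


def is_output_reusable(op_name: str) -> bool:
    """Unregistered ops are treated conservatively as not reusable."""
    info = OP_REGISTRY.get(op_name)
    return info is not None and info.can_reuse_output
```

In this sketch a planner would call `is_output_reusable("to_int")`, get `False`, and leave that output buffer alone, while the output of `relu` stays eligible for reuse.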
After removing `RecordFunction`, we see these results:
```
BS=20
---
caffe2: 0.651617 ~ 0.666354
static runtime: 0.753481
pytorch: 0.866658
BS=1
---
caffe2: 0.0858684 ~ 0.08633
static runtime: 0.209897
pytorch: 0.232694
```
Test Plan: standard internal test of ads model against caffe2 reference (see the scripts in this quip: https://fb.quip.com/ztERAYjuzdlr)
Reviewed By: hlu1
Differential Revision: D25066823
fbshipit-source-id: 25ca181c62209a4c4304f7fe73832b13e314df80