[HT] Clear the device placement tag on the auto-generated gradient Sum so that we can break up the component for FC ops sharing the same input (#42219)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42219
Introduce a new extra info tag that is set on forward-net operators sharing the same input. The effect is that the auto-generated Sum of the gradients for that input no longer inherits the device placement tags of those forward-net operators. This allows more flexible device allocation.
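A minimal sketch of the intended behavior, written in plain Python rather than against the Caffe2 API. The `Op` and `DeviceOption` classes, the `make_auto_gen_sum` helper, the blob names, and the tag string `SKIP_DEVICE_TAG` are all hypothetical names chosen for illustration; the actual tag name and code path are the ones in this diff.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical tag value, for illustration only (not the name used in the diff).
SKIP_DEVICE_TAG = "skip_auto_gen_sum_device:1"


@dataclass
class DeviceOption:
    device_type: int = 0  # in this sketch: 0 = CPU, 1 = GPU
    device_id: int = 0
    extra_info: List[str] = field(default_factory=list)


@dataclass
class Op:
    type: str
    inputs: List[str]
    outputs: List[str]
    device_option: Optional[DeviceOption] = None  # None means "no placement set"


def make_auto_gen_sum(forward_ops, grad_names, out_name):
    """Build the Sum op that accumulates the gradients of a shared input."""
    sum_op = Op("Sum", list(grad_names), [out_name])
    tagged = any(
        op.device_option is not None
        and SKIP_DEVICE_TAG in op.device_option.extra_info
        for op in forward_ops
    )
    if tagged:
        # Tagged forward ops: leave the Sum without a placement so a later
        # pass is free to put it on any device.
        return sum_op
    # Untagged: the Sum follows the placement of the first forward op.
    src = forward_ops[0].device_option
    if src is not None:
        sum_op.device_option = DeviceOption(
            src.device_type, src.device_id, list(src.extra_info)
        )
    return sum_op


# Two FC ops on different GPUs that share the input blob "x".
fc1 = Op("FC", ["x", "w1", "b1"], ["y1"], DeviceOption(1, 0, [SKIP_DEVICE_TAG]))
fc2 = Op("FC", ["x", "w2", "b2"], ["y2"], DeviceOption(1, 1, [SKIP_DEVICE_TAG]))
sum_op = make_auto_gen_sum([fc1, fc2], ["x_grad_0", "x_grad_1"], "x_grad")
print(sum_op.device_option)  # None: the Sum carries no placement tag
```

Without the tag, the same helper copies the first forward op's placement onto the Sum, which is what ties the gradient accumulation to the forward ops' device in this sketch.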
Test Plan:
# unit test
`./buck-out/gen/caffe2/caffe2/python/core_gradients_test#binary.par -r testMultiUseInputAutoGenSumDevice`
Reviewed By: xianjiec, boryiingsu
Differential Revision: D22609080
fbshipit-source-id: d558145e5eb36295580a70e1ee3a822504dd439a