Make merge_fp32_into_fp16_inputs generate ops for each partition (#36973)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36973
Handle the case where inputs are used in multiple partitions by generating the ops for each partition that uses them.
Test Plan: unit tests
Reviewed By: yinghai
Differential Revision: D21107672
fbshipit-source-id: 9eca20220b80f27400aefcdaeff5d5503e32654c