Fix TRTModule not adding outputs in order (#64418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64418
In T99368564, we found that when running a TRT-lowered module, the output tensors are out of order compared to the output of the original, non-lowered module. It turns out that in `TRTModule.forward()` we cannot rely on the natural order of the `ICudaEngine` binding indices to create the output tensors; instead, we should explicitly construct the output tensors from the bindings' names, in an order that we supply.
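A minimal sketch of the idea, not the actual `TRTModule` code: `FakeEngine`, `collect_outputs`, and the binding names are hypothetical stand-ins, and plain strings take the place of tensors. The only thing assumed about the real engine is that a binding index can be looked up by name. The caller supplies `output_names` in the desired order, and each output is fetched by name rather than by walking the binding indices.

```python
from typing import Dict, List, Sequence


class FakeEngine:
    """Hypothetical stand-in for an engine whose binding order we do not control."""

    def __init__(self, binding_names: Sequence[str]) -> None:
        # Map each binding name to the index the engine happened to assign it.
        self._index: Dict[str, int] = {
            name: i for i, name in enumerate(binding_names)
        }

    def get_binding_index(self, name: str) -> int:
        return self._index[name]


def collect_outputs(
    engine: FakeEngine, output_names: Sequence[str], buffers: Sequence[str]
) -> List[str]:
    # Look up every output by name, in the caller-supplied order, instead of
    # iterating over the engine's binding indices.
    return [buffers[engine.get_binding_index(name)] for name in output_names]


# The engine lists "out_b" before "out_a"; the original module returns (a, b).
engine = FakeEngine(["input0", "out_b", "out_a"])
buffers = ["in", "B", "A"]  # one buffer per binding, in engine order
ordered = collect_outputs(engine, ["out_a", "out_b"], buffers)
```

Here `ordered` follows the order of `output_names`, regardless of how the engine numbered its bindings.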
Test Plan:
* Arc lint
* Run CI/sandcastle tests
* Run GPU lowering using commands and code changes in D30171741 and ensure we don't observe out-of-order outputs
Reviewed By: yinghai
Differential Revision: D30693545
fbshipit-source-id: 32a894ceeb148fcf4e8d279be3835c7d1f1aa2ba