Reduce test time for TensorRT EP CI (#10408)
* expand model tests name
* skip cpu/cuda for trt when running onnxruntime_test_all
* only run trt ep for c++ unit test
* Update CMAKE_CUDA_ARCHITECTURES for T4
* Use new t4 agent pool
* Update YAML to run on T4 on Windows
* revert code
* Update CMAKE_CUDA_ARCHITECTURES
* fix wrong value
* Remove cpu/cuda directly in model tests
* add only CMAKE_CUDA_ARCHITECTURES=75
* remove expanding model test name to see difference
* revert code
* Add fallback execution provider for unit test
* Add fallback execution provider for unit test (cont)
* add conditional to add fallback cuda ep
* Reduction ops take much longer with TRT 8.2, so test a smaller range of inputs
* use M60
* revert code
* revert code
* add comments
* Modify code and add comment
* modify comment
* update comment
* add comment