Improve CUDA EP's GetCapability (#17809)
Improve CUDA EP's GetCapability: Add layout transformer support.
Currently, the code checks whether a node is already assigned to an EP.
If it is, the code returns immediately and the node is not considered:
```c++
if (!node.GetExecutionProviderType().empty()) {
  return;
}
```
So, if you call the GetCapability function twice,
```c++
auto caps = GetCapability();
assign_nodes_to_eps(..., caps, ...);
auto caps2 = GetCapability();
```
the second GetCapability() call returns fewer results than the first,
because the nodes claimed by the first call are now assigned to an EP
and are skipped. The layout transformer needs to call GetCapability
twice as shown above, so the current GetCapability() implementation is
incompatible with it. This is not an issue right now because the CUDA
EP doesn't need layout transformation, but we might want to support a
different layout.
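
The shrinking result described above can be reproduced with a small
standalone model. This is only a sketch under stated assumptions, not
the actual ONNX Runtime code: `Node`, `kThisEp`, `AssignNodes`,
`GetCapabilitySkippingAssigned` and `GetCapabilityRepeatable` are
hypothetical names invented for illustration. The "repeatable" variant
shows one possible way to make a second call return the same result,
and is not necessarily what this change implements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a graph node; only the EP assignment is modeled.
struct Node {
  std::string ep_type;  // empty means "not assigned to any EP yet"
};

// Illustrative label for the EP whose capability is being queried.
const std::string kThisEp = "CUDAExecutionProvider";

// Mirrors the early-return check: every node that already has an EP,
// including this EP itself, is skipped.
std::vector<std::size_t> GetCapabilitySkippingAssigned(const std::vector<Node>& nodes) {
  std::vector<std::size_t> claimed;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    if (!nodes[i].ep_type.empty()) continue;
    claimed.push_back(i);
  }
  return claimed;
}

// Assumed alternative: nodes already assigned to this EP are reported
// again, and only nodes owned by a different EP are skipped.
std::vector<std::size_t> GetCapabilityRepeatable(const std::vector<Node>& nodes) {
  std::vector<std::size_t> claimed;
  for (std::size_t i = 0; i < nodes.size(); ++i) {
    const std::string& ep = nodes[i].ep_type;
    if (ep.empty() || ep == kThisEp) claimed.push_back(i);
  }
  return claimed;
}

// Hypothetical counterpart of the assignment step between the two calls.
void AssignNodes(std::vector<Node>& nodes, const std::vector<std::size_t>& claimed) {
  for (std::size_t idx : claimed) {
    assert(idx < nodes.size());
    nodes[idx].ep_type = kThisEp;
  }
}
```

With three nodes, one of which is already owned by another EP, the
skipping variant claims two nodes on the first call and none after
`AssignNodes`, while the repeatable variant returns the same two
indices on both calls.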