verify the number of outputs of xla graph (#89536)
This PR adds tests to verify the behavior of the number of outputs returned by an XLA graph. The understanding from this PR will help us fix https://github.com/pytorch/torchdynamo/issues/1908 and eventually enable training for the dynamo/torchxla integration. I'm sending this PR separately so Jack can help verify whether the behavior is expected and play with it.
Listing some code snippets here since their behavior is not straightforward at first glance:
```
def forward(self, a, b, c):
"""
The XLA graph will only return the first 2 items; b is an input that is not modified, so it is not returned.
"""
return a + b, a + c, b
```
```
def forward(self, a, b, c):
"""
The in-place update on b causes it to be returned in the XLA graph.
"""
b.zero_()
return a + b, a + c, b
```
```
def forward(self, a, b, c):
"""
Even if we return b twice, the XLA graph only returns b once.
"""
b.zero_()
return a + b, a + c, b, b
```
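For context, here is a minimal sketch of how one might inspect which tensors such a graph actually returns. This is an assumption about a debugging path, not necessarily how the added tests work: it uses `torch_xla._XLAC._get_xla_tensors_hlo`, an internal torch_xla hook that dumps the pending HLO for a list of tensors, and it assumes torch_xla is installed with an XLA device available.
```
import torch
import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()
a = torch.rand(4, device=device)
b = torch.rand(4, device=device)
c = torch.rand(4, device=device)

# Mirror the third snippet: in-place update b, then return it twice.
b.zero_()
outputs = [a + b, a + c, b, b]

# Dump the HLO for the pending computation (internal debugging hook;
# treated here as an assumption). The ROOT tuple of the entry
# computation shows which tensors the graph actually returns.
print(torch_xla._XLAC._get_xla_tensors_hlo(outputs))
```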
Here is what the added tests observe:
1. XLA does not return outputs that are also inputs, provided the tensor is not in-place updated. At first glance it may seem odd to consider this kind of 'unrealistic' corner case, but such graphs do show up in AOTAutograd: AOTAutograd lifts all model parameters/buffers as graph inputs and may return some of them. Check ***test_direct_return***
2. If a tensor is in-place updated, XLA still returns it as a graph output even if it is also an input. The only difference from item 1 is the in-place update, which causes the tensor to be returned. This happens for BatchNorm2d, since the running_mean/running_var tensors are in-place updated during training (see the sketch after this list). Check ***test_direct_return_with_inplace_update***
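To illustrate item 2, here is a small CPU-side sketch (plain PyTorch, no XLA involved) showing that BatchNorm2d updates its running stats in place during a training-mode forward; on XLA, this in-place buffer update is what forces running_mean/running_var to appear among the graph outputs even though they are also inputs:
```
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3).train()
before = bn.running_mean.clone()
bn(torch.randn(2, 3, 8, 8))

# running_mean/running_var are buffers updated in place during training,
# so the snapshot taken before the forward no longer matches.
assert not torch.equal(before, bn.running_mean)
```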
Pull Request resolved: https://github.com/pytorch/pytorch/pull/89536
Approved by: https://github.com/jansel