[Pallas] Support multiple outputs (#6844)
Summary:
This pull request adds support for Pallas kernels that return multiple results. The current implementation allocates an array of outputs and then updates them in place. However, this currently breaks dynamo. I will fix the dynamo issue in a follow-up.
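As a rough illustration of the "array of outputs, updated in place" pattern, here is a minimal pure-Python sketch. It does not use the actual PyTorch/XLA or Pallas APIs: `run_kernel_with_outputs` and `add_and_mul` are hypothetical names, and plain lists stand in for device buffers.

```python
from typing import Callable, List, Sequence

Buffer = List[float]


def run_kernel_with_outputs(
    kernel: Callable[..., None],
    inputs: Sequence[Buffer],
    output_sizes: Sequence[int],
) -> List[Buffer]:
    """Hypothetical wrapper: pre-allocate one buffer per output and let the
    kernel fill each of them in place."""
    # One zero-initialized buffer per requested output.
    outputs: List[Buffer] = [[0.0] * n for n in output_sizes]
    # The kernel receives the inputs followed by the output buffers and
    # returns nothing; its results are the mutations it makes to `outputs`.
    kernel(*inputs, *outputs)
    return outputs


def add_and_mul(x: Buffer, y: Buffer, out_sum: Buffer, out_prod: Buffer) -> None:
    """Hypothetical kernel with two outputs, written in place."""
    for i in range(len(x)):
        out_sum[i] = x[i] + y[i]
        out_prod[i] = x[i] * y[i]


sums, prods = run_kernel_with_outputs(
    add_and_mul, [[1.0, 2.0], [3.0, 4.0]], [2, 2]
)
```

The caller sees multiple results, while the kernel itself only mutates buffers it was handed. A tracer that assumes functions are free of in-place side effects may need extra handling for this kind of mutation.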
Test Plan:
PJRT_DEVICE=TPU python test/test_pallas.py -v -k test_multiple_returns