[fx-acc] Saturate host by replicating partitions onto idle devices (#60064)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60064
This implements a host saturation optimization to maximize the utilization of the available devices.
It uses a greedy heuristic: for each device that already holds partitions, it replicates all of that device's partitions onto an idle device with enough memory.
The added unit test uses the following setup:
```
partition_0: 192 bytes; partition_1: 48 bytes
dev_0: 200 bytes, [partition_0]
dev_1: 200 bytes, [partition_1]
dev_2: 100 bytes, []
dev_3: 100 bytes, []
dev_4: 200 bytes, []
dev_5: 100 bytes, []
```
Before host saturation, `partition_0` is assigned to dev_0 and `partition_1` is assigned to dev_1.
After host saturation, `partition_0` is replicated to dev_4 because it is the only idle device that can hold all partitions on dev_0. `partition_1` is replicated to dev_2 because it is among the smallest idle devices that still have enough memory to hold all partitions on dev_1.
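The greedy choice in this example can be sketched in standalone Python. This is a minimal illustration under stated assumptions, not the code added by this PR: the function name `saturate_once`, the plain-dict inputs, and the all-or-nothing return value are hypothetical and chosen only to reproduce the example above.

```python
from typing import Dict, List, Optional


def saturate_once(
    capacity: Dict[str, int],
    placement: Dict[str, List[str]],
    partition_size: Dict[str, int],
) -> Optional[Dict[str, str]]:
    """Map each used device to an idle replica device.

    Hypothetical helper: returns None if any used device has no
    idle device large enough to hold all of its partitions.
    """
    used = [d for d in capacity if placement.get(d)]
    idle = [d for d in capacity if not placement.get(d)]
    replicas: Dict[str, str] = {}
    for dev in used:
        # Memory needed to hold every partition currently on this device.
        needed = sum(partition_size[p] for p in placement[dev])
        candidates = [d for d in idle if capacity[d] >= needed]
        if not candidates:
            return None
        # Greedy step: take the smallest idle device that still fits.
        # On ties, min() keeps the first candidate in iteration order.
        best = min(candidates, key=lambda d: capacity[d])
        idle.remove(best)
        replicas[dev] = best
    return replicas


capacity = {
    "dev_0": 200, "dev_1": 200, "dev_2": 100,
    "dev_3": 100, "dev_4": 200, "dev_5": 100,
}
placement = {"dev_0": ["partition_0"], "dev_1": ["partition_1"]}
partition_size = {"partition_0": 192, "partition_1": 48}

replicas = saturate_once(capacity, placement, partition_size)
print(replicas)  # {'dev_0': 'dev_4', 'dev_1': 'dev_2'}
```

With the sizes from the unit test, dev_4 is the only idle device that fits 192 bytes, and dev_2 is the first of the 100-byte idle devices that fits 48 bytes.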
Test Plan:
```
buck test mode/opt //caffe2/test:test_fx_experimental -- --exact 'caffe2/test:test_fx_experimental - test_saturate_host (test_fx_experimental.TestFXExperimental)'
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8444249343103429
✓ ListingSuccess: caffe2/test:test_fx_experimental - main (1.322)
✓ Pass: caffe2/test:test_fx_experimental - test_saturate_host (test_fx_experimental.TestFXExperimental) (1.322)
Summary
Pass: 1
ListingSuccess: 1
```
An end-to-end test will be added to `test_fx_glow.py` in a follow-up diff.
Reviewed By: gcatron
Differential Revision: D29039998
fbshipit-source-id: 57518aadf668f7f05abd6ff73224c16b5d2a12ac