[BugFix] Fix EPLB balancedness metric using wrong dimension
The balancedness metric was computing mean/max along dim=0 (layers)
instead of dim=-1 (ranks). This measured cross-layer consistency
per rank rather than cross-rank balance per layer.
Concrete example with 2 layers, 2 ranks, where rank 1 always gets 2x
the load of rank 0 (each layer's loads are [100, 200]):
- Old metric: mean(dim=0)=[100,200], max(dim=0)=[100,200] → per-rank
  ratios [1.0, 1.0] → reported balance 1.0
- Actual per-layer balance: mean=150, max=200 → 0.75
The metric was reporting near-perfect balance even when ranks had
significant load disparity, as long as the disparity was consistent
across layers.
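A minimal torch-free sketch of the before/after computation (variable
names are hypothetical, not the actual vLLM code), using the example
loads above:

```python
# loads[layer][rank]: per-rank expert load; rank 1 always gets 2x.
loads = [[100, 200],
         [100, 200]]

num_layers, num_ranks = len(loads), len(loads[0])

# Old (buggy): reduce over layers (dim=0), i.e. mean/max per rank
# column. A consistent imbalance looks perfectly balanced.
cols = list(zip(*loads))  # one tuple of loads per rank
old = sum(sum(c) / len(c) / max(c) for c in cols) / num_ranks  # 1.0

# Fixed: reduce over ranks (dim=-1), i.e. mean/max per layer row,
# which reflects the actual cross-rank load disparity.
new = sum(sum(r) / len(r) / max(r) for r in loads) / num_layers  # 0.75
```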
Signed-off-by: Travis Shears <travis@neuralmagic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>