Remove vendored distributed/ (2D context-parallel) stack from esmfold2
The distributed/ package is an NVIDIA/MIT-licensed 2D context-parallel
implementation of the folding trunk (DTensor + DeviceMesh + NCCL) for
multi-GPU 6B inference. It is dropped from the port because:
- It is not imported by any model code (core, config, __init__, or the
experimental file), so it is fully inert in the package.
- It is broken on import: all 7 files import from
`projects.huggingface.transformers.models.esmfold2...` (the fork's
internal monorepo path), so `import transformers.models.esmfold2.distributed`
raises ModuleNotFoundError. It never worked in the standalone layout.
- It is NVIDIA/MIT-licensed, unlike the Apache/Biohub model code.
- Transformers expresses parallelism declaratively via `base_model_tp_plan`
/ `tp_plan="auto"`, not a vendored per-model DTensor/NCCL stack.
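For contrast, a declarative plan is plain data: glob-style module-name keys mapped to sharding styles such as `"colwise"` and `"rowwise"`. The sketch below is hypothetical. The `trunk.blocks.*` module names are invented (ESMFold2Model's real submodule names may differ), and `resolve_plan` only illustrates how a `*` layer-index wildcard could resolve to concrete modules. It is not Transformers' internal matching code.

```python
import re

# Hypothetical ESMFold2Model plan: module names below are invented for illustration.
HYPOTHETICAL_TP_PLAN = {
    "trunk.blocks.*.attn.q_proj": "colwise",
    "trunk.blocks.*.attn.k_proj": "colwise",
    "trunk.blocks.*.attn.v_proj": "colwise",
    "trunk.blocks.*.attn.o_proj": "rowwise",
}


def resolve_plan(plan: dict[str, str], module_names: list[str]) -> dict[str, str]:
    """Map concrete module names to a sharding style.

    Assumption for this sketch: '*' stands for a numeric layer index.
    """
    compiled = [
        (re.compile(re.escape(pattern).replace(r"\*", r"\d+")), style)
        for pattern, style in plan.items()
    ]
    resolved = {}
    for name in module_names:
        for regex, style in compiled:
            if regex.fullmatch(name):
                resolved[name] = style
                break  # first matching key wins
    return resolved


names = ["trunk.blocks.0.attn.q_proj", "trunk.blocks.11.attn.o_proj", "trunk.blocks.0.mlp.fc1"]
print(resolve_plan(HYPOTHETICAL_TP_PLAN, names))
# {'trunk.blocks.0.attn.q_proj': 'colwise', 'trunk.blocks.11.attn.o_proj': 'rowwise'}
```

In this sketch, modules that match no key (here `mlp.fc1`) are simply absent from the result, so the plan only has to name the layers that are actually sharded.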
Nothing unique is lost: the math it shards already exists as the
pure-PyTorch reference in modeling_esmfold2_common.py. If multi-GPU
inference is needed later, author a fresh tp_plan for ESMFold2Model.
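Sharding re-partitions the same arithmetic rather than changing it. A minimal pure-Python sketch (no torch; the helper names `matmul`, `colwise`, and `rowwise` are hypothetical) shows this for one bias-free linear layer under the two styles a future tp_plan would use. Collectives are simulated: list concatenation stands in for an all-gather and summation for an all-reduce. This illustrates tensor-parallel splits only, not the 2D context-parallel (sequence/pair) sharding the removed stack performed.

```python
# Pure-Python stand-ins for tensors: a matrix is a list of rows.
# Assumes the sharded dimension is divisible by `parts`.

def matmul(x, w):
    """Reference linear layer without bias: x is [n][k], w is [k][m], result is [n][m]."""
    return [
        [sum(x[i][p] * w[p][j] for p in range(len(w))) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


def colwise(x, w, parts):
    """Shard w by output columns; concatenating per-shard outputs plays the all-gather."""
    step = len(w[0]) // parts
    shards = [matmul(x, [row[s * step:(s + 1) * step] for row in w]) for s in range(parts)]
    return [sum((shard[i] for shard in shards), []) for i in range(len(x))]


def rowwise(x, w, parts):
    """Shard w by input rows (and x by matching columns); summing partials plays the all-reduce."""
    step = len(w) // parts
    partials = [
        matmul([row[s * step:(s + 1) * step] for row in x], w[s * step:(s + 1) * step])
        for s in range(parts)
    ]
    return [
        [sum(p[i][j] for p in partials) for j in range(len(w[0]))]
        for i in range(len(x))
    ]


x = [[1, 2, 3, 4], [5, 6, 7, 8]]
w = [[1, 0, 2, -1], [0, 3, 1, 1], [2, 1, 0, 4], [-2, 1, 1, 0]]
ref = matmul(x, w)
assert colwise(x, w, 2) == ref and rowwise(x, w, 2) == ref
```

Integer inputs keep the comparison exact; with floating-point tensors the row-wise sum can differ from the reference by rounding, so real checks would compare with a tolerance.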
Verified: nothing references distributed/; `import transformers` and the
esmfold2 modeling module still import cleanly.
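A check like the one above can be scripted. The sketch below is an assumption about how one might do it, not the script used for this commit; `files_referencing` and `imports_cleanly` are hypothetical helpers built on `pathlib` and `importlib`.

```python
import importlib
from pathlib import Path


def files_referencing(root, needle):
    """Return paths (relative to root) of .py files whose source contains `needle`."""
    root = Path(root)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*.py")
        if needle in path.read_text(encoding="utf-8")
    )


def imports_cleanly(module_name):
    """True if the module imports; ModuleNotFoundError is a subclass of ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True
```

For example, `files_referencing("src/transformers/models/esmfold2", "esmfold2.distributed")` returning `[]` and `imports_cleanly("transformers")` returning `True` would correspond to the verification described above (the directory path is an assumption about the checkout layout). A bare `"distributed"` needle would also match `torch.distributed`, and relative imports such as `from .distributed import ...` need their own needle. In a standalone install with no top-level `projects` package, `imports_cleanly("projects.huggingface.transformers.models.esmfold2")` would return `False`, which is the failure mode the removed files hit.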
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>