Fix Evoformer compilation (#7760)
`EvoformerAttnBuilder` has some problems that prevent compiling the
extension in several scenarios (e.g., [isolated conda environment with
CUDA toolchain](https://github.com/aqlaboratory/openfold-3/pull/34),
no GPU in the system) and that break some standard DeepSpeed
configuration of target compute capabilities.
*Changes*
- Fix Evoformer CUTLASS detection:
  - Allow skipping it, which is useful when CUTLASS is already correctly set up
    (e.g., in a conda environment with CUTLASS and the CUDA toolchain)
  - Fix the misleading use of the deprecated nvidia-cutlass PyPI package by
    actually using the provided bindings, while discouraging this route as
    [these bindings are not maintained
    anymore](https://github.com/NVIDIA/cutlass/discussions/2119)
- Fix Evoformer compilation when no GPU is present:
  - This is handled correctly and more generally by
    builder.compute_capability_args
  - Allow cross-compilation on systems without a GPU
  - Allow compilation against all available virtual architectures and
    binary outputs
  - See, e.g., https://github.com/deepspeedai/DeepSpeed/issues/5308
- Make all these changes configurable and explicit through documented
environment variables
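
For illustration only, the sketch below shows how a list of target compute capabilities can be turned into nvcc `-gencode` flags without querying any GPU, which is the idea behind cross-compiling for both binary and virtual architectures. The helper `gencode_flags` is hypothetical and is not DeepSpeed's `compute_capability_args`. It assumes the targets are given in the `TORCH_CUDA_ARCH_LIST` style (e.g. `8.0;9.0+PTX`), and the default value used here is an arbitrary example.

```python
import os


def gencode_flags(arch_list: str) -> list[str]:
    """Hypothetical helper: map 'X.Y[+PTX]' entries to nvcc -gencode flags."""
    flags = []
    # Accept ';', ',' or whitespace as separators (an assumption of this sketch).
    for entry in arch_list.replace(",", ";").replace(" ", ";").split(";"):
        entry = entry.strip()
        if not entry:
            continue
        want_ptx = entry.endswith("+PTX")
        num = entry.removesuffix("+PTX").replace(".", "")
        # Binary (SASS) output for the real architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if want_ptx:
            # Virtual (PTX) output, which newer GPUs can JIT-compile.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    # The targets come only from the environment, so no GPU is needed.
    arch_list = os.environ.get("TORCH_CUDA_ARCH_LIST", "8.0;9.0+PTX")
    print(gencode_flags(arch_list))
```

Because the targets come only from the environment, the same flags are produced on a build machine with or without a GPU.
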
Tested in all scenarios.
---------
Signed-off-by: Santi Villalba <sdvillal@gmail.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>