Fix various test errors in the single GPU case (#3031)
This addresses some of the errors reported when running the tests on a single-GPU machine.
For each, I list the error message and a short explanation of the fix.
> `FAILED tests/test_common_gpu.py::PeftGPUCommonTests::test_lora_gptq_quantization_from_pretrained_safetensors - NameError: name 'BACKEND' is not defined`
The test was using GPTQModel without marking the test as requiring it, leading to an error. This is fixed
by marking the test with `requires_gptqmodel`.
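Roughly, the marker-based fix looks like this (the decorator name is taken from this description; its import path is assumed):

```python
# Sketch only: skip the test when GPTQModel is not installed instead of failing
# with a NameError on BACKEND. The import path of `requires_gptqmodel` is assumed.
from .testing_utils import requires_gptqmodel

class PeftGPUCommonTests:
    @requires_gptqmodel
    def test_lora_gptq_quantization_from_pretrained_safetensors(self):
        ...  # test body unchanged
```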
> `FAILED tests/test_custom_models.py::TestPeftCustomModel::test_only_params_are_updated[Embedding + transformers Conv1D 1 trainable_tokens-EmbConv1D-TrainableTokensConfig-config_kwargs180] - AssertionError: assert not True`
> `FAILED tests/test_custom_models.py::TestPeftCustomModel::test_disable_adapters_with_merging[Embedding + transformers Conv1D 1 trainable_tokens-EmbConv1D-TrainableTokensConfig-config_kwargs180] - AssertionError: assert not True`
This test fails because the gradients of the trainable tokens delta are sometimes 0, but only when training
on CUDA; on CPU it is fine.
This is a weird one and I'm not sure whether this is a good fix. I encountered the error on two machines
(1x L40S and 4x A10G) and was not able to pinpoint it to anything particular in the environment, i.e.
PEFT version (tested v0.17 to main), transformers version (tested 4.55, 4.56, 4.57, 5.0), CUDA version (tested 12.6, 12.8),
or torch version (tested 2.7, 2.8, 2.9, 2.10). I also set `LD_LIBRARY_PATH=` before running pytest to exclude
the cuDNN libraries that come preinstalled on the EC2 instance.
Removing the ReLU in `EmbConv1DModel`, as well as boosting the Conv1D weights, fixes the error. Replacing
the ReLU with `Threshold(0, 0)` (which is functionally equivalent) shows the same behavior. Whether the bug
triggers depends on the seed, i.e. if the initialization of `Conv1D` is favorable, it does not.
I tried pinpointing it to `index_copy`, but `index_copy` by itself is not the problem. Maybe we will just
have to live with this?
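For illustration only, here is a simplified stand-in for the test model showing the kind of workaround described above (the real `EmbConv1DModel` is structured differently, and the boost factor is arbitrary):

```python
import torch
from transformers.pytorch_utils import Conv1D

class EmbConv1DModel(torch.nn.Module):  # simplified stand-in, not the actual test model
    def __init__(self, vocab_size=100, emb_dim=5):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab_size, emb_dim)
        self.conv1d = Conv1D(1, emb_dim)
        with torch.no_grad():
            # "Boosting" the Conv1D weights: with an unfavorable init, the ReLU can
            # zero out the activations and the trainable tokens delta gets no gradient.
            self.conv1d.weight.mul_(10)
        self.relu = torch.nn.ReLU()  # removing this also avoids the failure

    def forward(self, x):
        return self.relu(self.conv1d(self.emb(x)))
```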
> `FAILED tests/test_common_gpu.py::PeftGPUCommonTests::test_dora_ephemeral_gpu_offload_multigpu - RuntimeError: Expected all tensors to be on the same device, but got mat2 is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA_mm)`
This is caused by a bug introduced in #2960 - `ephemeral_gpu_offload` is not passed to the variant and therefore
never utilized.
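To illustrate the pattern of the bug (the class and attribute names below are hypothetical, not PEFT's actual internals):

```python
class DoraVariant:  # hypothetical stand-in for the variant introduced in #2960
    def __init__(self, ephemeral_gpu_offload: bool = False):
        self.ephemeral_gpu_offload = ephemeral_gpu_offload

class LoraLayer:  # hypothetical stand-in for the layer that owns the variant
    def __init__(self, ephemeral_gpu_offload: bool = False):
        self.ephemeral_gpu_offload = ephemeral_gpu_offload
        # Bug: the flag is not forwarded, so the variant always uses its default of
        # False and ephemeral GPU offload is never applied.
        self.variant = DoraVariant()
        # Fix: self.variant = DoraVariant(ephemeral_gpu_offload=ephemeral_gpu_offload)
```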
> `FAILED tests/test_gpu_examples.py::PeftBnbGPUExampleTests::test_seq2seq_lm_training_single_gpu - AttributeError: 'T5ForConditionalGeneration' object has no attribute 'hf_device_map'`
This is caused by transformers@315dcbe45cee1489a32fc228a80502b0a150936c, which disables accelerate hooks if the
device map only contains one device. I confirmed that specifying just one value moves the model to that device even
without accelerate hook invocation. I also tested having two devices (cpu + cuda:0), and in that case a device map is
present. Therefore, this only needs an added `hasattr` check to be compatible with transformers v5.
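A minimal sketch of that check, assuming the test previously accessed `model.hf_device_map` unconditionally (the exact assertions in the test are assumed here):

```python
# With this transformers change, a device map that resolves to a single device no
# longer installs accelerate hooks, so `hf_device_map` may not exist on the model.
if hasattr(model, "hf_device_map"):
    assert set(model.hf_device_map.values()) == {0}
else:
    # Single-device case: the model was still moved to the requested device.
    assert model.device.type == "cuda"
```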
Co-authored-by: nemo <git@ningu.net>