transformers
Disable the FA backend for SDPA on AMD GPUs
#30850
Merged

mht-sharma commented 363 days ago

What does this PR do?

Garbage values may occur during generation with models such as Llama, Mistral, and Mixtral, particularly on a multi-GPU setup with `device_map='auto'` when SDPA selects the FlashAttention (FA) backend.

This PR disables the FA backend for SDPA on ROCm devices.
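For reference, a minimal sketch of how the flash backend of SDPA can be turned off at the PyTorch level on ROCm. This uses PyTorch's public backend toggle and is not the PR's actual diff:

    import torch

    # On ROCm builds of PyTorch, torch.version.hip is set.
    # Fall back to the math / memory-efficient SDPA backends instead of flash attention.
    if torch.version.hip is not None:
        torch.backends.cuda.enable_flash_sdp(False)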

mht-sharma: disable fa (64664da8)
mht-sharma: disable fa (53c438e6)
mht-sharma: update warning (64bd948d)
mht-sharma requested a review from fxmarty 363 days ago
mht-sharma marked this pull request as ready for review 363 days ago
fxmarty commented on 2024-05-16
src/transformers/modeling_utils.py

     @classmethod
-    def _check_and_enable_sdpa(cls, config, hard_check_only: bool = False) -> PretrainedConfig:
+    def _check_and_enable_sdpa(
fxmarty 363 days ago

Can you update these as well?

./src/transformers/models/idefics/modeling_idefics.py:954:    # Adapted from transformers.modeling_utils.PreTrainedModel._check_and_enable_sdpa
./src/transformers/models/idefics/modeling_idefics.py:956:    def _check_and_enable_sdpa(cls, config, hard_check_only: bool = False) -> PretrainedConfig:
./src/transformers/models/falcon/modeling_falcon.py:954:    # Adapted from transformers.modeling_utils.PreTrainedModel._check_and_enable_sdpa
./src/transformers/models/falcon/modeling_falcon.py:956:    def _check_and_enable_sdpa(cls, config, hard_check_only: bool = False) -> "PretrainedConfig":
mht-sharma 363 days ago

If this function is duplicated in multiple files, I guess it makes sense to handle the logic in `_autoset_attn_implementation` instead.

fxmarty 363 days ago (edited 363 days ago)

You had changed the signature, hence my suggestion, but I see you have updated it since then.

src/transformers/modeling_utils.py

         if not hard_check_only:
             config._attn_implementation = "sdpa"
+
+        if torch.version.hip is not None and config._attn_implementation == "sdpa" and device_map == "auto":
fxmarty 363 days ago

Other device_map values may fail as well, no? E.g. when manually splitting the layers across several GPUs.

mht-sharma 363 days ago

Yes, it could happen with other types of device_map that place the model on multiple devices. Updated to check `torch.cuda.device_count() > 1` instead.
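Roughly, the revised guard could look like the sketch below. `config` and `logger` come from the surrounding `_autoset_attn_implementation` context in modeling_utils.py, and the warning text is paraphrased rather than the merged wording:

    import torch

    # Sketch: disable the flash SDPA backend whenever more than one GPU is
    # visible on a ROCm device, instead of only for device_map="auto".
    if (
        torch.version.hip is not None
        and config._attn_implementation == "sdpa"
        and torch.cuda.device_count() > 1
    ):
        logger.warning_once(
            "Using the `SDPA` attention implementation on multiple AMD GPUs may lead to "
            "performance issues due to the FA backend. Disabling it to use alternative backends."
        )
        torch.backends.cuda.enable_flash_sdp(False)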

src/transformers/modeling_utils.py

+        if torch.version.hip is not None and config._attn_implementation == "sdpa" and device_map == "auto":
+            logger.warning_once(
+                "Using the `SDPA` attention implementation with `device_map='auto'` on a ROCM device may lead to performance issues due to the FA backend. Disabling it to use alternative backends."
+            )
fxmarty 363 days ago

I did not have time for this, but ideally we should open a PyTorch issue with a repro that does not involve transformers, and link it here.
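As a sketch, a transformers-free repro could compare the flash and math SDPA backends directly on a ROCm device. The shapes are illustrative, and the reported garbage values involve a multi-GPU setup, so a full repro would also shard the inputs across devices:

    import torch
    import torch.nn.functional as F

    q, k, v = (torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16) for _ in range(3))

    # Run SDPA once with only the flash backend enabled, once with only the math backend.
    with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
        out_flash = F.scaled_dot_product_attention(q, k, v)
    with torch.backends.cuda.sdp_kernel(enable_flash=False, enable_math=True, enable_mem_efficient=False):
        out_math = F.scaled_dot_product_attention(q, k, v)

    # Large differences here would point at the flash backend rather than transformers.
    print((out_flash - out_math).abs().max())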

ydshieh 363 days ago

I agree. But I'm not sure whether it is also tied to `device_map='auto'` in transformers.

ydshieh 363 days ago

Also, it might be good to put all of this into the AMD documentation, so we can share it with them.

HuggingFaceDocBuilderDev commented 363 days ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ydshieh approved these changes on 2024-05-16
ydshieh 363 days ago

Works for me overall. As @fxmarty mentioned, two places need to be updated too. I will leave you two to come to a conclusion on that.

Ping me again once you think a final review is necessary (i.e. if the changes become significantly bigger than the current ones).

mht-sharma: update warning (d90d45b1)
fxmarty approved these changes on 2024-05-16
ydshieh 363 days ago

Ready for a merge, @mht-sharma?

ydshieh merged 0753134f into main 363 days ago
