set first_pass on calls from deepspeed to _prepare_one(...) so that i…
Summary
I noticed that my training with the DeepSpeed integration in accelerate had some strange behavior: the number of iterations per epoch didn't go down as I increased the number of GPUs, and the loss wasn't converging as expected.
After a bunch of debugging, I found that `_prepare_deepspeed(...)` doesn't appear to call `_prepare_one(...)` properly: it calls it without setting `first_pass=True`, which means that `_prepare_one(...)` skips wrapping the DataLoaders... defeating the whole point.
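To make the failure mode concrete, here is a toy sketch of a `first_pass`-gated prepare step. This is not accelerate's actual implementation; the helper name, the modulo-based "sharding", and `num_processes=4` are made up for illustration. It just shows why a caller that forgets `first_pass=True` leaves the DataLoader untouched:

```python
# Toy illustration of the failure mode (not accelerate's actual code):
# DataLoaders are only wrapped/sharded when first_pass=True is passed.
import torch
from torch.utils.data import DataLoader, TensorDataset

def prepare_one_sketch(obj, first_pass=False, num_processes=4):
    if first_pass and isinstance(obj, DataLoader):
        # Stand-in for sharding: each process keeps every num_processes-th batch.
        return [batch for i, batch in enumerate(obj) if i % num_processes == 0]
    return obj  # anything else (or a DataLoader on a later pass) passes through

dataset = TensorDataset(torch.arange(64).float())
loader = DataLoader(dataset, batch_size=2)           # 32 batches total

wrapped = prepare_one_sketch(loader, first_pass=True)
skipped = prepare_one_sketch(loader)                 # what the buggy call path did

print(len(wrapped))  # 8  -> batches divided across the 4 "processes"
print(len(skipped))  # 32 -> unchanged, so iterations per epoch never shrink
```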
How I tested

I added logging to my training flow to print out `len(data_loader)` after `accelerator.prepare(...)` is called. I validated that with this fix, the length is divided by the number of processes, as expected.