accelerate
Fix DataLoader sharding for deepspeed in accelerate
#315
Merged

m3rlin45 commented 3 years ago

Summary

I noticed that my training using the deepspeed integration in accelerate had some strange behavior: the number of iterations per epoch didn't go down as I increased the number of GPUs, and the loss wasn't converging as expected.

After a bunch of debugging, I found that _prepare_deepspeed(...) doesn't call _prepare_one(...) correctly: it calls it without setting first_pass=True, which means _prepare_one(...) skips wrapping the DataLoaders, defeating the whole point.
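
For readers less familiar with the prepare internals, here is a toy sketch of why the flag matters. The method names mirror the private helpers involved, but the bodies are illustrative stand-ins, not the library's actual implementation:

```python
from torch.utils.data import DataLoader


class ToyAccelerator:
    """Toy stand-in for the relevant Accelerator internals (illustration only)."""

    def prepare_data_loader(self, data_loader: DataLoader) -> DataLoader:
        # In accelerate this rebuilds the DataLoader so each process only sees
        # its own shard of the batches; here it is left as a placeholder.
        return data_loader

    def _prepare_one(self, obj, first_pass=False):
        # DataLoaders are only wrapped on the first pass; later passes handle
        # objects that depend on already-prepared ones.
        if first_pass and isinstance(obj, DataLoader):
            return self.prepare_data_loader(obj)
        return obj

    def _prepare_deepspeed(self, *args):
        # The bug: calling self._prepare_one(obj) here without first_pass=True
        # skips the DataLoader branch entirely. The fix passes first_pass=True.
        return tuple(self._prepare_one(obj, first_pass=True) for obj in args)
```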

How I tested

I added logging to my training flow to print out len(data_loader) after accelerator.prepare(...) is called.

I validated that with this fix, the length is divided by the number of processes, as expected.
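
A minimal way to reproduce that check (dataset size and batch size are illustrative; run with accelerate launch across several processes using a deepspeed config):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

dataset = TensorDataset(torch.randn(1024, 16))
data_loader = DataLoader(dataset, batch_size=8)  # 128 batches before sharding

data_loader = accelerator.prepare(data_loader)

# With the fix, each of N processes sees roughly 128 / N batches; before the
# fix, len(data_loader) stayed at 128 under the deepspeed integration.
print(f"process {accelerator.process_index}: {len(data_loader)} batches")
```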

m3rlin45 set first_pass on calls from deepspeed to _prepare_one(...) so that i…
68093042
sgugger approved these changes on 2022-04-13
sgugger commented 3 years ago (edited)

Very nice catch, thanks a lot for fixing!

Can you just run make style on your branch to fix the formatting issue?

HuggingFaceDocBuilderDev commented 3 years ago (edited)

The documentation is not available anymore as the PR was closed or merged.

m3rlin45 fixed style
e293f7d3
sgugger merged 381ae200 into main 3 years ago
