accelerate
acfbf72a - Give example on how to handle gradient accumulation with cross-entropy (#3193)

Commit

1 year ago

Give example on how to handle gradient accumulation with cross-entropy (#3193)

* Add cross-entropy example in the gradient accumulation docs
* add example of logs
* correct skeleton code
* replace gather_for_metrics with gather
* batch_size -> per_device_batch_size
* remove main_process_only=True
* add autoregressive example in examples/
* Update docs/source/usage_guides/gradient_accumulation.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* ruff format
* add grad accum test
* update docs
* Update examples/by_feature/gradient_accumulation_for_autoregressive_models.py

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* update tests

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>

References

#3193 - Give example on how to handle gradient accumulation with cross-entropy

Author

ylacombe

Parents

200c9eb7
