pytorch
9456de93 - [dtensor] Fix and improve the sharding cache behavior (#109306)

1 year ago

resolves https://github.com/pytorch/pytorch/issues/109101

The problem was that we hashed all of the arguments, including scalars (e.g. the scalar in aten.div(tensor, scalar)). In the optimizer, the scalar can change every time the op is called, so every call missed the sharding cache.

This PR improves the sharding cache behavior by introducing a RuntimeSchemaInfo, which records the runtime information needed for hashing at op registration time. This enables us to:
* only hash arguments that are tensors or that are marked static via static_argnum, so that calls such as aten.div.Tensor(tensor, 0.23231) hit the cache. Previously we hashed all arguments, which made these calls miss.
* restore correct cache behavior, so optimizers hit the cache again and the high CPU overhead issue is resolved.

A simple MLP now shows all cache hits, and a single addmm takes 0.319ms (down from 0.341ms), showing some improvement from the cheaper hashing:
[Screenshot 2023-09-14 at 11 06 07 AM: https://github.com/pytorch/pytorch/assets/9443650/3406d673-dd8d-4ad9-9b80-9d4721c430e3]

With the Adam optimizer, aten.div hits the sharding cache again:
[Screenshot 2023-09-14 at 11 02 10 AM: https://github.com/pytorch/pytorch/assets/9443650/4280e8e3-af44-4fc2-8360-ea80b768f1d9]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/109306
Approved by: https://github.com/fduwjj
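The caching idea described above can be illustrated with a minimal, self-contained sketch. It is not DTensor's actual implementation: `TensorSpec`, `SchemaInfo`, `cache_key`, `ShardingCache`, and `simulate_optimizer` are hypothetical names, and `SchemaInfo.static_argnum` only loosely mirrors the `static_argnum` mentioned in the PR (here it is assumed to mark every positional argument at or after that index as static, i.e. part of the cache key). The sketch simulates an optimizer calling `aten.div.Tensor` with a different scalar on each step.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Tuple


@dataclass(frozen=True)
class TensorSpec:
    """Hypothetical stand-in for a DTensor's sharding spec (placement + shape)."""
    placement: str
    shape: Tuple[int, ...]


@dataclass(frozen=True)
class SchemaInfo:
    """Hypothetical, simplified analogue of RuntimeSchemaInfo.

    Assumption: positional args at index >= static_argnum affect sharding and
    are hashed; non-tensor args before that index (e.g. a scalar divisor) are not.
    """
    static_argnum: int = 100


def cache_key(op: str, args: Tuple[Any, ...], info: SchemaInfo) -> Hashable:
    # Keep tensor specs and any argument marked static; drop everything else.
    hashed = tuple(
        a for i, a in enumerate(args)
        if isinstance(a, TensorSpec) or i >= info.static_argnum
    )
    return (op, hashed)


class ShardingCache:
    def __init__(self) -> None:
        self._cache: Dict[Hashable, str] = {}
        self.hits = 0
        self.misses = 0

    def propagate(self, op: str, args: Tuple[Any, ...], info: SchemaInfo) -> str:
        key = cache_key(op, args, info)
        if key in self._cache:
            self.hits += 1
        else:
            self.misses += 1
            # Placeholder for the expensive sharding propagation rule.
            self._cache[key] = f"output sharding for {key}"
        return self._cache[key]


def simulate_optimizer(static_argnum: int, steps: int = 5) -> Tuple[int, int]:
    cache = ShardingCache()
    param = TensorSpec("Shard(0)", (1024, 1024))
    info = SchemaInfo(static_argnum=static_argnum)
    for step in range(1, steps + 1):
        # The scalar changes on every step, as in an optimizer update.
        cache.propagate("aten.div.Tensor", (param, 1.0 / step), info)
    return cache.hits, cache.misses


print(simulate_optimizer(static_argnum=100))  # (4, 1): scalar not hashed, later steps hit
print(simulate_optimizer(static_argnum=0))    # (0, 5): every arg hashed, every step misses
```

Setting `static_argnum=0` hashes every argument and reproduces the old cache-miss-every-call behavior, while the default leaves the changing scalar out of the key so only the first call pays for sharding propagation.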

Author: wanchaol
Committer: pytorchmergebot
Parents: 0cbca857
