SemanticDiff

pytorch
f89a762a - Fix the performance issue that the for-loop before ExternallCall could not be parallelized. (#85056) (#86516)

Commit

1 year ago

Currently, NNC parallelizes only the loop statements of the graph outputs. This logic can bypass other loop statements that could also be parallelized. Take the following example, and suppose the output of `ExternalCall` is also the output of the NNC fusion group. The current [parallel logic](https://github.com/pytorch/pytorch/pull/85056/files#diff-9a11174c26e4b57ab73e819520122bc314467c72962f3a5b79e7400ea3c4bbe5L781-L785) only tries to parallelize the `ExternalCall` and bypasses `stmt1` and `stmt2`.

```c++
stmt1: For:
stmt2: For:
stmt3: ExternalCall
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85056

Approved by: https://github.com/frank-wei, https://github.com/bertmaher
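The gap described in the commit message can be illustrated with a toy model. This is a minimal sketch and not NNC's code: `Stmt`, `Kind`, `makeExample`, `parallelizeOutputWritersOnly`, and `parallelizeAllLoops` are hypothetical names invented for illustration, and "visit every loop" is only an assumed way to close the gap, not necessarily what the pull request implements.

```c++
#include <string>
#include <vector>

// Hypothetical toy IR for illustration only; these are not NNC's real classes.
enum class Kind { For, ExternalCall };

struct Stmt {
  std::string name;
  Kind kind;
  bool writesGraphOutput;  // true if the statement produces a fusion-group output
  bool parallel;           // set when the statement is marked for parallel execution
};

// The three-statement block from the commit message: two loops precede an
// ExternalCall, and only the ExternalCall writes a graph output.
std::vector<Stmt> makeExample() {
  return {
      {"stmt1", Kind::For, false, false},
      {"stmt2", Kind::For, false, false},
      {"stmt3", Kind::ExternalCall, true, false},
  };
}

// Models the behavior described as current: only statements that write graph
// outputs are considered. The ExternalCall is not a loop, so nothing is marked.
int parallelizeOutputWritersOnly(std::vector<Stmt>& block) {
  int count = 0;
  for (Stmt& s : block) {
    if (s.writesGraphOutput && s.kind == Kind::For) {
      s.parallel = true;
      ++count;
    }
  }
  return count;
}

// Assumed alternative: consider every loop, whether or not it writes an output.
int parallelizeAllLoops(std::vector<Stmt>& block) {
  int count = 0;
  for (Stmt& s : block) {
    if (s.kind == Kind::For) {
      s.parallel = true;
      ++count;
    }
  }
  return count;
}
```

Calling `parallelizeOutputWritersOnly` on the block from `makeExample()` marks no statement, which mirrors the bypass of `stmt1` and `stmt2`. Calling `parallelizeAllLoops` on a fresh copy marks both loops and leaves the `ExternalCall` untouched.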

References

#86516 - Fix the performance issue that the for-loop before ExternallCall

Author

EikanWang
