[JIT] optimize autodiff subgraph slicing (#41437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41437
[copied from commented code]
The IR has many nodes which can never be reordered around, such as a
prim::Bailout. If a node N is surrounded by two nodes which cannot be
reordered, A and B, then a differentiable subgraph that is created from N
can only contain nodes from [A, B]. The nodes from A to B represent one
work block for the subgraph slicer to work on. By creating these blocks up
front, we avoid retraversing the whole graph block every time scanNode
returns, and we can also skip attempting to create differentiable
subgraphs in work blocks that do not contain a minimum number of differentiable nodes.
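A minimal Python sketch of the work-block partitioning described above, under stated assumptions: `Node`, `build_work_blocks`, and `min_diff_nodes` are hypothetical names, not the PyTorch JIT API (the real pass is C++ and walks IR nodes directly). Non-reorderable nodes are modeled with a `reorderable=False` flag, and each block is taken as the half-open index range of nodes strictly between two such barriers, which assumes the barriers themselves are never merged into a subgraph.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    # Hypothetical stand-in for a JIT IR node (not torch._C.Node).
    kind: str
    differentiable: bool = False
    reorderable: bool = True  # False for barriers such as prim::Bailout


def build_work_blocks(nodes: List[Node], min_diff_nodes: int) -> List[Tuple[int, int]]:
    """Split `nodes` into half-open ranges [start, end) lying between
    non-reorderable barrier nodes, keeping only the ranges that contain
    at least `min_diff_nodes` differentiable nodes."""
    blocks: List[Tuple[int, int]] = []
    start = 0
    diff_count = 0
    for i, node in enumerate(nodes):
        if not node.reorderable:
            # A barrier closes the block that ends just before it.
            if diff_count >= min_diff_nodes:
                blocks.append((start, i))
            start = i + 1
            diff_count = 0
        elif node.differentiable:
            diff_count += 1
    # The last block runs from the final barrier to the end of the list.
    if diff_count >= min_diff_nodes:
        blocks.append((start, len(nodes)))
    return blocks


graph = [
    Node("aten::mul", differentiable=True),
    Node("aten::add", differentiable=True),
    Node("prim::Bailout", reorderable=False),
    Node("aten::relu", differentiable=True),
    Node("prim::Bailout", reorderable=False),
    Node("aten::tanh", differentiable=True),
    Node("aten::sigmoid", differentiable=True),
    Node("aten::mul", differentiable=True),
]
print(build_work_blocks(graph, min_diff_nodes=2))  # → [(0, 2), (5, 8)]
```

Because the blocks are computed once up front, a slicer built this way can rescan only the current block after a merge instead of the whole graph, and the block holding the lone `aten::relu` is never visited because it falls below the minimum.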
This improved the compilation time of densenet (the model with the slowest compilation time we're tracking) from 56s -> 28s, and of mobilenet from 8s -> 6s.
Test Plan: Imported from OSS
Reviewed By: Krovatkin, ZolotukhinM
Differential Revision: D22600607
Pulled By: eellison
fbshipit-source-id: e5ab6ed87bf6820b4e22c86eabafd9d17bf7cedc