[quant][fx][perf] improve runtime of prepare step for large models (#61132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61132
For large models, the insert_observers_for_model function was taking a long time, especially in the case where not all of the nodes are being quantized.
For example, for a model with ~21,000 nodes of which only ~50 are being quantized, the breakdown of prepare_fx vs. convert_fx was:
prepare_fx: 979 seconds
convert_fx: 9 seconds
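
To make the "only a few nodes are quantized" scenario concrete, here is a minimal, hypothetical timing harness. The model, module names, and sizes are made up for illustration; the API calls follow the FX graph mode quantization API of this era (prepare_fx taking a qconfig_dict), and newer PyTorch releases use a qconfig_mapping and require example_inputs instead.

```python
# Illustrative sketch only; Toy, quantized_part, and float_part are hypothetical names.
import time

import torch
import torch.nn as nn
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for the ~50 nodes we actually want to quantize.
        self.quantized_part = nn.Linear(16, 16)
        # Stand-in for the many thousands of nodes that stay in float.
        self.float_part = nn.Sequential(*[nn.Linear(16, 16) for _ in range(8)])

    def forward(self, x):
        return self.float_part(self.quantized_part(x))


model = Toy().eval()
# Quantize only one submodule; the rest of the graph stays float, which is the
# "not all nodes are being quantized" case described in the summary.
qconfig_dict = {"module_name": [("quantized_part", get_default_qconfig("fbgemm"))]}

t0 = time.time()
prepared = prepare_fx(model, qconfig_dict)
print("prepare_fx:", time.time() - t0, "s")

prepared(torch.randn(2, 16))  # calibration pass

t0 = time.time()
quantized = convert_fx(prepared)
print("convert_fx:", time.time() - t0, "s")
```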
The main reason was that we were doing some unnecessary computation for all nodes in this function; this PR moves that computation to where it is actually used (see the sketch after the numbers below).
After this PR:
prepare_fx: 26 seconds
convert_fx: 9 seconds
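
The general pattern of the change can be sketched as below. This is not the actual insert_observers_for_model code; expensive_per_node_analysis, needs_observer, and insert_observer are hypothetical helpers used only to show moving eager per-node work to the point where it is consumed.

```python
# Illustrative only -- not the actual insert_observers_for_model implementation.

def prepare_before(nodes):
    # Expensive helper evaluated eagerly for every node up front,
    # even though only a handful of quantized nodes ever consume the result.
    extra_info = {n: expensive_per_node_analysis(n) for n in nodes}
    for n in nodes:
        if needs_observer(n):  # true for only ~50 of ~21,000 nodes
            insert_observer(n, extra_info[n])


def prepare_after(nodes):
    for n in nodes:
        if needs_observer(n):
            # Do the expensive work only where it is actually used.
            insert_observer(n, expensive_per_node_analysis(n))
```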
Test Plan:
Existing tests
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D29522303
fbshipit-source-id: 7ce12582a859d02ff763abebf4a592d28e0764ca