[torch] Minor: Avoid ostreamstring in Operator's canonicalSchemaString() (#44442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44442
I noticed lock contention on startup as lookupByLiteral() was
calling registerPendingOperators() - some calls were holding the
lock for 10+ ms, as operators were being registered.
canonicalSchemaString() was using ostreamstring, which isn't typically
particularly fast (partly because of c++ spec locale requirements).
If we repalce with regular c++ string appends, it's somewhat faster
(which isn't hard when comparing with stringstream; albeit a bit
more codegen)
Over the first minute or so, this cuts out 1.4 seconds under the
OperatorRegistry lock (as part of registerPendingOperators) in the
first couple minutes of run time (mostly front-loaded) when running
sync sgd.
As an example, before:
registerPendingOperators 12688 usec for 2449 operators
After:
registerPendingOperators 6853 usec for 2449 operators
ghstack-source-id: 111862971
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/...
Reviewed By: ailzhang
Differential Revision: D23614515
fbshipit-source-id: e712f9dac5bca0b1876e11fb8f0850402f03873a