[PyTorch Edge] Cache operator lambda during model loading [7% faster model loading] (#61996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61996
A recent post https://fb.workplace.com/groups/pytorch.edge.users/posts/2012215235600341/ about slow model loading came with a perf report (report.html). Looking at the report's hot spots showed that a significant fraction of model loading time is spent looking up operators from the dispatcher. Since the same operator name can show up multiple times in a given model, we can cache the operator handler functions instead of recomputing them on every occurrence of the name.
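As a rough illustration of the idea, here is a minimal, self-contained sketch of memoizing operator resolution. The names (`resolveFromDispatcher`, `OperatorCache`) are illustrative stand-ins, not the identifiers used in the actual diff:
```cpp
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>

using OperatorFn = std::function<void()>;

// Stand-in for the relatively expensive dispatcher lookup that was
// previously performed for every operator-name occurrence while
// parsing a model's bytecode.
OperatorFn resolveFromDispatcher(const std::string& qualified_name) {
  std::cout << "resolving " << qualified_name << "\n";
  return [qualified_name] { /* invoke the resolved kernel here */ };
}

class OperatorCache {
 public:
  // Resolve each distinct operator name at most once; repeated
  // occurrences of the same name in a model hit the cache.
  const OperatorFn& get(const std::string& qualified_name) {
    auto it = cache_.find(qualified_name);
    if (it == cache_.end()) {
      it = cache_.emplace(qualified_name,
                          resolveFromDispatcher(qualified_name)).first;
    }
    return it->second;
  }

 private:
  std::unordered_map<std::string, OperatorFn> cache_;
};

int main() {
  OperatorCache cache;
  // "aten::add" is resolved once even though it is requested twice.
  cache.get("aten::add")();
  cache.get("aten::add")();
  cache.get("aten::mul")();
}
```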
This diff results in an approximately 7% speedup in model loading time (from [315ms](https://www.internalfb.com/intern/aibench/details/45077128343028) to [293ms](https://www.internalfb.com/intern/aibench/details/600870874797229)) when run against an 87MiB speech model that jiatongzhou provided.
See https://fb.workplace.com/groups/pytorch.dev/posts/855724575006024/ for the previous post from jiatongzhou.
ghstack-source-id: 134634612
Test Plan:
Run using AI Bench.
### Speech Transducer v25 model (87MiB)
Followed up with jiatongzhou, who provided his speech model. For posterity, here's how to fetch it (you don't need to, since I uploaded it to NMLML and it now has a permanent Everstore Handle):
```
cd /tmp/
mkdir speech_model
cd speech_model
fbpkg fetch speech.stella.neural_transducer.on_device.en_us:25
cp pytorchmodel.pt ~/speech_transducer_v25_pytorchmodel.ptl
```
Here's how to build and run the benchmark using AI Bench:
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote
```
Reviewed By: raziel
Differential Revision: D29826210
fbshipit-source-id: 134b67eb466e73f0e43447b9b966278f13c4b56f