Load CPU MLModel first, and configured MLModel async (#80941)
Summary:
MLModel loads much faster when compute units are set to CPU only. It seems that when loading with compute units set to all, a large amount of preprocessing work is done during init.
So, to speed up our effect load time, load a CPU-only MLModel synchronously and a fully configured MLModel asynchronously. When the second model finishes loading, about 600 ms later, swap the models.
As a result, inference runs on the CPU for about half a second, after which it kicks over to the GPU or Neural Engine.
On an iPhone 12 I'm seeing a >10x improvement in load time as recorded by RenderTimeLogger.cpp.
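The load-fast-then-swap pattern described above can be sketched generically. This is a minimal, hedged illustration in Python (not the actual Objective-C++/Core ML implementation in this PR); the class and parameter names are hypothetical. The fast loader stands in for the CPU-only MLModel and the slow loader for the fully configured one:

```python
import threading


class SwappableModel:
    """Sketch: serve a fast-to-load model immediately, then atomically
    swap in the fully configured model once its slower load finishes.
    All names here are illustrative, not part of any real API."""

    def __init__(self, load_fast, load_full):
        self._lock = threading.Lock()
        # Synchronous load of the quick model (e.g. CPU-only compute units).
        self._model = load_fast()
        # Kick off the slow load (e.g. all compute units) in the background.
        self._thread = threading.Thread(
            target=self._load_full, args=(load_full,), daemon=True
        )
        self._thread.start()

    def _load_full(self, load_full):
        full = load_full()  # expensive init happens off the critical path
        with self._lock:
            self._model = full  # swap once the configured model is ready

    def predict(self, x):
        # Inference uses whichever model is currently installed.
        with self._lock:
            return self._model(x)
```

In the real change the swap happens once the configured MLModel's load completes, so only the first ~600 ms of inference calls hit the CPU model.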
Test Plan:
- Add an override to https://www.internalfb.com/intern/qe2/ig_ios_person_segmentation_universe to opt into the coreml segmentation model
- Launch IG camera and apply an effect that uses segmentation, such as green screen
- Confirm that segmentation works.
https://pxl.cl/277JL
Reviewed By: kimishpatel
Differential Revision: D37597965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80941
Approved by: https://github.com/mcr229, https://github.com/kimishpatel