Add mobile-friendly at::parallel_for backend
Summary:
This diff implements at::parallel_for() / parallel_reduce() and the other
ATen/Parallel.h APIs for mobile using caffe2::ThreadPool.
caffe2::ThreadPool doesn't support submitting individual tasks
separately and running them in parallel. All tasks need to be submitted in
one batch, which locks the thread pool until all of them finish. As a
result, we neither wrapped caffe2::ThreadPool with the TaskThreadPoolBase
interface nor reused the at::parallel_for() implementation in
ParallelNative.h. Because of this constraint, intraop_launch() /
intraop_launch_future() are not supported yet.
This diff doesn't touch the inter-op pool, which still uses the default
native c10 thread pool. Will work on it when it's widely used.
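For reviewers, the batch-only constraint described above can be illustrated with a rough, self-contained sketch. It is not the code in this diff: BatchOnlyPool, parallel_for_sketch, and the fixed budget of 4 tasks are hypothetical names and values chosen for illustration, and the pool is emulated with std::thread rather than caffe2::ThreadPool. It only shows the general shape of building a parallel_for on top of an interface that accepts one batch of tasks and blocks until all of them finish.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a batch-only pool (NOT caffe2::ThreadPool):
// run(fn, n) executes fn(0), ..., fn(n - 1) and returns only after every
// task has finished. There is no way to submit a single task on its own.
struct BatchOnlyPool {
  void run(const std::function<void(size_t)>& fn, size_t num_tasks) {
    std::vector<std::thread> workers;
    workers.reserve(num_tasks);
    for (size_t i = 0; i < num_tasks; ++i) {
      workers.emplace_back([&fn, i] { fn(i); });
    }
    for (auto& w : workers) {
      w.join();  // the caller is blocked until the whole batch completes
    }
  }
};

// Sketch of a parallel_for on top of the batch-only interface: split
// [begin, end) into contiguous chunks and submit them as a single batch.
inline void parallel_for_sketch(
    BatchOnlyPool& pool,
    int64_t begin,
    int64_t end,
    int64_t grain_size,
    const std::function<void(int64_t, int64_t)>& f) {
  assert(grain_size >= 0);
  if (begin >= end) {
    return;
  }
  const int64_t range = end - begin;
  const int64_t max_tasks = 4;  // assumed thread budget, for illustration only
  const int64_t grain = std::max<int64_t>(grain_size, 1);
  const int64_t num_tasks = std::min(max_tasks, (range + grain - 1) / grain);
  if (num_tasks <= 1) {
    f(begin, end);  // small range: run inline on the calling thread
    return;
  }
  const int64_t chunk = (range + num_tasks - 1) / num_tasks;
  pool.run(
      [&](size_t task_id) {
        const int64_t lo = begin + static_cast<int64_t>(task_id) * chunk;
        const int64_t hi = std::min(end, lo + chunk);
        if (lo < hi) {
          f(lo, hi);
        }
      },
      static_cast<size_t>(num_tasks));
}
```

Because run() returns only after the whole batch is done, a fire-and-forget call in the style of intraop_launch() has no natural mapping onto this interface, which is consistent with the limitation noted in the summary.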
Test Plan: This is an early draft to gather feedback. More thorough tests will follow.
Differential Revision: D17543412
Pulled By: ljk53
fbshipit-source-id: 53a3259409c7207d837b9135d87d8daa6ad15e30