Add an autotune cache for inductor generated kernels (#120963)
Summary: Inductor currently has a best config cache for kernels that it generates. This is a local cache done via writing to the file system. This diff takes this local cache to remote by reusing the existing triton caching mechanism built via Memcache internally and Redis externally.
Test Plan:
tested locally using `TORCH_INDUCTOR_AUTOTUNE_REMOTE_CACHE =1`
Look at scuba to verify the local testing: https://fburl.com/scuba/triton_remote_cache/z6pypznk
The plan is to land this diff with this turned off and gradually introduce this.
Differential Revision: D54398076
Pull Request resolved: https://github.com/pytorch/pytorch/pull/120963
Approved by: https://github.com/jansel