[pt2][inductor] search caches by default (#95134)
Summary: Second attempt at enabling search of the global/local caches by default, regardless of `max_autotune`. The main problem is that Triton template generation seems to be broken in some cases in CI tests (possibly with dynamic shapes), and this will take more time to figure out. For now, template generation is cancelled instead of raising an assertion error, and the templates that failed are filtered out.
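A minimal sketch of the cancel-and-filter pattern described in the summary, not the actual Inductor code. The names `maybe_generate` and `collect_choices` are hypothetical, and the use of `NotImplementedError` as the failure signal is an assumption made only for illustration.

```python
from typing import Callable, List, Optional


def maybe_generate(template: Callable[[], str]) -> Optional[str]:
    """Run one template generator (hypothetical helper).

    Returns the generated source, or None when generation fails,
    instead of letting an assertion abort the whole compile.
    """
    try:
        return template()
    except NotImplementedError:
        # Assumption: failed generation is signalled by this exception.
        return None


def collect_choices(templates: List[Callable[[], str]]) -> List[str]:
    """Generate every template and drop the ones that were cancelled."""
    generated = [maybe_generate(t) for t in templates]
    return [src for src in generated if src is not None]
```

With this shape, one broken template only removes itself from the candidate list, and the remaining choices can still be looked up in the caches or benchmarked.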
Test Plan: sandcastle + CI
Differential Revision: D43424922
Pull Request resolved: https://github.com/pytorch/pytorch/pull/95134
Approved by: https://github.com/jansel