Speed up caching of subtype checks (#12539)
Caching of subtype checks is very performance critical. Implement a few
micro-optimizations to speed up caching a bit. In particular, we use
dict.get to reduce the number of dict lookups required, and avoid tuple
concatenation, which tends to be a bit slow, as it has to construct
temporary objects.
It would probably be even better to avoid using tuples as keys
altogether. This could be a reasonable follow-up improvement.
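
As a rough illustration of the two micro-optimizations described above, here is a minimal sketch. The function names, the cache layout and the nested-tuple key are assumptions made for this example, not the actual mypy code:

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def lookup_before(cache: Dict[str, dict], name: str, strict_optional: bool,
                  kind: Tuple[str, ...], pair: Pair) -> bool:
    # Concatenation builds a temporary 1-tuple and then a new combined tuple.
    key = (strict_optional,) + kind
    # Each "in" test followed by indexing looks the same key up again.
    if name not in cache:
        return False
    if key not in cache[name]:
        return False
    return pair in cache[name][key]


def lookup_after(cache: Dict[str, dict], name: str, strict_optional: bool,
                 kind: Tuple[str, ...], pair: Pair) -> bool:
    # dict.get does the membership test and the fetch in a single lookup.
    subcache = cache.get(name)
    if subcache is None:
        return False
    # Assumption: nest the existing tuple in the key instead of concatenating.
    entries: Set[Pair] = subcache.get((strict_optional, kind))
    if entries is None:
        return False
    return pair in entries
```

Note that the two variants use different key shapes, so a cache populated for one cannot be queried with the other.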
Avoid caching if last known value is set, since caching such types
reduces the likelihood of cache hits a lot, because the space of literal
values is big (essentially infinite).
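
The idea of skipping the cache for types that carry a last known value could look roughly like this. It is a sketch only: `Instance` and `SubtypeCache` here are simplified, hypothetical stand-ins, not mypy's real classes:

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Instance:
    # Hypothetical, simplified type object used only for this sketch.
    name: str
    last_known_value: Optional[object] = None


class SubtypeCache:
    def __init__(self) -> None:
        self._entries: Set[Tuple[Instance, Instance]] = set()

    @staticmethod
    def _cacheable(left: Instance, right: Instance) -> bool:
        # Each distinct literal value would need its own entry, so such
        # pairs would fill the cache while rarely producing a hit.
        return left.last_known_value is None and right.last_known_value is None

    def record(self, left: Instance, right: Instance) -> None:
        if self._cacheable(left, right):
            self._entries.add((left, right))

    def is_cached(self, left: Instance, right: Instance) -> bool:
        if not self._cacheable(left, right):
            return False
        return (left, right) in self._entries

    def __len__(self) -> int:
        return len(self._entries)
```

With this shape, recording a thousand pairs that differ only in their literal value leaves the cache size unchanged.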
Also make the global strict_optional attribute an instance-level
attribute for faster access, as we might now use it more frequently.
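
One possible shape for holding the flag on an object instead of in a bare global is sketched below. The class and method names are illustrative assumptions, and the sketch shows only the structure; it does not demonstrate the speed difference:

```python
from contextlib import contextmanager
from typing import Iterator


class StrictOptionalState:
    # Illustrative name; the flag lives on the instance rather than in a
    # module-level global variable.
    def __init__(self, strict_optional: bool) -> None:
        self.strict_optional = strict_optional

    @contextmanager
    def strict_optional_set(self, value: bool) -> Iterator[None]:
        # Temporarily override the flag and restore it on exit,
        # even if the body raises.
        saved = self.strict_optional
        self.strict_optional = value
        try:
            yield
        finally:
            self.strict_optional = saved


# A single shared instance; callers read state.strict_optional directly.
state = StrictOptionalState(strict_optional=False)
```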
I extracted the cached subtype check code into a microbenchmark
and the new implementation seems about twice as fast (in an
artificial setting, though).
Work on #12526 (but this should generally make type checking a little faster).