[X86] Narrow BT/BTC/BTR/BTS compare + RMW patterns on very large integers (#165540)
This patch allows us to narrow single bit-test/twiddle operations for
larger than legal scalar integers to efficiently operate just on the i32
sub-integer block actually affected.
The BITOP(X,SHL(1,IDX)) patterns are split, with the IDX used to access
the specific i32 block as well as specific bit within that block.
BT comparisons are relatively simple, and builds on the truncated
shifted loads fold from #165266.
BTC/BTR/BTS bit twiddling patterns need to match the entire RMW pattern
to safely confirm only one block is affected, but a similar approach is
taken and creates codegen that should allow us to further merge with
matching BT opcodes in a future patch (see #165291).
The resulting codegen is notably more efficient than the heavily
micro-coded memory folded variants of BT/BTC/BTR/BTS.
There is still some work to improve the bit insert 'init' patterns
included in bittest-big-integer.ll but I'm expecting this to be a
straightforward future extension.
Fixes #164225