[AMDGPU] Cost of i8 vector insert/extract is free in some cases (#194991)
Reduce the cost of i8 vector insert and extract elements to avoid
scalarization in VectorCombine.
During VectorCombine, it is impossible to know whether an extract element will
require additional instructions or be free. Making that assessment requires
additional context, such as which instructions use the extracted elements and
which other extract element index values occur. This
patch treats some cases that likely do not require instructions as free, which
reduces the overall cost and avoids scalarization. Because of this change, there
are SLP vectorization opportunities that are missed. In general, those missed
SLP vectorization cases require scalarization during code generation, and the
compiler ends up generating the same code with and without SLP vectorization.
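
As a rough illustration of why a lower extract cost avoids scalarization, here
is a toy cost comparison. It is not the AMDGPU cost model or the VectorCombine
logic: the names `elementAccessCost`, `choose`, and `Decision`, the `likelyFree`
flag, and all cost values are hypothetical and exist only for this sketch.

```cpp
// Toy model only; nothing here is an LLVM API.
enum class Decision { KeepVector, Scalarize };

// Assumed rule for illustration: an i8 element access costs nothing when the
// caller believes it is likely free, and one unit otherwise.
constexpr unsigned elementAccessCost(unsigned elemBits, bool likelyFree) {
  return (elemBits == 8 && likelyFree) ? 0u : 1u;
}

// Compare one vector op followed by per-element extracts against performing
// the op once per element on scalars. Ties keep the vector form.
constexpr Decision choose(unsigned numElts, unsigned vectorOpCost,
                          unsigned scalarOpCost, unsigned extractCost) {
  return (numElts * scalarOpCost) < (vectorOpCost + numElts * extractCost)
             ? Decision::Scalarize
             : Decision::KeepVector;
}
```

With four elements and unit op costs, an extract cost of 1 makes the vector
form cost 5 against 4 for the scalar form, so the toy model scalarizes. With an
extract cost of 0 the vector form costs 1, so it stays vectorized.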