X86: Improve cost model of fp16 conversion (#113195)
Improve cost-modeling for x86 __fp16 conversions so the SLPVectorizer
transforms the patterns:
- Override `X86TTIImpl::getStoreMinimumVF` to report a minimum VF of 4 (SSE
register can hold 4xfloat converted/stored to 4xf16) this is necessary as
fp16 stores are neither modeled as trunc-stores nor can we mark direct Xxfp16
stores as legal as we generally expand fp16 operations).
- Add missing cost entries to `X86TTIImpl::getCastInstrCost`
conversion from/to fp16. Note that conversion from f64 to f16 is not
supported by an X86 instruction.