metal : add f16 and bf16 support for concat operator (#24724)
* metal : add f16 and bf16 support for concat operator
Extend the Metal backend concat operator to support f16 and bf16 tensor
types in addition to the existing f32 and i32 support.
- Template kernel_concat on type T with specializations for float, half,
bfloat, and int
- Add type-specific pipeline getter ggml_metal_library_get_pipeline_concat()
- Update device support check to allow f16 unconditionally and bf16 only
when the device supports bfloat16
- Update dispatch to select the correct kernel specialization by type
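
The changes listed above can be illustrated with a minimal CPU-side C++
sketch. This is not the Metal kernel or the ggml code from this change: the
ElemType enum, supports_concat(), concat_kernel_name(), concat2d() and the
kernel name strings are all hypothetical stand-ins, and the concat is reduced
to row-major 2D buffers. It only shows the general shape: one templated
routine shared by all element types, a support check that gates bf16 on a
device capability, and a by-type selection of the specialization.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the tensor types named in the commit message;
// this is NOT the real ggml_type enum.
enum class ElemType { F32, F16, BF16, I32 };

// Sketch of the support check: f32, i32 and f16 are always allowed,
// bf16 only when the (assumed) device capability flag is set.
bool supports_concat(ElemType t, bool device_has_bfloat) {
    switch (t) {
        case ElemType::F32:
        case ElemType::I32:
        case ElemType::F16:  return true;
        case ElemType::BF16: return device_has_bfloat;
    }
    return false;
}

// Sketch of by-type dispatch. The kernel names are illustrative only.
std::string concat_kernel_name(ElemType t) {
    switch (t) {
        case ElemType::F32:  return "kernel_concat_f32";
        case ElemType::F16:  return "kernel_concat_f16";
        case ElemType::BF16: return "kernel_concat_bf16";
        case ElemType::I32:  return "kernel_concat_i32";
    }
    return "";
}

// Type-generic concat of two row-major 2D buffers along dim 0 (rows) or
// dim 1 (columns). The non-concatenated dimension is assumed to match.
// Elements are only copied, never converted, so one template serves
// every element type of a given size.
template <typename T>
std::vector<T> concat2d(const std::vector<T>& src0, int r0, int c0,
                        const std::vector<T>& src1, int r1, int c1,
                        int dim) {
    const int rows = (dim == 0) ? r0 + r1 : r0;
    const int cols = (dim == 1) ? c0 + c1 : c0;
    std::vector<T> dst(static_cast<size_t>(rows) * cols);
    for (int i = 0; i < rows; ++i) {
        for (int j = 0; j < cols; ++j) {
            if (i < r0 && j < c0) {
                // Destination element falls inside src0.
                dst[i * cols + j] = src0[i * c0 + j];
            } else {
                // Otherwise shift the index along the concat dimension
                // and read from src1.
                const int i1 = (dim == 0) ? i - r0 : i;
                const int j1 = (dim == 1) ? j - c0 : j;
                dst[i * cols + j] = src1[i1 * c1 + j1];
            }
        }
    }
    return dst;
}
```

Because concat moves values without doing arithmetic on them, 16-bit types
such as f16 and bf16 can be exercised in this sketch with uint16_t as a
placeholder for their raw bit patterns.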
Assisted-by: pi:llama.cpp/Qwen3.6-27B
* metal : extend concat operator to support f16, bf16, i8, i16 and i64
Assisted-by: pi:llama.cpp/Qwen3.6-27B