add bfloat16 support for ConcatTraining and SplitTraining ops (#18280)
### Description
<!-- Describe your changes. -->
Updates input/output type constraints on training operators
ConcatTraining and SplitTraining to include bfloat16 which was
introduced in IR version 4.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enabling `meta-llama/Llama-2-70b` to be finetuned with ONNX Runtime
training.
Co-authored-by: Prathik Rao <prathikrao@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>