add std::ostream& operator<< for BFloat16 in BFloat16.h (#121302)
This PR Move `operator<<` of `BFloat16` to `BFloat16.h`.
Previously, this function is in `TensorDataContainer.h`. If need `std::cout` a `BFloat16` variable when debugging, `TensorDataContainer.h` have to be included. This is inconvient and counterintuitive.
Other dtypes such as `Half`, define their `operator<<` in headers where they are defined such as `Half.h`. Therefore, I think it makes more sense to move `operator<<` of `BFloat16` to `BFloat16.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121302
Approved by: https://github.com/ezyang