ggml : alternative fix for race condition bug in non-inplace ggml_compute_forward_diag_mask_f32 (#1454)
* fix race condition bug in non-inplace ggml_compute_forward_diag_mask_f32
In the non-inplace path, the memcpy into dst needs to be synchronized across threads to avoid race conditions.
=> do the memcpy in the INIT phase, before the COMPUTE phase starts
* remove trailing whitespace
* Update ggml.c
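A minimal sketch of the INIT-phase idea from the first bullet, not the actual ggml.c change. All names here (sketch_task_type, sketch_params, sketch_diag_mask_f32, sketch_run) are hypothetical, and the scheduler is a sequential stand-in for worker threads. It assumes a square row-major matrix and that the INIT phase completes before any COMPUTE work begins. The point: if the full copy ran concurrently with masking, it could overwrite entries another worker had already masked, so the copy happens once, up front.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Hypothetical task phases, loosely modeled on a two-phase scheduler. */
enum sketch_task_type { SKETCH_TASK_INIT, SKETCH_TASK_COMPUTE };

struct sketch_params {
    enum sketch_task_type type;
    int ith; /* index of this worker */
    int nth; /* total number of workers */
};

/*
 * Non-inplace diagonal mask over an n x n row-major matrix:
 * dst = src, then dst[row][col] = -INFINITY for col > row + n_past.
 */
static void sketch_diag_mask_f32(const struct sketch_params *params,
                                 const float *src, float *dst,
                                 int n, int n_past) {
    assert(n > 0 && n_past >= 0);
    assert(params->ith >= 0 && params->ith < params->nth);

    if (params->type == SKETCH_TASK_INIT) {
        /* One full copy, finished before any worker starts masking. */
        memcpy(dst, src, (size_t) n * (size_t) n * sizeof(float));
        return;
    }

    /* COMPUTE: rows are interleaved across workers, so each worker
     * writes only to its own rows and never touches the copy again. */
    for (int row = params->ith; row < n; row += params->nth) {
        for (int col = row + n_past + 1; col < n; col++) {
            dst[row * n + col] = -INFINITY;
        }
    }
}

/* Sequential stand-in for a scheduler: INIT once, then COMPUTE per worker.
 * A real scheduler would run the COMPUTE calls on separate threads. */
static void sketch_run(const float *src, float *dst,
                       int n, int n_past, int nth) {
    struct sketch_params params = { SKETCH_TASK_INIT, 0, nth };
    sketch_diag_mask_f32(&params, src, dst, n, n_past);

    params.type = SKETCH_TASK_COMPUTE;
    for (int ith = 0; ith < nth; ith++) {
        params.ith = ith;
        sketch_diag_mask_f32(&params, src, dst, n, n_past);
    }
}
```

Calling sketch_run(src, dst, 3, 0, 2) on a 3 x 3 input leaves the lower triangle and diagonal of dst equal to src and sets every entry above the diagonal to -INFINITY, regardless of the order in which the COMPUTE calls run.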
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>