[flang][openmp] Fix GPU byref reduction descriptor initialization (#178934)
When generating GPU reduction code for arrays passed by reference, only
the base_ptr field was initialized in the shuffled descriptor, leaving
extent, stride, and rank fields uninitialized. User-defined reduction
combiners therefore received garbage metadata, resulting in incorrect
iteration bounds and crashes on GPU targets.
Fix by copying the entire source descriptor and then updating the
base_ptr to point to thread-private storage. This preserves all metadata
(extents, strides, rank) while correctly pointing to the shuffled data
location.
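As a rough illustration of the copy-then-retarget pattern described
above, here is a minimal C++ sketch. It is not the code generated by
the patch: the Descriptor struct, kMaxRank, and the helper names
initFromSource and elementCount are hypothetical simplifications, and
the real Fortran array descriptor carries more fields.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for an array descriptor.
constexpr int kMaxRank = 3;

struct Descriptor {
  void *base_ptr = nullptr;            // address of the array data
  int rank = 0;                        // number of dimensions
  std::int64_t extent[kMaxRank] = {};  // elements per dimension
  std::int64_t stride[kMaxRank] = {};  // step between elements per dimension
};

// Copy-then-retarget: take all metadata from the source descriptor,
// then point the copy at thread-private storage. Setting only
// dst.base_ptr (the old behavior) would leave rank, extent, and
// stride holding whatever dst contained before.
inline void initFromSource(Descriptor &dst, const Descriptor &src,
                           void *privateData) {
  assert(src.rank >= 0 && src.rank <= kMaxRank);
  dst = src;                   // preserves rank, extents, and strides
  dst.base_ptr = privateData;  // only the data location changes
}

// Number of elements a combiner would iterate over, derived purely
// from descriptor metadata.
inline std::int64_t elementCount(const Descriptor &d) {
  std::int64_t n = 1;
  for (int i = 0; i < d.rank; ++i)
    n *= d.extent[i];
  return n;
}
```

Because a combiner derives its loop bounds from the descriptor alone,
as elementCount does here, any field left uninitialized directly
corrupts the iteration space.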
The fix applies to three reduction helper functions:
- _omp_reduction_shuffle_and_reduce_func (warp-level shuffle)
- _omp_reduction_list_to_global_reduce_func (block-to-global)
- _omp_reduction_global_to_list_copy_func (global-to-block)
This fixes multi-dimensional array reductions in GPU target regions
that use teams distribute parallel do directives.
Co-authored-by: Sunil Shrestha <sshrestha@pe28vega.hpc.amslabs.hpecorp.net>