Improve performance of BiasGelu on oneDNN execution provider (#11935)
This changes how the oneDNN execution provider handles BiasGelu: the
gelu_erf primitive now runs as a post-op of the binary_add primitive
instead of as a separate primitive.
Also removes unnecessary data copies made when running on GPU.
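For reviewers unfamiliar with the op, here is a minimal pure-Python sketch of
what BiasGelu computes and why running the activation as a post-op can avoid
an intermediate buffer. It is an illustration only, not the oneDNN code in
this change; the function names are hypothetical, and the bias is assumed to
broadcast along the last dimension.

```python
import math


def gelu_erf(x: float) -> float:
    """Erf-based GELU: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def bias_gelu_unfused(x, bias):
    """Two passes: materialize x + bias, then apply GELU to the temporary."""
    tmp = [[v + b for v, b in zip(row, bias)] for row in x]
    return [[gelu_erf(v) for v in row] for row in tmp]


def bias_gelu_fused(x, bias):
    """One pass: GELU is applied to each sum as it is produced,
    analogous to attaching the activation as a post-op of the add."""
    return [[gelu_erf(v + b) for v, b in zip(row, bias)] for row in x]


if __name__ == "__main__":
    x = [[-1.0, 0.0, 2.5], [0.5, -3.0, 1.0]]
    bias = [0.1, -0.2, 0.3]
    # Both paths perform the same arithmetic, so the results match.
    print(bias_gelu_fused(x, bias) == bias_gelu_unfused(x, bias))
```

Both functions return the same values; the fused form simply never stores the
intermediate sum.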
Signed-off-by: George Nash <george.nash@intel.com>