[OpenBLAS_jll] Update to new build with BFloat16 kernels (#53059)
This also:
* drops a patch (`deps/patches/neoverse-generic-kernels.patch`) that is
no longer needed, because the [old
bug](https://github.com/OpenMathLib/OpenBLAS/issues/2998) it addressed
has been fixed upstream in OpenBLAS. Dropping it gives a ~5x speedup in
`BLAS.nrm2` (and hence in `LinearAlgebra.norm` for vectors longer than
`LinearAlgebra.NRM2_CUTOFF` (== 32) elements) when the neoversen1
kernels are used, e.g. by default on all Apple Silicon CPUs
* adds a regression test for the above bug
* updates the other patches used when building OpenBLAS from source
Corresponding PR in Yggdrasil:
https://github.com/JuliaPackaging/Yggdrasil/pull/7202.
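
A rough way to sanity-check `BLAS.nrm2` and get a feel for its speed on a
given machine is sketched below. This is only an illustration, not the
regression test added by this change: the vector length, the iteration
count and the plain-Julia reference computation are arbitrary choices
made for the sketch, and the measured time will depend on which OpenBLAS
kernels are selected at runtime.

```julia
using LinearAlgebra

# A vector much longer than the cutoff mentioned above, so that `norm`
# is expected to be computed via BLAS (length chosen arbitrarily).
x = rand(Float64, 10_000)

# Reference Euclidean norm computed in plain Julia, without BLAS.
reference = sqrt(sum(abs2, x))

# Both entry points should agree with the reference up to rounding.
@assert BLAS.nrm2(x) ≈ reference
@assert norm(x) ≈ reference

# Rough wall-clock timing; the first call is a warm-up so that
# compilation time is not included in the measurement.
BLAS.nrm2(x)
n_iter = 10_000
t = @elapsed for _ in 1:n_iter
    BLAS.nrm2(x)
end
println("BLAS.nrm2: ", t / n_iter, " s per call")

# Show which BLAS libraries are currently loaded (Julia 1.7 and later).
println(BLAS.get_config())
```

Comparing the printed per-call time between a build with and without
this change, on the same machine, should show whether the speedup
applies there.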