Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head))" (#22444)
Revert "Error (also in original) model, scaling only q matrix not qk.T dot product (qk.T/sqrt(dim_per_head)) (#21627)"
This reverts commit bad83008377bf01a34ac2e08c74e7da89eaf4e07.