[inductor] skip foreach kernel for benchmark fusion (#121168)
Benchmark fusion currently does not support foreach kernels. If we don't explicitly skip them, `codegen_node_schedule` raises exceptions, because the individual nodes in a foreach kernel may have shapes that are incompatible from a pointwise/reduction perspective.
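To illustrate why the guard is needed, here is a minimal, simplified sketch of the idea. It is not the actual Inductor code. `Node`, `ForeachNode`, `kernel_shape`, and `benchmark_shape` are hypothetical stand-ins, and each node's shape is modeled as a `(numel, rnumel)` pair that must match exactly for a single kernel. A foreach kernel groups independent subnodes whose shapes need not agree, so trying to generate one kernel from them for benchmarking fails. The guard returns `None` to mean "skip benchmarking and keep the default fusion decision".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Node:
    # Hypothetical stand-in for a scheduler node.
    name: str
    numel: int       # number of output elements
    rnumel: int = 1  # reduction size; 1 means pointwise


@dataclass
class ForeachNode:
    # Hypothetical stand-in for a foreach kernel: independent subnodes in one launch.
    name: str
    subnodes: List[Node] = field(default_factory=list)


AnyNode = Union[Node, ForeachNode]


def kernel_shape(nodes: List[Node]) -> Tuple[int, int]:
    # Simplified model: one pointwise/reduction kernel needs every node
    # to share the same (numel, rnumel).
    shapes = {(n.numel, n.rnumel) for n in nodes}
    if len(shapes) != 1:
        raise ValueError(f"incompatible shapes: {sorted(shapes)}")
    return shapes.pop()


def benchmark_shape(node1: AnyNode, node2: AnyNode) -> Optional[Tuple[int, int]]:
    # Skip foreach kernels: their subnodes may mix pointwise and reduction
    # shapes, so a single benchmark kernel cannot be generated for them.
    if isinstance(node1, ForeachNode) or isinstance(node2, ForeachNode):
        return None  # not benchmarked; fall back to the default decision
    return kernel_shape([node1, node2])
```

For example, with `a = Node("add", numel=1024)` and `s = Node("sum", numel=32, rnumel=32)`, `benchmark_shape(ForeachNode("fe", [a, s]), a)` returns `None`, whereas calling `kernel_shape` directly on those mixed subnodes raises `ValueError`, which mirrors the failure this change avoids.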
cc Manman Ren (@manman-ren), who reported the issue when turning on benchmark fusion for BertForMaskedLM.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/121168
Approved by: https://github.com/Chillee