Add a pass to fold tensors with a uniform value, match sdpa on a few models (#103600)
Adds a constant-folding pass to the joint graph that targets only tensors which can be replaced with a single value, then removes no-ops from the graph. This allows us to match the sdpa (scaled dot-product attention) pattern in BertForMaskedLM, AlbertForMaskedLM, and LayoutLMForMaskedLM.
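The idea can be illustrated with a toy sketch in plain Python. This is not the Inductor implementation: the `Node` class, the op names, and `fold_uniform_and_remove_noops` are hypothetical, and shapes, dtypes, and broadcasting (which a real pass must check before dropping an op) are ignored. The example graph, an all-zero mask added to attention-like scores, is likewise an illustrative assumption rather than something taken from the models above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical toy IR node; not a torch.fx node."""
    name: str
    op: str            # "input", "full", "add", or "mul"
    args: Tuple = ()   # operand names, or (value,) for "full"


def fold_uniform_and_remove_noops(nodes: List[Node], outputs: List[str]):
    """Track tensors holding one value everywhere, fold ops on them,
    and drop ops that become identities (x + 0, x * 1).
    Assumes all tensors share one shape (no broadcasting)."""
    uniform: Dict[str, float] = {}  # node name -> its single value
    alias: Dict[str, str] = {}      # removed node -> replacement node
    kept: List[Node] = []

    def resolve(name: str) -> str:
        while name in alias:
            name = alias[name]
        return name

    for node in nodes:
        if node.op == "input":
            kept.append(node)
            continue
        if node.op == "full":
            uniform[node.name] = node.args[0]
            kept.append(node)
            continue

        lhs, rhs = (resolve(a) for a in node.args)
        lv, rv = uniform.get(lhs), uniform.get(rhs)
        identity = 0.0 if node.op == "add" else 1.0

        if lv is not None and rv is not None:
            # Both operands are uniform, so the result is uniform too.
            value = lv + rv if node.op == "add" else lv * rv
            uniform[node.name] = value
            kept.append(Node(node.name, "full", (value,)))
        elif rv == identity:
            alias[node.name] = lhs   # x + 0 or x * 1 is just x
        elif lv == identity:
            alias[node.name] = rhs   # 0 + x or 1 * x is just x
        else:
            kept.append(Node(node.name, node.op, (lhs, rhs)))

    # Dead-code elimination: keep only nodes reachable from the outputs.
    new_outputs = [resolve(o) for o in outputs]
    live = set(new_outputs)
    result: List[Node] = []
    for node in reversed(kept):
        if node.name in live:
            result.append(node)
            if node.op in ("add", "mul"):
                live.update(node.args)
    result.reverse()
    return result, new_outputs


graph = [
    Node("scores", "input"),
    Node("ones", "full", (1.0,)),
    Node("zeros", "full", (0.0,)),
    Node("mask", "mul", ("ones", "zeros")),    # uniform 0 after folding
    Node("masked", "add", ("scores", "mask")),  # becomes a no-op
    Node("scaled", "mul", ("masked", "ones")),  # becomes a no-op
]
new_nodes, new_outputs = fold_uniform_and_remove_noops(graph, ["scaled"])
```

In this toy graph only the `scores` input survives, which mirrors why removing such no-ops can leave a graph in the shape a pattern matcher expects.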
BertForMaskedLM
Perf: 1.6853x -> 1.933x, Memory: 0.9462 -> 1.41
AlbertForMaskedLM
Perf: 1.6620x -> 1.761x, Memory: 1.257 -> 1.94
LayoutLMForMaskedLM
Perf (non cudagraphs): 1.6991x -> 1.939x, Memory: 0.9624 -> 1.50
MobileBertForMaskedLM
Perf: 1.864x -> 1.941x, Memory: 0.94 -> 1.03
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103600
Approved by: https://github.com/jansel