[inductor][fx passes] add tensor size limit for group fusion and enable batch fusion (#106627)
Summary:
Add a maximum tensor size threshold for group fusion: if a tensor's size exceeds the threshold, it is not fused.
Enable batch_fusion by default, since we have observed a consistent QPS gain with neutral NE.
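
As a rough illustration of the size check described above, the sketch below filters fusion candidates by shape. It is not the code in this PR: the names `MAX_FUSE_TENSOR_SIZE`, `within_size_limit`, and `select_fusable` are hypothetical, and it assumes that "tensor size" means every dimension of the shape must be at or below the threshold.

```python
from typing import List, Sequence

# Hypothetical constant; 4096 is the value the test plan suggests for one model.
MAX_FUSE_TENSOR_SIZE = 4096


def within_size_limit(shape: Sequence[int], limit: int = MAX_FUSE_TENSOR_SIZE) -> bool:
    """Return True if every dimension of `shape` is at most `limit`.

    Assumption: the limit applies per dimension, not to the element count.
    """
    return all(dim <= limit for dim in shape)


def select_fusable(
    shapes: Sequence[Sequence[int]], limit: int = MAX_FUSE_TENSOR_SIZE
) -> List[Sequence[int]]:
    """Keep only the candidate shapes that are small enough to fuse."""
    return [shape for shape in shapes if within_size_limit(shape, limit)]


candidates = [(128, 256), (8192, 64), (4096, 4096)]
fusable = select_fusable(candidates)
```

With this sketch, `(8192, 64)` is skipped because one dimension exceeds the limit, while the other two candidates remain eligible for fusion.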
Test Plan:
Local test results: https://docs.google.com/document/d/1-qNuvGejhGgwKmRVTbz98_-SVu_fMoKgFcxyxrNMH_M/edit
Based on these results, 4096 should be a better threshold for the ads CMF model.
f465511761
f465519705
4.8% QPS gain
{F1064213077}
NE neutral
{F1064214423}
Reviewed By: yanboliang
Differential Revision: D48042826
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106627
Approved by: https://github.com/jansel