[NNC] Implement loop fusion (#54461)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54337
This PR adds a new API to NNC that performs loop fusion:
```
static For* fuseLoops(const std::vector<For*>& loops);
```
The loops are fused only when all of the following conditions hold:
* All the loops have the same parent.
* There are no statements between these loops in their parent body.
* The start bounds are the same for all loops.
* The stop bounds are the same for all loops.
* Fusing the loops does not violate or add any dependencies.
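As a rough illustration of the conditions above, here is a standalone sketch that does not use the NNC classes. `ToyLoop`, `structurallyFusable`, `unfused`, and `fused` are hypothetical names made up for this example, and the bounds are plain integers rather than NNC expressions. It covers only the four structural conditions; the dependency condition is shown by example, not by analysis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a loop statement; this is NOT the NNC `For` class.
struct ToyLoop {
  int parent_id; // identifies the enclosing block
  int position;  // index of this statement within the parent body
  int start;     // start bound (an expression in NNC, an int here)
  int stop;      // stop bound
};

// Checks the four structural conditions only. The dependency condition
// needs a memory dependency analysis and is deliberately left out.
bool structurallyFusable(const std::vector<ToyLoop>& loops) {
  if (loops.empty()) {
    return false;
  }
  for (std::size_t i = 1; i < loops.size(); ++i) {
    const ToyLoop& prev = loops[i - 1];
    const ToyLoop& cur = loops[i];
    if (cur.parent_id != loops[0].parent_id) return false; // same parent
    if (cur.position != prev.position + 1) return false;   // nothing in between
    if (cur.start != loops[0].start) return false;         // same start bound
    if (cur.stop != loops[0].stop) return false;           // same stop bound
  }
  return true;
}

// Two separate loops: the second reads only a[i], written by the first.
std::vector<int> unfused(int n) {
  assert(n >= 0);
  std::vector<int> a(n), b(n);
  for (int i = 0; i < n; ++i) a[i] = i * 2;
  for (int i = 0; i < n; ++i) b[i] = a[i] + 1;
  return b;
}

// Fused form: legal because iteration i reads only what iteration i wrote.
// If the second statement read a[i + 1] instead, fusion would read a value
// that has not been written yet, which violates the original dependency.
std::vector<int> fused(int n) {
  assert(n >= 0);
  std::vector<int> a(n), b(n);
  for (int i = 0; i < n; ++i) {
    a[i] = i * 2;
    b[i] = a[i] + 1;
  }
  return b;
}
```

In this toy model, two adjacent loops over the same range pass the structural check, and `unfused(n)` and `fused(n)` return the same vector for any non-negative `n`.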
This PR also adds an API to check for partial overlaps in `buffer_inference.h` and fixes a bug in `mem_dependency_checker.cpp`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54461
Reviewed By: bertmaher
Differential Revision: D27254888
Pulled By: navahgar
fbshipit-source-id: c21b027d738e5022e9cb88f6f72cd9e255bdb15e