Rework Inductor support for collectives. (#99765)
This is done by introducing two new base classes: InPlaceCollectiveKernel and OutOfPlaceCollectiveKernel.
They handle the differences in when InPlaceHint needs to be used.
In addition, we introduce a `has_side_effects` method on buffers that
prevents them from being DCE'd by the scheduler. This is needed because InPlaceHint
nodes both wrap the inputs and are the outputs, which leaves the collectives
themselves with no users.
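
For illustration only, here is a toy model of the DCE problem described above. It is not Inductor's scheduler code: `Node`, `eliminate_dead_nodes`, and the node names are hypothetical, and the graph shape is a simplified assumption. It shows how a node with no users gets dropped unless a `has_side_effects`-style check treats it as a liveness root.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a scheduler buffer/node (not an Inductor class)."""
    name: str
    inputs: list = field(default_factory=list)  # names of nodes this node reads
    side_effects: bool = False

    def has_side_effects(self) -> bool:
        return self.side_effects


def eliminate_dead_nodes(nodes, graph_outputs):
    """Return names of nodes that survive a simple reachability-based DCE.

    Roots are the graph outputs plus any node reporting side effects.
    """
    by_name = {n.name: n for n in nodes}
    worklist = list(graph_outputs) + [n.name for n in nodes if n.has_side_effects()]
    live = set()
    while worklist:
        name = worklist.pop()
        if name in live:
            continue
        live.add(name)
        worklist.extend(by_name[name].inputs)
    return [n.name for n in nodes if n.name in live]


def build_graph(collective_has_side_effects):
    # Assumed shape: the hint wraps the input and is also the graph output,
    # so nothing reads the collective node itself.
    return [
        Node("x"),
        Node("hint", inputs=["x"]),
        Node("all_reduce", inputs=["hint"], side_effects=collective_has_side_effects),
    ]


kept_without = eliminate_dead_nodes(build_graph(False), graph_outputs=["hint"])
kept_with = eliminate_dead_nodes(build_graph(True), graph_outputs=["hint"])
print(kept_without)  # the collective is eliminated
print(kept_with)     # the collective is kept
```

In this sketch the collective survives only when it is flagged, which mirrors the reason given above for adding the method.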
Pull Request resolved: https://github.com/pytorch/pytorch/pull/99765
Approved by: https://github.com/wconstab