Add ref for cumsum (#86229)
As noted in the comment, this decomposition may not be as efficient as
specific implementations of it in different backends. Added here to then
benchmark it. Note that this is needed by TorchInductor https://github.com/pytorch/torchdynamo/issues/883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/86229
Approved by: https://github.com/ngimel