Performance improvement to cumulative seq len (#87530)
# Summary
Speeds up the calculation of the cumulative sequence length metadata needed to connect nested tensors to fused kernels.
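
For context, the sketch below illustrates what cumulative sequence length metadata typically looks like for a packed batch of variable-length sequences. It is not the code changed in this PR: the helper name `cumulative_seq_lens` and the example lengths are hypothetical, and it assumes the fused kernel expects offsets of the form `[0, l0, l0 + l1, ...]`.

```python
from itertools import accumulate


def cumulative_seq_lens(seq_lens):
    """Return offsets [0, l0, l0 + l1, ...] for sequences packed back to back.

    Hypothetical helper for illustration only; not the PyTorch implementation.
    """
    return [0, *accumulate(seq_lens)]


# Hypothetical per-sequence lengths of the components of a nested tensor.
seq_lens = [3, 5, 2]

# Sequence i occupies rows offsets[i]:offsets[i + 1] of the packed buffer.
offsets = cumulative_seq_lens(seq_lens)

# Kernels of this kind often also need the longest sequence length.
max_len = max(seq_lens)
```

The last offset equals the total number of packed tokens, so a kernel can locate every sequence from this one small array.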
Pull Request resolved: https://github.com/pytorch/pytorch/pull/87530
Approved by: https://github.com/cpuhrsch