Optimize Householder product backward to be more memory-efficient (#84627)
A follow-up on discussions in https://github.com/pytorch/pytorch/pull/84180.
Makes the backward pass more memory-efficient and uses fewer kernel calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84627
Approved by: https://github.com/kshitij12345, https://github.com/zou3519