Do not use `thrust::lower_bound` on device (#80746)
Summary: `thrust::lower_bound` is broken on device, see https://github.com/NVIDIA/thrust/issues/1734
The implementation of `c10::cuda::lower_bound` is inspired by the one in `aten/src/ATen/native/cuda/Bucketization.cu`.
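
For illustration only, a hand-rolled binary search that is callable from device code might look roughly like the sketch below. This is not the code added by this change: the names `SKETCH_HOST_DEVICE` and `lower_bound_sketch` are hypothetical, and the sketch assumes a raw-pointer range sorted in ascending order under `operator<`.

```cpp
#include <cstddef>

// Hypothetical macro: expands to CUDA qualifiers under nvcc and to nothing
// in a plain C++ build, so the same function compiles for host and device.
#if defined(__CUDACC__)
#define SKETCH_HOST_DEVICE __host__ __device__
#else
#define SKETCH_HOST_DEVICE
#endif

// Hypothetical helper, not the actual c10::cuda::lower_bound.
// Returns a pointer to the first element of the sorted range [first, last)
// that is not less than value, or last if every element is less than value.
template <typename T>
SKETCH_HOST_DEVICE const T* lower_bound_sketch(
    const T* first,
    const T* last,
    const T& value) {
  while (first < last) {
    // Midpoint computed from the distance to avoid overflow.
    const T* mid = first + (last - first) / 2;
    if (*mid < value) {
      // Everything up to and including mid is too small.
      first = mid + 1;
    } else {
      // mid may be the answer, so keep it in the range.
      last = mid;
    }
  }
  return first;
}
```

Because the loop uses only pointer arithmetic and comparisons, it does not depend on any Thrust algorithm and can be called from inside a kernel as well as from host code.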
Test Plan: CI
Reviewed By: ngimel
Differential Revision: D37558845
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80746
Approved by: https://github.com/ngimel