Don't #define NUM_THREADS (#67258)
Summary:
PyTorch once again fails to compile against the latest `main` branch of cub. The root cause is that PyTorch defines a macro `NUM_THREADS`, and cub added code like
```C++
template<...., int NUM_THREADS, ...>
```
and the two conflict: the preprocessor expands the macro inside cub's template parameter list, so the declaration no longer parses.
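A minimal sketch of the kind of fix the title suggests, assuming the macro can be replaced by a scoped constant. `BlockSketch`, `sketch::num_threads`, `block_threads`, and the value 128 are hypothetical names and numbers chosen for illustration; they are not taken from PyTorch or cub.

```C++
// Hypothetical stand-in for the cub code: a template with a parameter
// named NUM_THREADS. With `#define NUM_THREADS 128` visible, the
// preprocessor would turn the parameter into `int 128` and break parsing.
template <typename T, int NUM_THREADS>
struct BlockSketch {
  static constexpr int threads = NUM_THREADS;
};

// Assumed replacement for the macro: a namespaced constexpr constant.
// It obeys C++ scoping rules, so it cannot rewrite identifiers in other headers.
namespace sketch {
constexpr int num_threads = 128;  // illustrative value only
}

// The constant is still usable wherever a compile-time integer is required.
static_assert(BlockSketch<float, sketch::num_threads>::threads == 128,
              "constant should pass through as a template argument");

int block_threads() {
  return BlockSketch<float, sketch::num_threads>::threads;
}
```

Unlike a macro, the constant only takes effect where it is named explicitly, so third-party headers that use the identifier `NUM_THREADS` are left untouched.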
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67258
Reviewed By: albanD
Differential Revision: D31932215
Pulled By: ngimel
fbshipit-source-id: ccdf11e249fbc0b6f654535067a0294037ee7b96