Make GPU loops support mutable lambda (#35015)
Summary:
I will need it for https://github.com/pytorch/pytorch/pull/34004
The `mutable` qualifier allows a lambda to modify its own copies of the values it captures by value. This would be useful for random kernels: we capture the `state` of the RNG, initialize it the first time the lambda runs, and reuse the initialized state on later calls:
```C++
gpu_kernel(iter, [state, initialized](scalar_t arg) mutable -> scalar_t {
  // `state` and `initialized` are the lambda's own copies; `mutable` lets us modify them.
  if (!initialized) {
    curand_init(..., &state);
    initialized = true;
  }
  return some_math(curand_uniform(&state), arg);
});
```
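The same lazy-initialization pattern can be sketched in plain host C++, without CUDA. This is only an illustration of the `mutable` semantics described above; `make_counter` and its captured values are hypothetical stand-ins for the RNG state and are not part of this PR:

```C++
// Hypothetical host-only stand-in for the lazy-init pattern (no CUDA, no RNG).
inline auto make_counter(int start) {
  int state = 0;             // stands in for the RNG state
  bool initialized = false;  // set on the first call
  // Captured by value: the lambda owns private copies of all three values.
  return [state, initialized, start]() mutable -> int {
    if (!initialized) {
      state = start;         // one-time initialization of the private copy
      initialized = true;
    }
    return state++;          // allowed only because the lambda is mutable
  };
}
```

Each copy of the lambda carries its own state: copying a counter after two calls yields a second counter that continues from the same value, independently of the first.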
The `operator()` of a `mutable` lambda is not `const`, so we cannot pass the lambda by const reference and still call it. For the same reason, it cannot be called from inside a non-`mutable` lambda that captures it by value.
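A minimal host-side sketch of both restrictions follows. The helper names `call_twice` and `nested_example` are hypothetical and only illustrate the language rule; they are not the actual `gpu_kernel` plumbing:

```C++
// Hypothetical helper: taking the functor by value yields a non-const object,
// so the non-const operator() of a mutable lambda can be called on it.
template <typename F>
int call_twice(F f) {
  f();
  return f();
}

// A const-reference version would fail to compile for a mutable lambda,
// because operator() is not const:
//   template <typename F>
//   int call_const(const F& f) { return f(); }

inline int nested_example() {
  int n = 0;
  auto inner = [n]() mutable { return ++n; };
  // `inner` is captured by copy. Inside a non-mutable outer lambda that copy
  // would be const and `inner()` would not compile, so `outer` is mutable too.
  auto outer = [inner]() mutable { return inner(); };
  outer();
  return outer();
}
```

The design consequence is that every layer that stores or forwards the functor has to keep it non-const, either by taking it by value or by marking wrapping lambdas `mutable`.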
Example usage:
```C++
auto t = at::empty({4096}, kCUDA);
float thread_work_index_ = 0;
auto iter = TensorIterator::nullary_op(t);
gpu_kernel(iter, [thread_work_index_]GPU_LAMBDA() mutable -> float {
  return thread_work_index_++;
});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35015
Differential Revision: D20624698
Pulled By: ngimel
fbshipit-source-id: 06e3987793451cd514181d20252510297e2d28a9