[1/2] Intel GPU Runtime Upstreaming for Generator (#118528)
# Motivation
As mentioned in [[RFC] Intel GPU Runtime Upstreaming](https://github.com/pytorch/pytorch/issues/114842), the last runtime component we would like to upstream is `Generator`, which is responsible for pseudo-random number generation. To facilitate code review, we split the code changes into 2 PRs. This is the first of the 2 PRs and covers the changes under `aten`.
# Design
Following the previous design, `c10::GeneratorImpl` is the device-agnostic abstraction of a random number generator. We therefore introduce an XPU generator, `XPUGeneratorImpl`, which inherits from `c10::GeneratorImpl` and manages random states on an Intel GPU device. The Intel GPU runtime `Generator` adopts the same algorithm as CPU. The corresponding C++ file is placed in the `aten/src/ATen/xpu/` folder and is built into `libtorch_xpu.so`.
This PR provides the following APIs:
- `getDefaultXPUGenerator`
- `createXPUGenerator`
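As a rough illustration of the design above, the following is a minimal, self-contained sketch and not the code in this PR. The `sketch` namespace, the simplified `GeneratorImpl` interface, the `int` device index, `kNumDevices`, `kDefaultSeed`, and the use of `std::mt19937` as the engine are all assumptions made for the example; the real classes live in `c10` and `aten/src/ATen/xpu/` and have different signatures. It only shows the relationship between a device-agnostic base class, a per-device default generator, and a factory for fresh generators.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <random>
#include <vector>

namespace sketch {

// Hypothetical values, chosen only for this example.
constexpr int kNumDevices = 2;
constexpr uint64_t kDefaultSeed = 42;

// Simplified stand-in for c10::GeneratorImpl (device-agnostic interface).
class GeneratorImpl {
 public:
  virtual ~GeneratorImpl() = default;
  virtual void set_current_seed(uint64_t seed) = 0;
  virtual uint64_t current_seed() const = 0;
  virtual int device_index() const = 0;
  virtual uint32_t random() = 0;
};

// Simplified stand-in for XPUGeneratorImpl: holds the random state of one device.
class XPUGeneratorImpl : public GeneratorImpl {
 public:
  explicit XPUGeneratorImpl(int device_index) : device_index_(device_index) {
    set_current_seed(kDefaultSeed);
  }
  void set_current_seed(uint64_t seed) override {
    seed_ = seed;
    // Assumption: std::mt19937 is used purely as an illustrative engine.
    engine_.seed(static_cast<std::mt19937::result_type>(seed));
  }
  uint64_t current_seed() const override { return seed_; }
  int device_index() const override { return device_index_; }
  uint32_t random() override { return static_cast<uint32_t>(engine_()); }

 private:
  int device_index_;
  uint64_t seed_ = 0;
  std::mt19937 engine_;
};

// Returns the lazily created default generator of a device; repeated calls
// with the same index return the same object.
GeneratorImpl& getDefaultXPUGenerator(int device = 0) {
  static std::vector<std::unique_ptr<XPUGeneratorImpl>> defaults(kNumDevices);
  static std::mutex mutex;
  std::lock_guard<std::mutex> lock(mutex);
  auto& slot = defaults.at(static_cast<std::size_t>(device));
  if (!slot) {
    slot = std::make_unique<XPUGeneratorImpl>(device);
  }
  return *slot;
}

// Returns a new generator with independent state on the given device.
std::unique_ptr<GeneratorImpl> createXPUGenerator(int device = 0) {
  return std::make_unique<XPUGeneratorImpl>(device);
}

}  // namespace sketch
```

In this sketch, two calls to `sketch::getDefaultXPUGenerator(0)` yield the same object, while each call to `sketch::createXPUGenerator(0)` yields a separate generator whose seed can be set without affecting the default one.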
# Additional Context
The 2nd PR will cover the Python frontend.
Differences from CUDA:
- The generator-related ATen C++ APIs map 1:1 to their CUDA counterparts.
- The member functions of `XPUGeneratorImpl` differ slightly from CUDA's. It has no counterparts for the CUDA-specific APIs listed below:
  - `capture_prologue`
  - `capture_epilogue`
  - `philox_cuda_state`
  - `reset_rnn_state`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118528
Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD