expose fast get_current_stream (#78165)
Expose fast no-frills version of getting raw `cudaStream_t` in python (200 ns instead of 4 us)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78165
Approved by: https://github.com/SherlockNoMad, https://github.com/soumith, https://github.com/gchanan