Eliminate type dispatch from copy_kernel, and use memcpy directly rather than implementing our own copy. (#19198)
Summary:
Copying bytes is the same operation no matter what type they are
interpreted as, and memcpy is already vectorized on every platform
of note. Paring copy_kernel down to a plain memcpy call trims just
over 4KB from libtorch.
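As a rough illustration of the idea (not the actual copy_kernel code), a
type-agnostic copy needs only the element count and the element size in
bytes. The helper name copy_bytes and its parameters are hypothetical,
and the sketch assumes both buffers are contiguous and do not overlap:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only; not the real copy_kernel.
// Copies `numel` elements of `itemsize` bytes each from `src` to `dst`.
// Assumes both buffers are contiguous, non-overlapping, and large enough.
inline void copy_bytes(void* dst, const void* src,
                       std::size_t numel, std::size_t itemsize) {
  // memcpy requires valid pointers even when the byte count is zero.
  assert(dst != nullptr && src != nullptr);
  // No dispatch on the element type: the bytes are copied the same way
  // whether they represent floats, integers, or anything else.
  std::memcpy(dst, src, numel * itemsize);
}
```

The same function serves every dtype, e.g.
copy_bytes(dst, src, n, sizeof(float)) or
copy_bytes(dst, src, n, sizeof(std::int64_t)), so no per-type template
instantiation is needed, which is presumably where the binary size
saving comes from.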
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19198
Differential Revision: D14922656
Pulled By: resistor
fbshipit-source-id: bb03899dd8f6b857847b822061e7aeb18c19e7b4