5c5b2c68 - Simplify copy kernel (#28428)

Simplify copy kernel (#28428)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28428

Using the new type promotion and dynamic casting added to `TensorIterator`, the copy kernels could be greatly simplified.

Benchmark on CUDA:

```python
import torch
import timeit
import pandas
import itertools
from tqdm.notebook import tqdm
import math

print(torch.__version__)
print()

_10M = 10 * 1024 ** 2
d = {}
for from_, to in tqdm(itertools.product(torch.testing.get_all_dtypes(), repeat=2)):
    if from_ not in d:
        d[from_] = {}
    a = torch.empty(_10M, dtype=from_, device='cuda')
    min_ = math.inf
    for i in range(100):
        torch.cuda.synchronize()
        start = timeit.default_timer()
        a.to(to)
        torch.cuda.synchronize()
        end = timeit.default_timer()
        elapsed = end - start
        if elapsed < min_:
            min_ = elapsed
    d[from_][to] = int(min_ * 1000 * 1000)
pandas.DataFrame(d)
```

original:
![image](https://user-images.githubusercontent.com/1032377/67623519-e3e6dd80-f7da-11e9-86ea-9cc9f237123b.png)

new:
![image](https://user-images.githubusercontent.com/1032377/67623527-fc56f800-f7da-11e9-82bd-dc1ff9821b68.png)

Test Plan: Imported from OSS

Differential Revision: D18170995

Pulled By: ezyang

fbshipit-source-id: 461b53641813dc6cfa872a094ae917e750c60759
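For readers without a CUDA device: the benchmark above exercises the dtype-converting copy path through `Tensor.to`, and the same path is hit by `Tensor.copy_`. The minimal sketch below (not part of the commit, CPU tensors used so it runs anywhere) just illustrates those user-visible operations; it says nothing about the internal kernel implementation.

```python
import torch

# Cross-dtype copies cast element-wise from the source dtype
# to the destination dtype.
src = torch.arange(4, dtype=torch.int64)   # [0, 1, 2, 3] as int64
dst = torch.empty(4, dtype=torch.float32)

dst.copy_(src)                # copy with cast: int64 -> float32
half = src.to(torch.float16)  # conversion via .to(), the op timed above

print(dst)          # tensor([0., 1., 2., 3.])
print(half.dtype)   # torch.float16
```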