benchmark
c26fb1ea - Add initial cpu userbenchmark for torchbench (#1559)

Commit
2 years ago
Summary: Add initial cpu userbenchmark for torchbench

This works toward the roadmap in https://github.com/pytorch/benchmark/issues/1293 by extending the cpu userbenchmark with the following features:

- [x] Add core binding option, support multi-instances test.
- [x] Add gomp/iomp option.
- [x] Add memory allocator option.
- [x] Support all enabled cpu features test based on torchbench models, e.g. channels-last / fx_int8 / jit with fusers
- [x] Support latency and cpu_peak_mem metrics for now, will extend to fps-like report
- [x] Add `README.md`

For example, the command line below tests fx_int8 inference for 2 models with batch size 8 on CLX socket 0, running 4 instances at the same time.

```shell
$ python run_benchmark.py cpu --model resnet50,alexnet --test eval -b 8 --precision fx_int8 --launcher --launcher-args "--node-id 0 --ninstances 4"
Running benchmark: ['/localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python', '-m', 'torch.backends.xeon.run_cpu', '--node-id', '0', '--ninstances', '4', '/localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py', '-m', 'resnet50', '--device', 'cpu', '-b', '8', '-t', 'eval', '-o', PosixPath('/localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336')]
2023-04-20 00:43:37,960 - __main__ - INFO - Use JeMalloc memory allocator
2023-04-20 00:43:37,960 - __main__ - INFO - OMP_NUM_THREADS=7
2023-04-20 00:43:37,960 - __main__ - INFO - Using Intel OpenMP
2023-04-20 00:43:37,960 - __main__ - INFO - KMP_AFFINITY=granularity=fine,compact,1,0
2023-04-20 00:43:37,960 - __main__ - INFO - KMP_BLOCKTIME=1
2023-04-20 00:43:37,960 - __main__ - INFO - LD_PRELOAD=/localdisk/chuanqiw/miniconda3/envs/torchdynamo/lib/libiomp5.so:/localdisk/chuanqiw/miniconda3/envs/torchdynamo/lib/libjemalloc.so
2023-04-20 00:43:37,960 - __main__ - INFO - numactl -C 0-6 -m 0 /localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python -u /localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py -m resnet50 --device cpu -b 8 -t eval -o /localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336
2023-04-20 00:43:37,960 - __main__ - INFO - numactl -C 7-13 -m 0 /localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python -u /localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py -m resnet50 --device cpu -b 8 -t eval -o /localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336
2023-04-20 00:43:37,960 - __main__ - INFO - numactl -C 14-20 -m 0 /localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python -u /localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py -m resnet50 --device cpu -b 8 -t eval -o /localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336
2023-04-20 00:43:37,960 - __main__ - INFO - numactl -C 21-27 -m 0 /localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python -u /localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py -m resnet50 --device cpu -b 8 -t eval -o /localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336
Running TorchBenchModelConfig(name='resnet50', device='cpu', test='eval', batch_size=8, jit=False, extra_args=[], extra_env=None) ...Running TorchBenchModelConfig(name='resnet50', device='cpu', test='eval', batch_size=8, jit=False, extra_args=[], extra_env=None) ...Running TorchBenchModelConfig(name='resnet50', device='cpu', test='eval', batch_size=8, jit=False, extra_args=[], extra_env=None) ...Running TorchBenchModelConfig(name='resnet50', device='cpu', test='eval', batch_size=8, jit=False, extra_args=[], extra_env=None) ...
[Done]
[Done]
[Done]
[Done]
Running benchmark: ['/localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python', '-m', 'torch.backends.xeon.run_cpu', '--node-id', '0', '--ninstances', '4', '/localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py', '-m', 'alexnet', '--device', 'cpu', '-b', '8', '-t', 'eval', '-o', PosixPath('/localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336')]
2023-04-20 00:43:53,444 - __main__ - INFO - Use JeMalloc memory allocator
2023-04-20 00:43:53,444 - __main__ - INFO - OMP_NUM_THREADS=7
2023-04-20 00:43:53,444 - __main__ - INFO - Using Intel OpenMP
2023-04-20 00:43:53,444 - __main__ - INFO - KMP_AFFINITY=granularity=fine,compact,1,0
2023-04-20 00:43:53,444 - __main__ - INFO - KMP_BLOCKTIME=1
2023-04-20 00:43:53,444 - __main__ - INFO - LD_PRELOAD=/localdisk/chuanqiw/miniconda3/envs/torchdynamo/lib/libiomp5.so:/localdisk/chuanqiw/miniconda3/envs/torchdynamo/lib/libjemalloc.so
2023-04-20 00:43:53,445 - __main__ - INFO - numactl -C 0-6 -m 0 /localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python -u /localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py -m alexnet --device cpu -b 8 -t eval -o /localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336
2023-04-20 00:43:53,445 - __main__ - INFO - numactl -C 7-13 -m 0 /localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python -u /localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py -m alexnet --device cpu -b 8 -t eval -o /localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336
2023-04-20 00:43:53,445 - __main__ - INFO - numactl -C 14-20 -m 0 /localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python -u /localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py -m alexnet --device cpu -b 8 -t eval -o /localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336
2023-04-20 00:43:53,445 - __main__ - INFO - numactl -C 21-27 -m 0 /localdisk/chuanqiw/miniconda3/envs/torchdynamo/bin/python -u /localdisk/chuanqiw/PT/benchmark/userbenchmark/cpu/run_config.py -m alexnet --device cpu -b 8 -t eval -o /localdisk/chuanqiw/PT/benchmark/.userbenchmark/cpu/cpu-20230420004336
Running TorchBenchModelConfig(name='alexnet', device='cpu', test='eval', batch_size=8, jit=False, extra_args=[], extra_env=None) ...Running TorchBenchModelConfig(name='alexnet', device='cpu', test='eval', batch_size=8, jit=False, extra_args=[], extra_env=None) ...Running TorchBenchModelConfig(name='alexnet', device='cpu', test='eval', batch_size=8, jit=False, extra_args=[], extra_env=None) ...Running TorchBenchModelConfig(name='alexnet', device='cpu', test='eval', batch_size=8, jit=False, extra_args=[], extra_env=None) ...
[Done]
[Done]
[Done]
[Done]
```

The test results are in `.userbenchmark/cpu/cpu-20230420004336`. The `cpu` userbenchmark creates a subfolder for each test and aggregates all test results into `metrics-20230420004336.json`. Each subfolder contains the per-instance logs for that model test, named with the instance PID.

```shell
$ ls .userbenchmark/cpu/cpu-20230420004336
eval_alexnet_eager/ eval_resnet50_eager/
$ ls .userbenchmark/cpu/cpu-20230420004336/eval_alexnet_eager/
metrics-3347653.json metrics-3347654.json metrics-3347655.json metrics-3347656.json
$ cat .userbenchmark/cpu/metrics-20230420004336.json
{
    "name": "cpu",
    "environ": {
        "pytorch_git_version": "de1114554c38322273c066c091d455519d45472d"
    },
    "metrics": {
        "alexnet-eval-eager_latency": 58.309660750000006,
        "alexnet-eval-eager_cmem": 0.416259765625,
        "resnet50-eval-eager_latency": 335.04970325,
        "resnet50-eval-eager_cmem": 0.90673828125
    }
}
```

Pull Request resolved: https://github.com/pytorch/benchmark/pull/1559

Reviewed By: aaronenyeshi

Differential Revision: D45450175

Pulled By: xuzhao9

fbshipit-source-id: 8e7528f4d694eae182ee601cd80bc6e57cd14e3c
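
The multi-instance run above can be illustrated with a minimal Python sketch. It is not the code from this commit: `split_cores` and `aggregate_metrics` are hypothetical helpers. The first reproduces the `numactl -C` core ranges seen in the log, assuming node 0 exposes 28 cores (inferred from the ranges `0-6` through `21-27`). The second assumes each `metrics-<PID>.json` file holds a `{"metrics": {...}}` mapping and that the aggregate is an arithmetic mean over instances; the actual file layout and aggregation rule may differ.

```python
import json
from pathlib import Path
from statistics import mean


def split_cores(total_cores: int, ninstances: int) -> list[str]:
    """Return one contiguous numactl-style core range per instance.

    Hypothetical helper: cores are divided evenly and any remainder
    is left unused.
    """
    per_instance = total_cores // ninstances
    ranges = []
    for i in range(ninstances):
        start = i * per_instance
        ranges.append(f"{start}-{start + per_instance - 1}")
    return ranges


def aggregate_metrics(test_dir: Path) -> dict[str, float]:
    """Average every metric across the per-instance JSON files.

    Assumption: each metrics-<PID>.json looks like
    {"metrics": {"<name>": <number>, ...}}.
    """
    collected: dict[str, list[float]] = {}
    for path in sorted(test_dir.glob("metrics-*.json")):
        data = json.loads(path.read_text())
        for name, value in data["metrics"].items():
            collected.setdefault(name, []).append(value)
    return {name: mean(values) for name, values in collected.items()}


if __name__ == "__main__":
    # 28 cores shared by 4 instances gives 7 cores (and OMP threads) each.
    print(split_cores(28, 4))
```

With 28 cores and 4 instances, each instance gets 7 cores, which matches the `OMP_NUM_THREADS=7` line in the log.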