benchmark
bd90678d - Add model torchrec dlrm (#1397)

Commit
2 years ago
Summary:
Original model file: https://github.com/facebookresearch/dlrm/blob/main/torchrec_dlrm/dlrm_main.py

Uses the default config in the model file. A few places had to be modified to run on a single GPU device: removing DDP, changing the device from "meta" to concrete devices, and removing the fused optimizer.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/1397

Test Plan:
Experiment on T4 16 GB

```
$ python run.py torchrec_dlrm -d cuda --bs 4096 -t eval
Running eval method from torchrec_dlrm on cuda in eager mode with input batch size 4096.
GPU Time: 5.040 milliseconds
CPU Total Wall Time: 5.066 milliseconds

$ python run.py torchrec_dlrm -d cuda --bs 4096 -t train
Running train method from torchrec_dlrm on cuda in eager mode with input batch size 4096.
GPU Time: 32.600 milliseconds
CPU Total Wall Time: 32.625 milliseconds
```

```
$ python run.py torchrec_dlrm -d cuda --bs 4096 -t eval --torchdynamo inductor
[2023-02-04 14:14:36,265] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices
Segmentation fault (core dumped)

$ python run.py torchrec_dlrm -d cuda --bs 4096 -t train --torchdynamo inductor
[2023-02-04 14:14:36,265] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices
Segmentation fault (core dumped)
```

Reviewed By: yf225, brad-mengchi

Differential Revision: D43018611

Pulled By: xuzhao9

fbshipit-source-id: 3438535cc0bad2f151fae40305f5ba5e5f990ef6
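For comparing the eager-mode numbers from the test plan, a minimal sketch like the following could parse the `run.py` output into a dictionary. The helper name `parse_timings` and the regular expressions are illustrative assumptions, not part of the benchmark repository. The log text is copied from the eager-mode runs reported in this commit.

```python
import re

# Eager-mode output copied from the test plan in this commit (T4 16 GB).
LOG = """\
Running eval method from torchrec_dlrm on cuda in eager mode with input batch size 4096.
GPU Time: 5.040 milliseconds
CPU Total Wall Time: 5.066 milliseconds
Running train method from torchrec_dlrm on cuda in eager mode with input batch size 4096.
GPU Time: 32.600 milliseconds
CPU Total Wall Time: 32.625 milliseconds
"""

# Assumed patterns, written to match the lines shown above.
HEADER = re.compile(
    r"Running (\w+) method from (\w+) on (\w+) in (\w+) mode "
    r"with input batch size (\d+)\."
)
TIMING = re.compile(r"(GPU Time|CPU Total Wall Time):\s+([\d.]+) milliseconds")


def parse_timings(log):
    """Hypothetical helper: map each test ("eval", "train") to its timings."""
    results = {}
    current = None
    for line in log.splitlines():
        header = HEADER.search(line)
        if header:
            current = header.group(1)
            results[current] = {"batch_size": int(header.group(5))}
            continue
        timing = TIMING.search(line)
        if timing and current is not None:
            results[current][timing.group(1)] = float(timing.group(2))
    return results


timings = parse_timings(LOG)
for test, row in timings.items():
    # Samples per second implied by the GPU time of a single batch.
    throughput = row["batch_size"] / (row["GPU Time"] / 1000.0)
    print(f"{test}: {row['GPU Time']:.3f} ms GPU, about {throughput:,.0f} samples/s")
```

The inductor runs are left out because they ended in a segmentation fault and reported no timings.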