8c83aa2b - Increase DenseNet121 Batch Size for better Utilization (#496)

Summary:

DenseNet121 in the original paper (https://arxiv.org/pdf/1608.06993.pdf) is evaluated on ImageNet with input shape (3, 224, 224) and a batch size of 256. To bring this benchmark's profile closer to community usage, increase the batch size to 256.

Experimental inference numbers on an A100 (40 GB GPU memory):

| Batch Size | GPU Time (ms) | CPU Dispatch Time (s) | CPU Total Time (s) | Increase over Previous BS | Notes |
| -- | -- | -- | -- | -- | -- |
| 16 | 46.71795 | 0.04666 | 0.04672 | 0.00% | Overhead hides GPU work; very idle |
| 32 | 49.71725 | 0.04963 | 0.04973 | 6.42% | |
| 64 | 54.9376 | 0.05488 | 0.05496 | 10.50% | |
| 128 | 74.36391 | 0.04987 | 0.07437 | 35.36% | |
| 256 | 144.27956 | 0.05129 | 0.14429 | 94.02% | Best batch size |
| 512 | | | | | RuntimeError: Unable to find a valid cuDNN algorithm to run convolution |
| 1024 | | | | | RuntimeError: CUDA out of memory. Tried to allocate 2.30 GiB (GPU 0; 39.59 GiB total capacity; 35.85 GiB already allocated; 1.83 GiB free; 35.87 GiB reserved in total by PyTorch). If reserved memory is >> allocated memory, try setting max_split_size_mb to avoid fragmentation. See the documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF. |

Pull Request resolved: https://github.com/pytorch/benchmark/pull/496

Reviewed By: xuzhao9

Differential Revision: D31697847

Pulled By: aaronenyeshi

fbshipit-source-id: d0fbe98c66524a6a1de5b07a404c372aeae518bf
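The "Increase over Previous BS" column follows directly from the measured GPU times (each entry is the percentage slowdown relative to the previous, half-size batch). As a sanity check, it can be recomputed in a few lines of Python; the values below are copied from the table, not re-measured:

```python
# GPU inference time (ms) per batch size, taken from the table above.
# Batch sizes 512 and 1024 failed to run, so they are omitted.
gpu_time_ms = {
    16: 46.71795,
    32: 49.71725,
    64: 54.9376,
    128: 74.36391,
    256: 144.27956,
}

batch_sizes = sorted(gpu_time_ms)
increase = {batch_sizes[0]: 0.0}  # the smallest batch has no predecessor
for prev, cur in zip(batch_sizes, batch_sizes[1:]):
    # Percentage increase in GPU time versus the previous batch size.
    increase[cur] = (gpu_time_ms[cur] / gpu_time_ms[prev] - 1) * 100

for bs in batch_sizes:
    print(f"bs={bs:4d}  +{increase[bs]:.2f}%")
```

Doubling the batch from 128 to 256 roughly doubles the GPU time (+94.02%), i.e. throughput per image is about flat, which is why 256 is the largest batch size worth using before the out-of-memory failures.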