benchmark
7f76813e - Update and fix gnn model factory and models (#2177)

Commit
1 year ago
Summary:
This PR addresses several issues with the GNN canary models. The following models currently fail at the installation step because they are missing the required data file, `sub_reddit.pt`:

- sage
- gcn
- gat

The PR checks out the data file `Reddit_minimal.tar.gz` from S3 for all of these models. It also updates the requirements and installation files, for example importing `pyg_lib`, since running the models without it causes `NeighborSampler` to throw a deprecation warning.

Lastly, this PR updates the GNN model factory to be consistent with both `model.py` and [_invoke_staged_train_test()](https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/model.py#L289), because the GNN models are multi-batch models. This means the factory needs `forward()`, `backward()`, `optimizer_step()` and `get_input_iter()` functions. It also brings the factory in line with other model factories, such as the vision one.

These changes allow the models to be trained with `run.py`:

```
python benchmark/run.py sage -d cpu -t train --metrics model_flops,cpu_peak_mem,ttfb
Warning: The model sage cannot be found at core set.
Running train method from sage on cpu in eager mode with input batch size 64 and precision fp32.
3054644320
Module         FLOP       % Total
-------------  ---------  ---------
Global         3054.644M  100.00%
 - aten.addmm  763.661M   25.00%
 - aten.mm     2290.983M  75.00%
CPU Wall Time per batch: 1.654 milliseconds
CPU Wall Time: 201.804 milliseconds
Time to first batch: 321.6082 ms
Model Flops: 0.0151 TFLOPs per second
CPU Peak Memory: 0.3770 GB
```

```
python benchmark/run.py gat -d cpu -t train --metrics cpu_peak_mem,ttfb
Warning: The model gat cannot be found at core set.
Running train method from gat on cpu in eager mode with input batch size 64 and precision fp32.
CPU Wall Time per batch: 2.721 milliseconds
CPU Wall Time: 331.996 milliseconds
Time to first batch: 174.6178 ms
CPU Peak Memory: 0.3594 GB
```

```
python benchmark/run.py gcn -d cpu -t train --metrics cpu_peak_mem,ttfb
Warning: The model gcn cannot be found at core set.
Running train method from gcn on cpu in eager mode with input batch size 64 and precision fp32.
CPU Wall Time per batch: 1.795 milliseconds
CPU Wall Time: 219.015 milliseconds
Time to first batch: 220.0093 ms
CPU Peak Memory: 0.3350 GB
```

NOTE: `gat` and `gcn` cannot collect the `model_flops` metric because of a bug that occurs when these models run under the `FlopCounterMode` context manager ([here](https://github.com/pytorch/benchmark/blob/main/torchbenchmark/util/experiment/metrics.py#L106)).

NOTE 2: Eval is not supported yet because there is no `_invoke_staged_eval_test()` function. Implementing one would be a good follow-up for completeness.

Pull Request resolved: https://github.com/pytorch/benchmark/pull/2177
Reviewed By: aaronenyeshi
Differential Revision: D54204035
Pulled By: xuzhao9
fbshipit-source-id: fab60e1e3db3d1620bce34be30ea922066e06a5a
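The install fix depends on `sub_reddit.pt` being present after `Reddit_minimal.tar.gz` has been checked out from S3. Below is a minimal sketch of one way an install step could extract that archive and fail loudly if the file is still missing. It assumes the archive has already been downloaded into `archive_dir`, and it omits the S3 download itself. The function name `ensure_reddit_data` and the directory layout are hypothetical; this is not the benchmark's actual install code.

```python
import tarfile
from pathlib import Path

# Archive and data-file names come from the PR description; the directory
# layout and this helper are hypothetical.
ARCHIVE_NAME = "Reddit_minimal.tar.gz"
REQUIRED_FILE = "sub_reddit.pt"


def ensure_reddit_data(archive_dir: Path, data_dir: Path) -> Path:
    """Extract the archive into data_dir unless sub_reddit.pt is already there.

    Returns the path to sub_reddit.pt. Raises FileNotFoundError if the archive
    is absent or does not contain the file, so the failure shows up at install
    time rather than when the model first runs.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    # Search recursively because the archive may nest the file in a folder.
    existing = next(data_dir.rglob(REQUIRED_FILE), None)
    if existing is not None:
        return existing

    archive = archive_dir / ARCHIVE_NAME
    if not archive.is_file():
        raise FileNotFoundError(f"{archive} not found; check it out from S3 first")

    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Use the safer extraction filter on Python versions that support it.
            tar.extractall(data_dir, filter="data")
        else:
            tar.extractall(data_dir)

    found = next(data_dir.rglob(REQUIRED_FILE), None)
    if found is None:
        raise FileNotFoundError(f"{REQUIRED_FILE} missing from {archive}")
    return found
```

The second lookup after extraction is deliberate. A truncated or mismatched archive is then reported as a missing `sub_reddit.pt` during installation, instead of failing later inside the model.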
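The factory change exposes `forward()`, `backward()`, `optimizer_step()` and `get_input_iter()` so that a multi-batch driver such as `_invoke_staged_train_test()` can step through mini-batches. The following is a simplified, framework-free sketch of that calling pattern, not the torchbenchmark implementation. `ToyStagedModel`, `staged_train` and `staged_eval` are hypothetical names, and the toy model only records which hook ran. `staged_eval` shows what the eval counterpart mentioned in NOTE 2 might look like; it is hypothetical as well.

```python
from typing import Any, Iterator, List


class ToyStagedModel:
    """Hypothetical stand-in for a model built by the GNN factory.

    Each hook records its name so the call order is visible; a real model
    would run a PyG network and a torch optimizer here instead.
    """

    def __init__(self, batches: List[Any]) -> None:
        self.batches = batches
        self.example_inputs: Any = None
        self.calls: List[str] = []

    def get_input_iter(self) -> Iterator[Any]:
        # Yield one mini-batch at a time, as a neighbor sampler would.
        yield from self.batches

    def forward(self) -> float:
        self.calls.append(f"forward({self.example_inputs})")
        return float(self.example_inputs)  # pretend this value is the loss

    def backward(self, loss: float) -> None:
        self.calls.append("backward")

    def optimizer_step(self) -> None:
        self.calls.append("optimizer_step")


def staged_train(model: ToyStagedModel, num_batch: int) -> None:
    """Simplified driver in the spirit of _invoke_staged_train_test()."""
    inputs = model.get_input_iter()
    for _ in range(num_batch):
        model.example_inputs = next(inputs)
        loss = model.forward()
        model.backward(loss)
        model.optimizer_step()


def staged_eval(model: ToyStagedModel, num_batch: int) -> List[float]:
    """Hypothetical eval counterpart: forward only, no backward or step."""
    inputs = model.get_input_iter()
    outputs = []
    for _ in range(num_batch):
        model.example_inputs = next(inputs)
        outputs.append(model.forward())
    return outputs
```

For example, `staged_train(ToyStagedModel([1, 2, 3]), 2)` runs forward, backward and optimizer_step once for each of the first two batches. Keeping the loop in the driver and the per-batch work in the hooks plausibly lets the harness wrap each phase separately, which is one reason a multi-batch model needs these methods.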
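The numbers in the `sage` FLOP table and the `Model Flops` line are consistent with simple arithmetic on the printed values. The two ATen ops sum to the global count, and dividing the total FLOPs by the CPU wall time gives the reported TFLOPs per second. The helper names below are illustrative, and the throughput formula is our reading of the output rather than something taken from `metrics.py`.

```python
# Values copied from the sage run output.
TOTAL_FLOPS = 3054644320      # raw FLOP count printed before the table
ADDMM_MFLOPS = 763.661        # aten.addmm row
MM_MFLOPS = 2290.983          # aten.mm row
CPU_WALL_TIME_MS = 201.804    # "CPU Wall Time"


def tflops_per_second(flops: int, wall_ms: float) -> float:
    # Assumed relationship: total FLOPs divided by total wall time in seconds.
    return flops / (wall_ms / 1e3) / 1e12


def percent_of_total(part: float, total: float) -> float:
    # Share of one op in the global FLOP count, as in the "% Total" column.
    return 100.0 * part / total


global_mflops = ADDMM_MFLOPS + MM_MFLOPS
print(round(TOTAL_FLOPS / 1e6, 3), round(global_mflops, 3))
print(round(tflops_per_second(TOTAL_FLOPS, CPU_WALL_TIME_MS), 4))
print(round(percent_of_total(ADDMM_MFLOPS, global_mflops), 2),
      round(percent_of_total(MM_MFLOPS, global_mflops), 2))
```

Both ways of computing the global count give 3054.644 MFLOPs. The throughput rounds to 0.0151, matching the `Model Flops` line, and the op shares come out at 25.0% and 75.0%.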