Merge benchmark scripts into the main branch (#5913)
* Add benchmarking with TorchBench (#5788)
* Initial commit with dummy model benchmark
* add XRT support
* Add torchbench benchmark models
* add randomize_input
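A minimal sketch of what randomize_input could do, assuming TorchBench-style example inputs (the helper name and guard are illustrative, not the actual runner code):

    import torch

    def randomize_inputs(example_inputs):
        # Illustrative: replace floating-point example tensors with random data
        # of the same shape/dtype so timings don't depend on one fixed input.
        return tuple(
            torch.randn_like(t) if torch.is_tensor(t) and t.is_floating_point() else t
            for t in example_inputs
        )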
* add model setup for TorchBench models
* update ExperimentLoader
* Add saving results
* minor args update
* update style
* add experiment name
* add grad context for eval and train
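A sketch of such an eval/train grad context using the standard torch context managers (the helper is illustrative):

    import torch

    def grad_context(test: str):
        # Training needs autograd; eval should run without it to save memory
        # and skip gradient bookkeeping.
        return torch.enable_grad() if test == "train" else torch.no_grad()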
* minor user config update
* fix train() return item
* minor refactor
* add dynamo options
* add a column to the results for the dynamo setting
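The dynamo setting plausibly maps onto torch.compile's backend argument; a sketch under that assumption (maybe_compile is a hypothetical name):

    import torch

    def maybe_compile(model, dynamo_backend=None):
        # "openxla" selects the PyTorch/XLA dynamo backend, "inductor" the
        # stock PyTorch 2.x backend; None keeps eager execution.
        if dynamo_backend is None:
            return model
        return torch.compile(model, backend=dynamo_backend)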
* capture output and error
* Fix some failure cases for dynamo
* reduce eval result size by returning eval loss
* minor refactor
* revert eval result change
* minor fix
* Change output format to jsonl
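JSONL stores one self-contained JSON object per line, so results can be appended incrementally and a partially written file stays parseable. A minimal sketch (the record fields are assumptions):

    import json

    def append_result(path: str, record: dict) -> None:
        # One JSON object per line; appending never disturbs earlier records.
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")

    # e.g. append_result("results.jsonl", {"model": "resnet50", "dynamo": "openxla"})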
* Add accelerator model name
* add skipping finished experiments
* main process needs to remove the PJRT_DEVICE env var that is automatically added
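A sketch of that clean-up, assuming subprocess-based launching (the command line is illustrative):

    import os
    import subprocess

    # PJRT_DEVICE is set automatically for the main process; strip it so each
    # spawned experiment can receive its own device configuration instead.
    env = os.environ.copy()
    env.pop("PJRT_DEVICE", None)
    subprocess.run(["python3", "experiment_runner.py"], env=env, check=True)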
* Add a simple result analyzer
* Result analyzer saves to a CSV database with historical data
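A sketch of keeping historical data in one CSV, appending a timestamped row per run (the schema is an assumption):

    import csv
    import datetime
    import os

    def append_row(path: str, row: dict) -> None:
        # Hypothetical: the CSV acts as a small append-only database; each run
        # adds one timestamped row so history accumulates.
        row = {"timestamp": datetime.datetime.now().isoformat(), **row}
        write_header = not os.path.exists(path)
        with open(path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(row.keys()))
            if write_header:
                writer.writeheader()
            writer.writerow(row)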
* Handle detectron2 models
* minor update
* add deny list
* Create run_benchmark
* Rename run_benchmark to run_benchmark.sh
* Fix device names and dynamo backend names in benchmark runner (#5806)
* update optimizer for openxla
* Add benchmark selection by tier 1-3 (#5808)
* Apply Pytorch/XLA formatting style (#5816)
* Add top tier benchmark runner (#5809)
* Add profiling capabilities to experiment_runner.py script (#5812)
* update the model config call interface, optimizer, and result analyzer script
* fix dependency error
* Add profiling capabilities
---------
Co-authored-by: zpcore <zpcore@gmail.com>
* benchmarks: add script to aggregate results from result_analyzer (#5829)
* benchmarks: extract tiers into their own file
This lets them be reused in other files; the second user comes next.
* benchmarks: add aggregate.py
This script processes output CSV files from result_analyzer to
generate CSV/plots. Example:
$ for fmt in csv png; do \
    for acc in v100 a6000; do \
      for report in latest histogram speedup; do \
        for test in training inference; do \
          FILENAME=/tmp/png/$acc-$test-$report.$fmt; \
          python3 aggregate.py \
            --accelerator=$acc \
            --test=$test \
            -i /tmp/csv-depot \
            --report=$report \
            --title="All benchmarks" \
            --format=$fmt > $FILENAME || break; \
          chmod 644 $FILENAME; \
        done; \
      done; \
    done; \
  done
This generates plots and CSV files summarizing the latest performance
vs. Inductor, as well as a histogram and a geomean speedup over time,
for all the input CSV data in /tmp/csv-depot. Results are broken down
per accelerator and per test (inference or training).
To generate per-tier results, pass --filter-by-tier to the above and
update the title, e.g. --title="Tier 1".
* Fix syntax in experiment_runner.py (#5827)
* Add flag to forward XLA flags and allow for experiment expansion (#5828)
* Add hide-errors flag to result analyzer (#5836)
* Add readme and linting
* Fix ClusterResolver
---------
Co-authored-by: Liyang90 <liyanglu@google.com>
Co-authored-by: Manfei <41607353+ManfeiBai@users.noreply.github.com>
Co-authored-by: zpcore <zpcore@gmail.com>
Co-authored-by: Grzegorz Olechwierowicz <golechwierowicz@gmail.com>
Co-authored-by: Emilio Cota <cota@braap.org>