xla
86636ba7 - benchmark/experiment_runner: save accelerator_model on error (#6218)

Commit

2 years ago

benchmark/experiment_runner: save accelerator_model on error (#6218)

This commit can be seen as a partial revert of "63455e0cd Unify the way in which result files are dumped (#6162)". In that commit we missed that `experiment_cfg` does not have the `accelerator_model` record. Thus, when a benchmark fails, we do not include that record in the JSONL file, and resuming a run doesn't work because the failing entry is not recognized. (When checking whether to resume, we compare the JSONL entry against `benchmark_experiment`, which does have `accelerator_model`.)

We could fix this in two ways:
(1) always save `benchmark_experiment`, not only on success, or
(2) add `accelerator_model` to experiment_config.

I've chosen to go with (1) since that's what we were doing before 63455e0cd.
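The resume failure described in the commit message can be illustrated with a minimal sketch. This is not the actual experiment_runner code: the helper names `make_record` and `already_ran`, the record layout, the `always_save_full` flag, and all field values are hypothetical, chosen only to show why an entry without `accelerator_model` is not matched on resume and why option (1) avoids the problem.

```python
import json


def make_record(experiment_cfg, benchmark_experiment, error=None,
                always_save_full=True):
    """Build one JSONL record (hypothetical layout, not the real runner's).

    With always_save_full=False, failures store only experiment_cfg,
    which lacks accelerator_model (the behavior the commit fixes).
    """
    if error is None or always_save_full:
        experiment = dict(benchmark_experiment)
    else:
        experiment = dict(experiment_cfg)
    record = {"experiment": experiment}
    if error is not None:
        record["error"] = error
    return record


def already_ran(jsonl_lines, benchmark_experiment):
    """Resume check: is there a saved entry equal to benchmark_experiment?"""
    for line in jsonl_lines:
        if json.loads(line)["experiment"] == benchmark_experiment:
            return True
    return False


# Hypothetical values for illustration only.
experiment_cfg = {"accelerator": "cuda", "xla": "PJRT", "test": "eval"}
benchmark_experiment = {**experiment_cfg, "accelerator_model": "hypothetical-gpu"}

# Failure saved without accelerator_model: the entry is not recognized.
buggy = [json.dumps(make_record(experiment_cfg, benchmark_experiment,
                                error="OOM", always_save_full=False))]
buggy_recognized = already_ran(buggy, benchmark_experiment)

# Option (1): always save benchmark_experiment, even on error.
fixed = [json.dumps(make_record(experiment_cfg, benchmark_experiment,
                                error="OOM"))]
fixed_recognized = already_ran(fixed, benchmark_experiment)
```

In this sketch the failed entry is only matched on resume when the full `benchmark_experiment` is saved, which mirrors the reasoning behind choosing option (1).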

References

#6218 - benchmark/experiment_runner: save accelerator_model on error

Author

cota

Parents

dc8948d9
