Fix bug in record collection (#1366)
Summary:
When model analyzer collects GPU and CPU metrics, the monitor threads do not exit cleanly, and the last batch of records is discarded. This has little effect on the final metric values, because we aggregate all values and report the average. But when the `--export-metrics` option is enabled to export all detailed metric information, the discarded records significantly affect the correctness of the result.
This PR fixes the bug as follows:
- call `self._thread.wait()` to wait for the monitor threads to finish.
- add an extra `self._monitoring_iteration()` call to collect the last batch of records after setting `self._thread_active` to false.
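
The shutdown sequence described above can be sketched roughly as follows. This is a minimal illustration, not the actual model analyzer code: `SketchMonitor`, `read_metric`, `_monitor_loop`, `start`, and `stop` are hypothetical names, and the standard library's `threading.Thread.join()` stands in for `self._thread.wait()`, since the thread type used by the monitor is not shown here.

```python
import threading
import time


class SketchMonitor:
    """Hypothetical monitor used only to illustrate the shutdown order."""

    def __init__(self, read_metric, interval=0.01):
        self._read_metric = read_metric  # assumed callable returning one record
        self._interval = interval
        self._records = []
        self._thread_active = False
        self._thread = None

    def _monitoring_iteration(self):
        # Collect one record and store it.
        self._records.append(self._read_metric())

    def _monitor_loop(self):
        # Runs on the background thread until the flag is cleared.
        while self._thread_active:
            self._monitoring_iteration()
            time.sleep(self._interval)

    def start(self):
        self._thread_active = True
        self._thread = threading.Thread(target=self._monitor_loop)
        self._thread.start()

    def stop(self):
        # Ask the loop to exit.
        self._thread_active = False
        # Wait for the background thread to finish (join() is an assumption;
        # the PR calls self._thread.wait()).
        self._thread.join()
        # One extra iteration so the final records are not lost.
        self._monitoring_iteration()
        return list(self._records)
```

Waiting for the thread before the extra iteration means only one thread appends to the record list at a time, so the final record always lands after everything the loop collected.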
Pull Request resolved: https://github.com/pytorch/benchmark/pull/1366
Reviewed By: erichan1
Differential Revision: D42464909
Pulled By: xuzhao9
fbshipit-source-id: 1f0bbe798d4fdd6271013694e499be5e9d40a252