Track mean relative error across models. (#6250)
This PR introduces and integrates a verification module behind the --verify flag. The module takes the model output and the model/benchmark reconstruction args as input, runs the model in eager mode, computes the relative error between the two outputs, and compares it against the provided threshold to return the appropriate error code.
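
The check described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: the function names `mean_relative_error` and `verify`, the `eps` guard against division by zero, the flat-list inputs, and the 0/1 return codes are all assumptions made for the example.

```python
from typing import Sequence


def mean_relative_error(
    actual: Sequence[float], expected: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean of |actual - expected| / |expected| over all elements.

    `eps` (an assumption of this sketch) keeps the division defined
    when a reference value is zero.
    """
    if len(actual) != len(expected):
        raise ValueError("outputs must have the same number of elements")
    if not actual:
        raise ValueError("outputs must not be empty")
    total = 0.0
    for a, e in zip(actual, expected):
        total += abs(a - e) / max(abs(e), eps)
    return total / len(actual)


def verify(
    actual: Sequence[float], expected: Sequence[float], threshold: float
) -> int:
    """Return 0 if the mean relative error is within `threshold`, else 1.

    The 0/1 codes are hypothetical; the real module may use other values.
    """
    error = mean_relative_error(actual, expected)
    return 0 if error <= threshold else 1
```

Here `expected` stands in for the eager-mode output and `actual` for the output passed to the verification module. For example, `verify([1.1, 2.0], [1.0, 2.0], threshold=0.1)` passes because the mean relative error is about 0.05.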