Eval harness (#212)
* Add functionality for running the evaluation harness on a single GPU
* Add support for pipelining
* support tensor parallel
* save the results
* Minor cleanup
* Experimental Deepspeed support
* Proper deepspeed integration, now working on combined tp and pp
* Update model loading and clean up code.
* Add some options
* Fix pipelining + fp32 evaluation.
* Remove dummy paths in examples/run_evalharness.sh
* Simplify offline loading with export HF_DATASETS_OFFLINE=1
* Remove accidental copy-paste.
* Experimental deepspeed evaluation-path
* make it work with deepspeed; add instructions
* improve
* make adaptive_seq_len work with deepspeed
* move to slurm
* fixes
* cleanup
* add instructions on how to import data into the spreadsheet
* not tracking ppl/em
* add task version
* make compatible with lm-eval@master
* switch to 16gb slurm; simplify; improve instructions
* Deepspeed model loading hack
* Restore correct zero state.
* fix conversion script
* simpler config
* corrections
* add logiqa
* dealing with custom tokenizers
* fix
* Update examples/run_evalharness_deepspeed.md
* check that the checkpoint path is valid
* skip --abort_on_unmet_fused_kernel_constraints during eval
* disable sanity check on layers-2%pp==0
* sort skip_keys
* make the default path unique to avoid overwrite
* Add bootstrap_iters arg
* Explain bootstrap_iters flag
* Intermediate results flag
* Add backup file
* Add arg to reduce bubble for pipeline parallel
* Fix adaptive_seq_len via resetting activation shape
* Extract args.load prior to load_ds_checkpoint_and_setup_megatron
* Parse args prior to loading function to get load_path
* Add run_evalharness-tr11-176b-ml slurm script
Co-authored-by: Daniel Hesslow <daniel@lighton.ai>
Co-authored-by: Thomas Wang <24695242+thomasw21@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Muennighoff <n.muennighoff@gmail.com>