@regisss, can we have this fix merged quickly? It is a blocker issue affecting Llama fine-tuning functionality on Gaudi.
@MohitIntel I tried the command that you posted on v1.14.0 and I don't see any error.
I also checked the Llama-2-70b model, and its pad token id is defined:
https://huggingface.co/meta-llama/Llama-2-70b-hf/blob/main/generation_config.json#L5
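For reference, one way to verify this locally (a minimal check, assuming access to the gated meta-llama repo):

```python
from transformers import GenerationConfig

# Illustrative check: Llama-2-70b-hf defines pad_token_id in its
# generation_config.json, so this should print a non-None value.
gen_config = GenerationConfig.from_pretrained("meta-llama/Llama-2-70b-hf")
print(gen_config.pad_token_id)
```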
Where do you see this error?
@jiminha Try with Llama 3, not Llama 2.
@libinta Please check this ticket to decide whether we want to bring this into the point release as well. Fine-tuning is failing for Llama 3.1 models due to the missing pad_token_id.
LGTM!
A new error was introduced by #1444 for Llama 3.1:

```
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` (`tokenizer.pad_token = tokenizer.eos_token` e.g.) or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`
```

This patch fixes the above when running `run_lora_clm.py` to fine-tune Llama 3.1 with the command from the language-modeling README:
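For context, the usual workaround for this error is a guard like the one below (a minimal sketch, not necessarily the exact change in this PR; the model id is illustrative):

```python
from transformers import AutoTokenizer

# Illustrative model id; Llama 3.1 tokenizers ship without a pad token by default.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")

# Guard before any padded tokenization: reuse the EOS token as the pad token
# so that batch padding no longer raises the ValueError quoted above.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```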