llama.cpp
convert : update phi-2 to latest HF repo
#4903
Merged


ggerganov merged 2 commits into master from gg/update-phi2-convert
ggerganov 1 year ago

fix #4898

ggerganov convert : update phi-2 to latest HF repo
fe252237
ggerganov added the need feedback label
ggerganov commented on 2024-01-12
convert-hf-to-gguf.py

                 toktypes.append(gguf.TokenType.USER_DEFINED)
             elif reverse_vocab[i] in added_vocab:
                 tokens.append(reverse_vocab[i])
-                if tokenizer.added_tokens_decoder[i].special:
-                    toktypes.append(gguf.TokenType.CONTROL)
-                else:
-                    toktypes.append(gguf.TokenType.USER_DEFINED)
+                # check if tokenizer has added_tokens_decoder
+                if hasattr(tokenizer, "added_tokens_decoder"):
+                    if tokenizer.added_tokens_decoder[i].special:
+                        toktypes.append(gguf.TokenType.CONTROL)
+                    else:
+                        toktypes.append(gguf.TokenType.USER_DEFINED)
ggerganov 1 year ago

Not sure about this change - without the hasattr check, the phi-2 model fails to convert. Please advise

cebtenzzre 1 year ago (edited)

This could be because of trust_remote_code=True, it's probably providing a tokenizer based on an older version of HF transformers/tokenizers (assuming your transformers and tokenizers are up-to-date). We should probably only enable trust_remote_code for models that require it.
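One way to scope this, as suggested, is an explicit allowlist of architectures known to ship custom code, consulted per-model before enabling the flag. A minimal sketch (the set contents and the helper name are hypothetical, not the actual convert-hf-to-gguf.py code):

```python
# Hypothetical allowlist: only architectures known to require custom
# tokenizer/model code from the HF repo get trust_remote_code=True.
ARCHS_NEEDING_REMOTE_CODE = {
    "MixFormerSequentialForCausalLM",  # assumed example: pre-update phi-2
}

def needs_remote_code(arch: str) -> bool:
    """Return True only for architectures on the allowlist."""
    return arch in ARCHS_NEEDING_REMOTE_CODE

print(needs_remote_code("MixFormerSequentialForCausalLM"))  # True
print(needs_remote_code("LlamaForCausalLM"))                # False
```

The result would then be forwarded when loading, e.g. AutoTokenizer.from_pretrained(dir_model, trust_remote_code=needs_remote_code(arch)), so that models with native transformers support never execute repo-provided code.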

ebeyabraham 1 year ago (edited) 👍 2

The conversion script works without these changes for transformers>=4.34.0.

Edit: added_tokens_decoder attribute was exposed in 4.34: https://github.com/huggingface/transformers/blob/29a2b1420633d322140062d7c76b807f41fb90aa/src/transformers/tokenization_utils_base.py#L1649
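Given that added_tokens_decoder only exists from transformers 4.34.0 onward, the script could gate on the installed version rather than on hasattr. A stdlib-only sketch (the helper names are hypothetical; a real script would use packaging.version or simply pin requirements.txt):

```python
# Sketch: decide whether the installed transformers exposes
# added_tokens_decoder, which first appeared in 4.34.0.

MIN_TRANSFORMERS = (4, 34, 0)  # version that introduced added_tokens_decoder

def parse_version(v: str) -> tuple:
    """Turn '4.36.2' into (4, 36, 2); stops at any non-numeric suffix."""
    parts = []
    for p in v.split("."):
        digits = "".join(ch for ch in p if ch.isdigit())
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def has_added_tokens_decoder(transformers_version: str) -> bool:
    """True if the given transformers version ships added_tokens_decoder."""
    return parse_version(transformers_version) >= MIN_TRANSFORMERS

print(has_added_tokens_decoder("4.29.2"))  # False: predates 4.34.0
print(has_added_tokens_decoder("4.36.2"))  # True
```

This would let the converter fail early with a clear "please upgrade transformers" message instead of silently taking a different code path.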

bearn01d 1 year ago

In general, it appears that the checkpoint is already updated on HF, but the corresponding code is not yet in an official transformers release, so one needs to install transformers from source (the main branch).

ggerganov 1 year ago (edited) 👍 2

On macOS when I run pip3 install -r requirements.txt I get the following:

Installing collected packages: numpy, torch, tokenizers, transformers
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.3
    Uninstalling numpy-1.24.3:
      Successfully uninstalled numpy-1.24.3
  Attempting uninstall: torch
    Found existing installation: torch 2.0.1
    Uninstalling torch-2.0.1:
      Successfully uninstalled torch-2.0.1
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.13.3
    Uninstalling tokenizers-0.13.3:
      Successfully uninstalled tokenizers-0.13.3
  Attempting uninstall: transformers
    Found existing installation: transformers 4.29.2
    Uninstalling transformers-4.29.2:
      Successfully uninstalled transformers-4.29.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
fairseq2n 0.2.0 requires torch==2.0.1, but you have torch 2.1.2 which is incompatible.
coremltools 7.0b2 requires protobuf<=4.0.0,>=3.1.0, but you have protobuf 4.25.2 which is incompatible.
tensorflow-macos 2.13.0 requires numpy<=1.24.3,>=1.22, but you have numpy 1.24.4 which is incompatible.
torchvision 0.15.2 requires torch==2.0.1, but you have torch 2.1.2 which is incompatible.
torchaudio 2.0.2 requires torch==2.0.1, but you have torch 2.1.2 which is incompatible.
ane-transformers 0.1.1 requires protobuf<=3.20.1,>=3.1.0, but you have protobuf 4.25.2 which is incompatible.
Successfully installed numpy-1.24.4 tokenizers-0.15.0 torch-2.1.2 transformers-4.36.2

Not sure about the errors, but at the end it seems that I have transformers-4.36.2 installed, yet I still get the conversion error without this change.

Should I keep the change or should I try to figure out what is wrong with my Python packages?

Edit: for now I will merge the change, as I am not familiar with how to deal with the "trust_remote_code" and requirements stuff. If there is a better way to handle this, please submit a PR.

cebtenzzre 1 year ago (edited) 👍 1

This workaround is actually incorrect because it causes toktypes to get out of sync with tokens. We need to append something if we're going to use toktypes at all.

I have transformers 4.35.2 according to python3 -c 'import transformers; print(transformers.__version__)' and did not need this hasattr check in order to convert https://huggingface.co/microsoft/phi-2.
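For reference, a length-safe variant of the workaround would always append some toktype, falling back to USER_DEFINED when added_tokens_decoder is absent, so tokens and toktypes can never drift apart. A stdlib-only sketch with stand-in enums and stub tokenizers (the names are hypothetical, not the actual convert-hf-to-gguf.py code):

```python
from enum import Enum
from types import SimpleNamespace

# Stand-in for gguf.TokenType, just for this sketch.
class TokenType(Enum):
    CONTROL = 1
    USER_DEFINED = 2

def classify_added_token(tokenizer, i) -> TokenType:
    # Always return *some* toktype so tokens/toktypes stay the same length,
    # even on transformers < 4.34 where added_tokens_decoder does not exist.
    decoder = getattr(tokenizer, "added_tokens_decoder", None)
    if decoder is not None and decoder[i].special:
        return TokenType.CONTROL
    return TokenType.USER_DEFINED

# Stub mimicking transformers >= 4.34, where added_tokens_decoder exists:
new_tok = SimpleNamespace(added_tokens_decoder={0: SimpleNamespace(special=True)})
# Stub mimicking an older transformers, with no such attribute:
old_tok = SimpleNamespace()

print(classify_added_token(new_tok, 0))  # TokenType.CONTROL
print(classify_added_token(old_tok, 0))  # TokenType.USER_DEFINED
```

Whether misclassifying special tokens as USER_DEFINED on old transformers is acceptable is a separate question; the point here is only that the two lists stay in sync.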

ggerganov 1 year ago 👍 2

Reverted the change (5c99960) - I now have a compatible version:

python3 -c 'import transformers; print(transformers.__version__)'
4.36.2
ggerganov py : try to fix flake stuff
1fb563eb
ggerganov force-pushed from c3d64a0f to 1fb563eb 1 year ago
ggerganov merged 15ebe592 into master 1 year ago
