Allow repo_id--module.classname config definition even if loading from path #29083
Closed

rl337 · 1 year ago (edited)

When a model lives at a path that isn't its repo_id relative to the current directory, and the config contains an AutoConfig entry of the form model_id--module.classname, you can't load the model by path: resolving module.classname ends up being relative to the repo_id defined in the config.

Let me lay this out.

path/to/large/storage/
    models/
        model_a/
            config.json
            tokenizer_config.json
            model_config.py
            model_impl.py
        model_b/
            ...
path/to/sourcecode/
        my_module/
            __init__.py
            __main__.py

What I want to do is load my model from my __main__.py via AutoModel.from_pretrained(), so I pass path/to/large/storage/models/model_a as model_id_or_path with local_files_only=True, because I only want to use the specific model that I have on my filesystem.
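
Concretely, the call is something like this minimal sketch (paths are the hypothetical ones from the layout above):

from transformers import AutoModel

# Hypothetical local path from the layout above. The intent is to treat
# the directory as a self-contained model, but the auto_map entry of the
# form "model_id--module.classname" gets resolved against the repo_id.
model = AutoModel.from_pretrained(
    "path/to/large/storage/models/model_a",
    local_files_only=True,
    trust_remote_code=True,
)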

When you try to do this, you end up with an exception that looks like this:

Traceback (most recent call last):
  File "/Users/rlee/dev/singularity/.venv/lib/python3.9/site-packages/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/Users/rlee/dev/singularity/.venv/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/Users/rlee/dev/singularity/.venv/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1363, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

Here, even though I'm trying to load the model from a path, it's trying to resolve the code relative to the repo_id. This is problematic because if I made changes to the model code and I'm not restricting downloads, I may download and use out-of-date code from the hub. If I don't notice that it's downloaded code, it'd be super confusing to debug.

The workaround is to edit the config.json to remove the repo_id-- part of the definition, which is kind of annoying because if you want to push changes to the hub afterwards, you need to remember to add the repo_id-- back in.
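
For illustration, the auto_map entry in question looks roughly like this (paraphrasing the StripedHyena config discussed later; exact keys vary per repo):

{
  "auto_map": {
    "AutoConfig": "togethercomputer/StripedHyena-Nous-7B--configuration_hyena.StripedHyenaConfig"
  }
}

and the workaround is to shorten the value to just configuration_hyena.StripedHyenaConfig.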

I think the root problem here is that when model_id_or_path is specified as a path, it really just acts as a path to the config.json; the directory the config.json is loaded from is not treated as a self-contained model. Instead, things defined in the config.json are resolved relative to the current directory or relative to the model hub/cache. Concretely, it requires a directory structure that looks more like this:

path/to/sourcecode/
    my_module/
        __init__.py
        __main__.py
    username/model_id/
        config.json
        tokenizer_config.json
        model_config.py
        model_impl.py

When I run my __main__.py from the directory designated by path/to/sourcecode, everything seems to work okay because resolution of the model_id username/model_id happens relative to the current directory.
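
In other words, the dynamic-module resolution effectively behaves like this simplified sketch (an illustration of the observed behaviour, not the actual transformers code):

import os

repo_id = "username/model_id"  # the prefix from the auto_map entry

# Local resolution only succeeds if repo_id happens to be a valid
# directory relative to the current working directory; otherwise it
# falls back to a hub/cache lookup (hf_hub_download).
source_dir = repo_id if os.path.isdir(repo_id) else None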

What does this PR do?

This PR adds a check to see if the repo_or_path is a path containing the module file to download. If it is, the module is loaded from that path instead of by referencing the repo_id; that is, module.classname is resolved against the path rather than the repo_id when dynamically loading classes.

At this point, the config.json has already been loaded from the path, so the path is likely okay to load code from, especially since trust_remote_code must be true to get this far.
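
A rough sketch of the check this PR introduces (illustrative names, not the actual diff):

import os

def resolve_module_source(pretrained_model_name_or_path, module_file):
    # If the pretrained argument is a local directory that already
    # contains the referenced module file, load the dynamic code from
    # there instead of resolving the "repo_id--" prefix against the hub.
    if os.path.isdir(pretrained_model_name_or_path) and os.path.isfile(
        os.path.join(pretrained_model_name_or_path, module_file)
    ):
        return pretrained_model_name_or_path
    return None  # fall back to the existing repo_id/cache resolution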

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

amyeroberts · 1 year ago
rl337 force-pushed 1 year ago
rl337 · 1 year ago

@Rocketknight1 So this PR should fix the issue I referenced in my last PR about subfolder params not working right. I hadn't understood what subfolder meant at the time. I'll verify that I can remove the chdir in my previous test case with this change.

rl337 · 1 year ago

@Rocketknight1 @amyeroberts Okay, I updated the test from my previous PR to directly load from the arbitrary path instead of relying on the current directory, and it seems to work now.

I think that this change is ready for your consideration.

rl337 force-pushed to 125d4d78 1 year ago
Commits:
  d0eb8ef2  Allow the config definition of repo_id--module.classname even when lo…
  dabcbdc5  use the path to the directory rather than the pretrained param passed in
  125d4d78  don't chdir in this test because loading the config from arbitrary pa…
Rocketknight1 · 1 year ago

Taking a look today/Monday!

Rocketknight1 · 1 year ago (edited)

Hi @rl337, I'm investigating this now. It seems clean, but can you give me an example of a repo where the auto_map parameter has the form model_id--module.classname, just so I can experiment with this? Not all repos with custom code use that structure, so I'd like to check with other people at Hugging Face what exactly the intended formatting and behaviour is for those fields.

rl337 · 1 year ago

@Rocketknight1 Sure. The first place that I encountered this was https://huggingface.co/togethercomputer/StripedHyena-Nous-7B

rl337 · 1 year ago

@Rocketknight1 One thing that my fix doesn't address is cross-model dependencies. It's something that I was considering trying out, but it would definitely cause issues with the current code.

Consider for a moment that you have a "library model" which doesn't actually do anything itself, but holds the config and implementations of many models and tokenizers, and you refer to this model from other models. Consider the directory structure:

/path/to/my_id
    /library_model
        base_model.py
    /model_a
        config.json
    /model_b
        config.json

Both model_a and model_b define their AutoModel map as: my_id/library_model--base_model.SomeCommonModel

Is that something we want to support here? If so, I'd change the fix here to take a model_storage_dir instead of a model_id_or_path, and join it against the model_id rather than using it as the path to the files themselves (see the sketch below).

I could imagine this flexibility being super useful.
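
A rough sketch of what that resolution could look like (all names hypothetical):

import os

def resolve_cross_model(model_storage_dir, auto_map_entry):
    # "my_id/library_model--base_model.SomeCommonModel" splits into a
    # repo-like prefix and a module.classname suffix.
    repo_part, module_class = auto_map_entry.split("--", 1)
    module_file = module_class.split(".", 1)[0] + ".py"
    return os.path.join(model_storage_dir, repo_part, module_file)

# resolve_cross_model("/path/to", "my_id/library_model--base_model.SomeCommonModel")
# -> "/path/to/my_id/library_model/base_model.py"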

Rocketknight1 · 1 year ago (edited)

Hi @rl337, firstly sorry for taking so long to try a reproduction here, but I'm struggling to figure out the issue with StripedHyena-7B, or possibly I'm misunderstanding the problem. I've tried the following:

  • Downloading StripedHyena-7B with AutoModelForCausalLM.from_pretrained("togethercomputer/StripedHyena-Nous-7B")
  • Accessing it from the transformers cache with AutoModelForCausalLM.from_pretrained("togethercomputer/StripedHyena-Nous-7B", local_files_only=True)
  • Cloning the git repo to a local folder and then accessing it with e.g. AutoModelForCausalLM.from_pretrained("/path/to/local/dir", local_files_only=True)
  • Accessing the cloned repo from other locations with different relative paths.

In all of these cases, it seems to work fine. Can you help me out with some specific steps to reproduce the issue so I can dig into what's going on here?

Rocketknight1 self-assigned this 1 year ago
rl337 · 1 year ago (edited)

Okay, here is the code block that fails for me with that model:

from transformers import AutoConfig, AutoTokenizer, AutoModel

model_path = '/mnt/model_storage/togethercomputer/StripedHyena-Nous-7B'
conf = AutoConfig.from_pretrained(model_path, local_files_only=True, trust_remote_code=True)
model = AutoModel.from_config(conf)

I'm running this from a subdirectory of my home directory, not anywhere in the path to the model.

The failure looks like this:

rlee@amalgam:~/dev/transformers$ PYTHONPATH=./src ./venv/bin/python test.py 
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Could not locate the configuration_hyena.py inside togethercomputer/StripedHyena-Nous-7B.
Traceback (most recent call last):
  File "/home/rlee/dev/transformers/src/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1362, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

If I edit the config.json to remove the togethercomputer/StripedHyena-Nous-7B-- part of the AutoConfig entries, the code works.

If I run the code from /mnt/model_storage, where togethercomputer/StripedHyena-Nous-7B coincidentally becomes a correct path relative to the current directory, the code works.

Rocketknight1 · 1 year ago (edited)

Hi @rl337, the exact same code works for me! I had to replace AutoModel with AutoModelForCausalLM because AutoModel wasn't in the auto_map, but otherwise it was all fine. I think this might be some kind of environment issue. Can you try:

  1. pip install --upgrade huggingface_hub
  2. pip install --upgrade git+https://github.com/huggingface/transformers.git

Also, in the log you posted I see None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used. Is it possible that the issue is that PyTorch classes like AutoModel are just failing to initialize because torch isn't present?

rl337 · 1 year ago

Yeah, I didn't have PyTorch in that virtual environment, but it gives me the same failure when PyTorch is installed.

Here is the run after installing the CPU-only PyTorch so that we don't get that warning.

rlee@amalgam:~/dev/transformers$ PYTHONPATH=$HOME/dev/transformers/src $HOME/dev/transformers/venv/bin/python $HOME/dev/transformers/test.py 
Could not locate the configuration_hyena.py inside togethercomputer/StripedHyena-Nous-7B.
Traceback (most recent call last):
  File "/home/rlee/dev/transformers/src/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1362, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

I was using transformers in place (hence adding transformers/src to PYTHONPATH), and the checkout was upstream/main updated this morning, so it's the latest code. For the sake of consistency, though, I did the installs in the virtual environment as you suggested:

rlee@amalgam:~/dev/transformers$ ./venv/bin/pip install --upgrade huggingface_hub
Requirement already satisfied: huggingface_hub in ./venv/lib/python3.10/site-packages (0.20.3)
Collecting huggingface_hub
  Downloading huggingface_hub-0.21.3-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: ...
Installing collected packages: huggingface_hub
  Attempting uninstall: huggingface_hub
    Found existing installation: huggingface-hub 0.20.3
    Uninstalling huggingface-hub-0.20.3:
      Successfully uninstalled huggingface-hub-0.20.3
Successfully installed huggingface_hub-0.21.3

and transformers

rlee@amalgam:~/dev/transformers$ ./venv/bin/pip install --upgrade git+https://github.com/huggingface/transformers.git
Collecting git+https://github.com/huggingface/transformers.git
  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-w2n2ou3s
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-w2n2ou3s
  Resolved https://github.com/huggingface/transformers.git to commit 0ad770c3733f9478a8d9d0bc18cc6143877b47a2
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: ...
Building wheels for collected packages: transformers
  Building wheel for transformers (pyproject.toml) ... done
  Created wheel for transformers: filename=transformers-4.39.0.dev0-py3-none-any.whl size=8593793 sha256=382de428f4f8fb87f4a918d7a957a800648d9da9fafefcc9cfbf55aad64d1ebd
  Stored in directory: /tmp/pip-ephem-wheel-cache-8m3urbja/wheels/e7/9c/5b/e1a9c8007c343041e61cc484433d512ea9274272e3fcbe7c16
Successfully built transformers
Installing collected packages: tokenizers, transformers
Successfully installed tokenizers-0.15.2 transformers-4.39.0.dev0

Doing this means I can remove the PYTHONPATH since transformers is now in the virtual environment.

Here's the re-run of the same code.

rlee@amalgam:~/dev/transformers$ $HOME/dev/transformers/venv/bin/python $HOME/dev/transformers/test.py 
Could not locate the configuration_hyena.py inside togethercomputer/StripedHyena-Nous-7B.
Traceback (most recent call last):
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1397, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

If I cd to /mnt/model_storage, where the path becomes valid again, here is what the output looks like:

rlee@amalgam:/mnt/model_storage$ $HOME/dev/transformers/venv/bin/python $HOME/dev/transformers/test.py 
Traceback (most recent call last):
  File "/home/rlee/dev/transformers/test.py", line 5, in <module>
    model = AutoModel.from_config(conf)
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 439, in from_config
    raise ValueError(
ValueError: Unrecognized configuration class <class 'transformers_modules.StripedHyena-Nous-7B.configuration_hyena.StripedHyenaConfig'> for this kind of AutoModel: AutoModel.

This sounds like what you hit, but the error is moot because we got past loading the model config object, which is what this patch is about. If I change AutoModel to AutoModelForCausalLM as you did, here is the output:

rlee@amalgam:/mnt/model_storage$ $HOME/dev/transformers/venv/bin/python $HOME/dev/transformers/test.py 
The repository for /mnt/model_storage/togethercomputer/StripedHyena-Nous-7B contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//mnt/model_storage/togethercomputer/StripedHyena-Nous-7B.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y

So we hit the issue from my previous PR, which was merged.

If we go back to the subdirectory of my home directory, we fail the way I noted when filing the bug.

rlee@amalgam:/mnt/model_storage$ cd ~/dev/transformers/
rlee@amalgam:~/dev/transformers$ $HOME/dev/transformers/venv/bin/python $HOME/dev/transformers/test.py 
Could not locate the configuration_hyena.py inside togethercomputer/StripedHyena-Nous-7B.
Traceback (most recent call last):
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 398, in cached_file
    resolved_file = hf_hub_download(
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/rlee/dev/transformers/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1397, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Cannot find the requested files in the disk cache and outgoing traffic has been disabled. To enable hf.co look-ups and downloads online, set 'local_files_only' to False.

It's very reproducible for me from a fresh virtual environment. Have you tried from a fresh virtual env?

Rocketknight1 · 1 year ago

I just tried with a fresh conda install and it seemed to work fine - I got an import error from inside the modelling file:

ImportError: For `use_flash_rmsnorm`: `pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/layer_norm

This clearly indicates that the modelling code was found and executed. I can't seem to reproduce this bug no matter where I put the repo directory, or where I call it from!

rl337 · 1 year ago

@Rocketknight1 Yeah, that's just because that model has a million crazy dependencies. You got past the failure that I'm running into.

I am starting to wonder if it's a Python version issue. What version of Python and conda do you use? I've been using 3.9.6 on the Mac, and on the Linux box it's 3.10.12. They are pretty vanilla installs because I always either containerize or use virtual environments. Neither of them comes from a conda package.

Rocketknight1 · 1 year ago

I was using Python 3.11, conda 23.10.0.

rl337 · 1 year ago

Still working on this. I just haven't had time lately.

github-actions · 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Rocketknight1 · 1 year ago

No stale please, bot! This is still a live issue

github-actions closed this 344 days ago
huggingface deleted a comment from github-actions on 2024-05-28
amyeroberts reopened this 342 days ago
HuggingFaceDocBuilderDev · 342 days ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

huggingface deleted a comment from github-actions on 2024-06-23
huggingface deleted a comment from github-actions on 2024-07-18
github-actions · 266 days ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

amyeroberts · 266 days ago

Gentle ping @Rocketknight1

Rocketknight1 · 266 days ago (edited) · 👍 1

I think we can close this - I wasn't able to reproduce the issue, and it seems environment-specific! It can be reopened later if we can get a reproducible error.

Rocketknight1 closed this 266 days ago
