huggingface_hub
Default to downloading model from local files only when in offline mode
#2236
Closed

debanjum wants to merge 1 commit into huggingface:main from debanjum:main
debanjum
debanjum1 year ago

This change speeds up loading models in offline mode by choosing smarter defaults.

Issue

Previously, every call to download a model would wait for the network
request to time out before attempting to load the model from disk.

Fix

We already track when we're running in offline mode via the
HF_HUB_OFFLINE constant.

So we can jump straight to loading the model from disk when offline,
instead of waiting for the HF request to time out on every call to hf_hub_download.

Result

This gives a significant speed-up in loading SentenceTransformer models
when not connected to the internet (and the required model is on disk).
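The proposed default can be sketched roughly like this. This is a minimal illustration of the idea, not the actual huggingface_hub implementation; the constant and the return values here are simplified stand-ins:

```python
# Minimal sketch of the PR's idea, not the actual huggingface_hub code:
# when offline mode is enabled, default to loading from the local cache
# instead of first attempting a network request that will only time out.
import os

# huggingface_hub exposes a similar constant; this is a simplified stand-in.
HF_HUB_OFFLINE = os.environ.get("HF_HUB_OFFLINE", "0").upper() in ("1", "ON", "YES", "TRUE")

def hf_hub_download(repo_id: str, filename: str, local_files_only: bool = False) -> str:
    # The proposed change: skip the network entirely when offline mode is set.
    if HF_HUB_OFFLINE:
        local_files_only = True
    if local_files_only:
        return f"cache://{repo_id}/{filename}"
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"
```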

debanjum Default to download model from local files only when in offline mode
ab0a37b6
Wauplin
Wauplin1 year ago

Hi @debanjum, have you tested this yourself and confirmed a change in behavior with this PR? If yes, do you have a reproducible example script to showcase it?

I'm asking because if HF_HUB_OFFLINE=1 is set, then the current implementation should instantly default to local files without waiting for a timeout. The logic happens here (first HEAD call made), which results in an OfflineModeIsEnabled error caught here. This error is raised when the HF_HUB_OFFLINE=1 constant is set (see here), without sending any request to the network. So if you see a speed improvement with this PR, then it means we have a bug in our logic that I would prefer to fix.
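The flow described above can be sketched as follows. This is illustrative pseudologic, not the library's actual code, though the exception name mirrors huggingface_hub's real OfflineModeIsEnabled:

```python
# Illustrative sketch of the flow described above: with HF_HUB_OFFLINE=1,
# the HEAD call raises immediately (no network I/O) and the caller falls
# back to the local cache. Names other than OfflineModeIsEnabled are made up.
class OfflineModeIsEnabled(ConnectionError):
    pass

def http_head(url: str, offline: bool) -> str:
    # Raised instantly when offline mode is set, before any request is sent.
    if offline:
        raise OfflineModeIsEnabled(f"Offline mode is enabled: cannot reach {url}")
    return f"HEAD {url}"

def get_file(url: str, offline: bool) -> str:
    try:
        return http_head(url, offline)    # first HEAD call
    except OfflineModeIsEnabled:
        return "loaded from local cache"  # caught: fall back to disk
```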

(Regarding the PR changes themselves, I don't want to accept them because we want to raise a different error message depending on whether local_files_only=True is passed or HF_HUB_OFFLINE=1 is set.)

debanjum
debanjum1 year ago (edited 1 year ago)

Hey @Wauplin, thanks for the quick response time.

I've shared a reproducible script and the results from my testing below.

  • Using Python 3.12, sentence-transformers == 2.7.0, huggingface_hub == 0.22.2, requests == 2.31.0
  • Code snippet
    import time

    from sentence_transformers import SentenceTransformer

    # Load any SentenceTransformer embedding model
    start_time = time.time()
    model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
    end_time = time.time()
    print(f"Model Load Time: {end_time - start_time}s")
  • Result
    Cached Model Load    Load Time
    With Internet        2.68s
    Without Internet     121.10s

Analysis

I was investigating why it takes so long to load cached sentence transformer models. I realized it stalls at the call to get_hf_file_metadata in the hf_hub_download function. This call eventually fails with an exception, but only after a long delay. This seems to be what causes the slow load times for cached models in offline mode.

(Regarding the PR changes themselves, I don't want to accept them because we want to raise a different error message depending on whether local_files_only=True is passed or HF_HUB_OFFLINE=1 is set.)

I'm fine with however this gets fixed, either via this PR or by finding the bug that's causing the timeout delay. I also don't intend this change to alter the error message thrown when HF_HUB_OFFLINE=1 is set and local_files_only=True is passed. I'm guessing that can be preserved if the current code changes aren't doing it already?

Wauplin
Wauplin1 year ago

Oh I see. So in your example you don't explicitly tell your script that internet is disabled? If that's the case, then it's normal that it tries to reach the network. If you run your script without internet, you must set the HF_HUB_OFFLINE environment variable to 1. See the docs about it.

For example:

HF_HUB_OFFLINE=1 python script.py
debanjum
debanjum1 year ago

I see, thanks for the quick clarifications! Testing these changes more carefully, I see this doesn't help mitigate the problem as I'd originally thought.

I'll update the Khoj code base to automatically set HF_HUB_OFFLINE=1 when internet isn't available, for faster loads, given that the SentenceTransformer library doesn't allow passing local_files_only=True down to huggingface_hub.
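A rough sketch of that approach: probe connectivity once at startup and set HF_HUB_OFFLINE=1 before any Hugging Face imports. The probe target, port, and timeout here are illustrative choices, not Khoj's actual code:

```python
# Rough sketch of the approach described above, not Khoj's actual code:
# probe connectivity once, then set HF_HUB_OFFLINE=1 when unreachable.
import os
import socket

def internet_is_reachable(host: str = "huggingface.co", port: int = 443, timeout: float = 1.0) -> bool:
    # Attempt a short TCP connection to the given host; the host/port/timeout
    # values are illustrative assumptions.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if not internet_is_reachable():
    # Set before importing sentence_transformers / huggingface_hub, since the
    # offline constant is read from the environment at import time.
    os.environ["HF_HUB_OFFLINE"] = "1"
```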

Closing this PR for now

debanjum closed this 1 year ago
Wauplin
Wauplin1 year ago

Thanks for the context @debanjum. The solution you've described seems reasonable to me, since Khoj seems to be explicitly meant for offline mode, at least in some cases.

(I'll still ping @tomaarsen about SentenceTransformers for visibility.)

debanjum
debanjum1 year ago

Thanks for the recommendation! I've already done that in a PR I raised on the SentenceTransformer library, to help pass local_files_only through to huggingface_hub for loading cached models faster when offline.
