Hi @debanjum, have you tested this yourself and confirmed a change in behavior with this PR? If so, do you have a reproducible example script to showcase it?
I'm asking because if `HF_HUB_OFFLINE=1` is set, then the current implementation should instantly default to local files without waiting for a timeout. The logic happens here (first HEAD call made), which results in an `OfflineModeIsEnabled` error caught here. This error is raised when the `HF_HUB_OFFLINE` constant is set (see here), without sending any request to the network. So if you see a speed improvement with this PR, then it means we have a bug in our logic that I would prefer to fix.
(Regarding the PR changes themselves, I don't want to accept them because we want to raise a different error message depending on whether `local_files_only=True` is passed or `HF_HUB_OFFLINE=1` is set.)
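The short-circuit described above can be sketched roughly as follows. This is an illustrative reconstruction of the pattern, not huggingface_hub's actual internals; the class and function names are stand-ins:

```python
import os


class OfflineModeIsEnabled(ConnectionError):
    """Stand-in for huggingface_hub's error of the same name (per the thread)."""


def http_head(url: str) -> str:
    # The guard described above: when HF_HUB_OFFLINE=1, raise immediately,
    # before any network I/O is attempted. Function name is illustrative.
    if os.environ.get("HF_HUB_OFFLINE", "0") == "1":
        raise OfflineModeIsEnabled(f"Offline mode is enabled, cannot reach {url}")
    # ... a real implementation would send the HEAD request here ...
    return "HEAD sent"
```

With this guard in place, offline callers fail fast with a specific error instead of waiting on a connection timeout.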
Hey @Wauplin, thanks for the quick response!

I've shared a reproducible script and the results from my testing below.

Python 3.12, `sentence-transformers == 2.7.0`, `huggingface_hub == 0.22.2`, `requests == 2.31.0`
```python
import time

from sentence_transformers import SentenceTransformer

# Load any SentenceTransformer embedding model and time the load
start_time = time.time()
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
end_time = time.time()
print(f"Model Load Time: {end_time - start_time}s")
```
| Cached Model Load | Load Time |
|---|---|
| With Internet | 2.68s |
| Without Internet | 121.10s |
I was investigating why it takes so long to load cached sentence transformer models. I realized it stalls at the call to `get_hf_file_metadata` in the `hf_hub_download` function. This call eventually fails with an exception, but only after a long wait, which seems to be what delays loading cached models in offline mode.
> (regarding the PR changes themselves, I don't want to accept them because we want to raise a different error message depending if `local_files_only=True` is passed or `HF_HUB_OFFLINE=1` is set)
I'm fine with however this gets fixed, whether via this PR or by finding the bug that's causing the timeout delay. But I don't intend this change to alter the error message thrown when `HF_HUB_OFFLINE=1` is set or `local_files_only=True` is passed either. I'm guessing this can be resolved if the current code changes aren't doing it already?
Oh I see. So in your example you don't explicitly tell your script that internet is disabled? If that's the case, then it's normal that it tries to reach the network. If you run your script without internet, you must set the `HF_HUB_OFFLINE` environment variable to `1`. See the docs about it.
For example:
```shell
HF_HUB_OFFLINE=1 python script.py
```
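Equivalently, the variable can be set from Python itself, as a rough sketch. The one caveat (an assumption based on this thread) is that it must happen before huggingface_hub or sentence_transformers is first imported, since the flag is read when the library loads:

```python
import os

# Enable offline mode programmatically instead of on the command line.
# This must run before the first import of huggingface_hub (or anything
# that imports it, like sentence_transformers), since the flag is read
# when the library's constants are loaded.
os.environ["HF_HUB_OFFLINE"] = "1"

# A subsequent `from sentence_transformers import SentenceTransformer`
# would now load straight from the local cache without network calls.
```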
I see, thanks for the quick clarification! Testing these changes more carefully, I see this doesn't mitigate the problem as I'd originally thought.

I'll update the Khoj code base to automatically set `HF_HUB_OFFLINE=1` when internet isn't available, for faster loads, given that the SentenceTransformer library doesn't allow passing `local_files_only=True` down to `huggingface_hub`.
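The auto-detection could look something like this hypothetical helper (a sketch of the approach, not actual Khoj code; the function name and probe host are assumptions):

```python
import os
import socket


def enable_offline_mode_if_disconnected(host: str = "huggingface.co",
                                        timeout: float = 1.0) -> bool:
    """Set HF_HUB_OFFLINE=1 when the Hub is unreachable (hypothetical helper).

    Must run before huggingface_hub is imported, since the flag is read at
    import time. Returns True when offline mode was enabled.
    """
    try:
        # Cheap reachability probe: open and immediately close a TCP connection.
        socket.create_connection((host, 443), timeout=timeout).close()
        return False  # reachable: leave the environment untouched
    except OSError:
        os.environ["HF_HUB_OFFLINE"] = "1"
        return True
```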
Closing this PR for now.
Thanks for the context @debanjum. The solution you've described seems reasonable to me, since Khoj seems to be explicitly meant for offline mode (at least in some cases).

(Still pinging @tomaarsen about SentenceTransformers for visibility.)
Thanks for the recommendation! I've already done that in a PR I raised on the SentenceTransformer library, to help pass `local_files_only` through to `huggingface_hub` for loading cached models faster when offline.
This change speeds up loading models in offline mode by choosing smarter defaults.
Issue
Previously, every call to download a model would wait for the request to time out before attempting to load the model from disk.
Fix
We already track when we're running in offline mode via the `HF_HUB_OFFLINE` constant, so we can jump straight to loading the model from disk when offline instead of waiting for the HF request timeouts on every call to `hf_hub_download`.
Result
See a significant speed-up in loading SentenceTransformer models when not connected to the internet (and the required model is on disk).
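As a rough illustration of the intended behavior (a sketch of the fallback idea, not the actual diff in this PR), the fix amounts to retrying against the on-disk cache when the networked attempt fails:

```python
def load_with_cache_fallback(download):
    """Try a networked download first; fall back to the local cache on failure.

    `download` is any callable accepting a `local_files_only` keyword, e.g.
    functools.partial(hf_hub_download, repo_id=..., filename=...). This is a
    sketch of the fallback pattern, not huggingface_hub's actual code.
    """
    try:
        # First attempt may hit the network and raise when unreachable.
        return download(local_files_only=False)
    except (OSError, ConnectionError):
        # Network unreachable or timed out: retry against the on-disk cache.
        return download(local_files_only=True)
```

The key difference from the old behavior is that when offline mode is known up front (via `HF_HUB_OFFLINE`), the first attempt can be skipped entirely rather than waiting for the timeout.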