Hi @debanjum, have you tested this yourself and confirmed a change in behavior with this PR? If so, do you have a reproducible example script to showcase it?
I'm asking because if `HF_HUB_OFFLINE=1` is set, then the current implementation should instantly default to local files without waiting for a timeout. The logic happens here (first HEAD call made), which results in an `OfflineModeIsEnabled` error caught here. This error is raised when the `HF_HUB_OFFLINE` constant is set (see here), without sending any request to the network. So if you see a speed improvement with this PR, then it means we have a bug in our logic that I would prefer to fix.
(Regarding the PR changes themselves, I don't want to accept them because we want to raise a different error message depending on whether `local_files_only=True` is passed or `HF_HUB_OFFLINE=1` is set.)
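The short-circuit described above can be sketched roughly as follows. This is an illustrative reconstruction of the pattern, not huggingface_hub's actual internals; the class and function names are stand-ins:

```python
import os


class OfflineModeIsEnabled(ConnectionError):
    """Stand-in for huggingface_hub's error of the same name (per the thread)."""


def http_head(url: str) -> str:
    # The guard described above: when HF_HUB_OFFLINE=1, raise immediately,
    # before any network I/O is attempted. Function name is illustrative.
    if os.environ.get("HF_HUB_OFFLINE", "0") == "1":
        raise OfflineModeIsEnabled(f"Offline mode is enabled, cannot reach {url}")
    # ... a real implementation would send the HEAD request here ...
    return "HEAD sent"
```

With this guard in place, offline callers fail fast with a specific error instead of waiting on a connection timeout.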
Hey @Wauplin, thanks for the quick response!

I've shared a reproducible script and the results from my testing below.

Python 3.12, `sentence-transformers == 2.7.0`, `huggingface_hub == 0.22.2`, `requests == 2.31.0`
```python
import time

from sentence_transformers import SentenceTransformer

# Load any SentenceTransformer embedding model and time the load
start_time = time.time()
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
end_time = time.time()
print(f"Model Load Time: {end_time - start_time}s")
```
| Cached Model Load | Load Time |
|---|---|
| With Internet | 2.68s |
| Without Internet | 121.10s |
I was investigating why it takes so long to load cached sentence transformer models. I realized it stalls at the call to `get_hf_file_metadata` in the `hf_hub_download` function. This call eventually fails with an exception, but only after a long wait, which seems to be what delays loading cached models in offline mode.
> (regarding the PR changes themselves, I don't want to accept them because we want to raise a different error message depending if `local_files_only=True` is passed or `HF_HUB_OFFLINE=1` is set)
I'm fine with however this gets fixed, whether via this PR or by finding the bug that's causing the timeout delay. But I don't intend this change to alter the error message thrown when `HF_HUB_OFFLINE=1` is set or `local_files_only=True` is passed either. I'm guessing this can be resolved if the current code changes aren't doing it already?
Oh I see. So in your example you don't explicitly tell your script that internet is disabled? If that's the case, then it's normal that it tries to reach the network. If you run your script without internet, you must set the `HF_HUB_OFFLINE` environment variable to `1`. See the docs about it.
For example:
```shell
HF_HUB_OFFLINE=1 python script.py
```
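Equivalently, the variable can be set from Python itself, as a rough sketch. The one caveat (an assumption based on this thread) is that it must happen before huggingface_hub or sentence_transformers is first imported, since the flag is read when the library loads:

```python
import os

# Enable offline mode programmatically instead of on the command line.
# This must run before the first import of huggingface_hub (or anything
# that imports it, like sentence_transformers), since the flag is read
# when the library's constants are loaded.
os.environ["HF_HUB_OFFLINE"] = "1"

# A subsequent `from sentence_transformers import SentenceTransformer`
# would now load straight from the local cache without network calls.
```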
I see, thanks for the quick clarification! Testing these changes more carefully, I see this doesn't mitigate the problem as I'd originally thought.

I'll update the Khoj code base to automatically set `HF_HUB_OFFLINE=1` when internet isn't available, for faster loads, given that the SentenceTransformer library doesn't allow passing `local_files_only=True` down to `huggingface_hub`.
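The auto-detection could look something like this hypothetical helper (a sketch of the approach, not actual Khoj code; the function name and probe host are assumptions):

```python
import os
import socket


def enable_offline_mode_if_disconnected(host: str = "huggingface.co",
                                        timeout: float = 1.0) -> bool:
    """Set HF_HUB_OFFLINE=1 when the Hub is unreachable (hypothetical helper).

    Must run before huggingface_hub is imported, since the flag is read at
    import time. Returns True when offline mode was enabled.
    """
    try:
        # Cheap reachability probe: open and immediately close a TCP connection.
        socket.create_connection((host, 443), timeout=timeout).close()
        return False  # reachable: leave the environment untouched
    except OSError:
        os.environ["HF_HUB_OFFLINE"] = "1"
        return True
```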
Closing this PR for now.
Thanks for the context @debanjum. The solution you've described seems reasonable to me, since Khoj seems to be explicitly meant for offline mode (at least in some cases).

(Still pinging @tomaarsen about SentenceTransformers for visibility.)
Thanks for the recommendation! I've already done that in a PR I raised on the SentenceTransformer library, to help pass `local_files_only` through to `huggingface_hub` for loading cached models faster when offline.
This change speeds up loading models in offline mode by choosing smarter defaults.
Issue
Previously, every call to download a model would wait for the request to time out before attempting to load the model from disk.
Fix
We already track when we're running in offline mode via the `HF_HUB_OFFLINE` constant, so we can jump straight to loading the model from disk when offline instead of waiting for the HF request timeouts on every call to `hf_hub_download`.
Result
See a significant speed-up in loading SentenceTransformer models when not connected to the internet (and the required model is on disk).
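As a rough illustration of the intended behavior (a sketch of the fallback idea, not the actual diff in this PR), the fix amounts to retrying against the on-disk cache when the networked attempt fails:

```python
def load_with_cache_fallback(download):
    """Try a networked download first; fall back to the local cache on failure.

    `download` is any callable accepting a `local_files_only` keyword, e.g.
    functools.partial(hf_hub_download, repo_id=..., filename=...). This is a
    sketch of the fallback pattern, not huggingface_hub's actual code.
    """
    try:
        # First attempt may hit the network and raise when unreachable.
        return download(local_files_only=False)
    except (OSError, ConnectionError):
        # Network unreachable or timed out: retry against the on-disk cache.
        return download(local_files_only=True)
```

The key difference from the old behavior is that when offline mode is known up front (via `HF_HUB_OFFLINE`), the first attempt can be skipped entirely rather than waiting for the timeout.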