Left some very minor comments. Do you think it makes sense, at some point, to refactor the tests to pytest? I personally find it much more effective than unittest.
I also prefer pytest. I would indeed like to fully refactor the tests and heavily improve them. The current coverage is quite low for my tastes! Thanks for the review by the way!
Somebody for the love of god, please merge this and update pypi
THANK YOU
@Sirri69 I'm on it. Give it a few days.
I made updates to introduce better support for when Internet is unavailable. Now, we can run the following script under various settings:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode("This is a test sentence", normalize_embeddings=True)
print(embeddings.shape)
These are now the outputs under the various settings:
| | Internet | No Internet |
|---|---|---|
| Cache | (384,) | (384,) |
| No Cache | Downloads modules.json, config_sentence_transformers.json, README.md, sentence_bert_config.json, config.json, pytorch_model.bin, tokenizer_config.json, vocab.txt, tokenizer.json, special_tokens_map.json and 1_Pooling/config.json, then prints (384,) | OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'. |
This is exactly what I would hope to get.
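For completeness, the cached offline case can also be forced explicitly, so no Hub lookups are attempted at all. This is only a minimal sketch, assuming the model is already in the local cache; `HF_HUB_OFFLINE` is the `huggingface_hub` environment variable, and `local_files_only` is the argument also mentioned further down in this thread:

```python
# Sketch: skip all Hub lookups and rely purely on the local cache.
import os

os.environ["HF_HUB_OFFLINE"] = "1"  # must be set before huggingface_hub is imported

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2",
    local_files_only=True,  # only read from the local cache, never the Hub
)
embeddings = model.encode("This is a test sentence", normalize_embeddings=True)
print(embeddings.shape)  # (384,) when the model is cached
```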
cc: @nreimers as we discussed this.
Hi, I appreciate this update to support model loading without an internet connection.
However, I find that loading the model is very slow without an internet connection. My testing code is as follows:
import time
start = time.time()
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu')
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)
The output is as follows:
# without internet
<All keys matched successfully>
(1, 768)
time: 376.90756702423096
# with internet
<All keys matched successfully>
(1, 768)
time: 15.75501823425293
Additionally, I found that adding the local_files_only=True parameter speeds up model loading without an internet connection, but it is still quite slow.
import time
start = time.time()
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu', local_files_only=True)
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)
# output:
# <All keys matched successfully>
# (1, 768)
# time: 145.69492316246033
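A possible workaround worth trying (just a sketch, not benchmarked here): save the model once into a plain local directory and load it from that path afterwards, so loading does not go through Hub or cache resolution for the model files. Note that trust_remote_code models may still resolve their custom modelling code from the local modules cache.

```python
# Sketch of a possible workaround: export the model to a local directory once,
# then load from that path (no Hub/cache resolution for the model files).
from sentence_transformers import SentenceTransformer

# One-time export, run while the model is available (cached or online):
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device="cpu")
model.save("./nomic-embed-text-v1-local")  # hypothetical local path

# Later, offline: load directly from the local directory.
model = SentenceTransformer("./nomic-embed-text-v1-local", trust_remote_code=True, device="cpu")
print(model.encode(["hello world"]).shape)
```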
Hello!
Pull Request overview

- Move model downloading to lazy, per-file downloads with `hf_hub_download`.
- Remove the use of the deprecated `cached_download`.
- Deprecate `use_auth_token` in favor of `token`, as required by recent `transformers`/`huggingface_hub` versions.

Details
In short, model downloading has moved from greedy full repository downloading to lazy per-module downloading, where no files are downloaded for `Transformers` modules.

Original model loading steps

1. Download the full model repository if `modules.json` exists.
2. Load the `Transformer` module using the local files downloaded in the last step + `Pooling`.

New model loading steps
1. Check whether `modules.json` exists locally or on the Hub.
   a. Download the ST configuration files (`config_sentence_transformers.json`, `README.md`, `modules.json`) if they're remote.
   b. For each module, if it is not transformers, then download (if necessary) the directory with configuration/weights for that module. If it is transformers, then do not download & load the model using the `model_name_or_path`.
2. Load the `Transformer` module using the `model_name_or_path` + `Pooling`. (A rough sketch of this lazy per-module flow follows below.)
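To make the lazy per-module flow above more concrete, here is a rough, illustrative sketch using `huggingface_hub` directly. This is not the implementation in this PR; the module handling is simplified and only `modules.json` is fetched up front.

```python
# Illustrative sketch of lazy per-module downloading (NOT the actual
# sentence-transformers implementation).
import json

from huggingface_hub import hf_hub_download, snapshot_download

repo_id = "sentence-transformers/all-MiniLM-L6-v2"

# Fetch only modules.json (re-used from the local cache on later calls).
modules_path = hf_hub_download(repo_id, "modules.json")
with open(modules_path) as f:
    modules = json.load(f)

for module in modules:
    if module["type"] == "sentence_transformers.models.Transformer":
        # Transformers modules are left to `transformers` itself, which
        # downloads config/weights lazily when the module is loaded.
        continue
    if module["path"]:
        # Non-transformers modules (e.g. Pooling): download only that
        # module's directory with its configuration/weights.
        snapshot_download(repo_id, allow_patterns=f"{module['path']}/*")
```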
With this changed setup, we defer downloading any `transformers` data to `transformers` itself. In a test model that I uploaded with both `pytorch_model.bin` and `model.safetensors`, only the safetensors file is loaded. This is verified in the attached test case.
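As a rough way to double-check this behaviour locally (this is not the attached test case, and `sentence-transformers/all-MiniLM-L6-v2` is only a stand-in for the test model mentioned above), the Hub cache can be inspected after loading a model that ships both weight formats:

```python
# Sketch: inspect the local Hub cache to check that only model.safetensors
# was fetched. The repo id below is a stand-in, not the test model from the PR.
from huggingface_hub import scan_cache_dir

for repo in scan_cache_dir().repos:
    if repo.repo_id == "sentence-transformers/all-MiniLM-L6-v2":
        for revision in repo.revisions:
            files = sorted(f.file_name for f in revision.files)
            # Expected to contain model.safetensors but not pytorch_model.bin.
            print(files)
```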
Additional changes

As required by `huggingface_hub`, we now use `token` instead of `use_auth_token`. If `use_auth_token` is still provided, then `token = use_auth_token` is set and a warning is given, i.e. a soft deprecation.
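For reference, the soft deprecation roughly follows this pattern (an illustrative sketch, not the exact code in this PR):

```python
# Sketch of the soft-deprecation pattern described above (illustrative only).
import warnings


def resolve_token(token=None, use_auth_token=None):
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated and will be removed; use `token` instead.",
            FutureWarning,
        )
        if token is None:
            token = use_auth_token
    return token
```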