sentence-transformers
`[refactor]` model loading - no more unnecessary file downloads
#2345
Merged


tomaarsen · 1 year ago · πŸ‘ 1

Hello!

Pull Request overview

  • Refactor the model loading;
    • No longer download the full model repository.
    • Update cache format to git style via hf_hub_download.
    • No longer use deprecated cached_download.
    • Soft deprecation of use_auth_token in favor of token as required by recent transformers/huggingface_hub versions.
  • Add a test to ensure that the correct/appropriate files are downloaded (sketched right after this list).
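
As a rough illustration, such a test could look like the sketch below (the model id is a placeholder for the test model mentioned under Details, and the cache inspection is simplified):

import os

from sentence_transformers import SentenceTransformer

def test_only_safetensors_is_downloaded(tmp_path):
    # "some-org/model-with-both-weight-formats" is a placeholder for a model
    # that ships both pytorch_model.bin and model.safetensors.
    SentenceTransformer("some-org/model-with-both-weight-formats", cache_folder=str(tmp_path))
    # Collect the names of all files (and snapshot symlinks) in the cache.
    downloaded = {f for _, _, files in os.walk(tmp_path) for f in files}
    assert "model.safetensors" in downloaded
    assert "pytorch_model.bin" not in downloaded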

Details

In short, model downloading has moved from greedy full-repository downloading to lazy per-module downloading, where no files are downloaded at all for Transformer modules.

Original model loading steps

  1. Greedily download the full model repository to the cache folder.
  2. Check if modules.json exists.
  3. If so, load all modules individually using the local files downloaded in step 1.
  4. If not, load a Transformer module using the local files downloaded in step 1, plus Pooling.
  5. Done.

New model loading steps

  1. Check if modules.json exists locally or on the Hub.
  2. If so,
    a. Download the ST configuration files ('config_sentence_transformers.json', 'README.md', 'modules.json') if they're remote.
    b. For each module: if it is not a Transformer module, download (if necessary) the directory with the configuration/weights for that module; if it is a Transformer module, do not download anything and load the model via model_name_or_path instead.
  3. If not, load a Transformer module using model_name_or_path, plus Pooling.
  4. Done.

With this changed setup, we defer downloading any transformers data to transformers itself. For a test model that I uploaded with both pytorch_model.bin and model.safetensors, only the safetensors file is downloaded; this is verified in the attached test case.
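
As an illustration, the core of the lazy flow looks roughly like this (a minimal sketch, not the PR's exact code; load_modules_config is a hypothetical helper name):

import json
import os

from huggingface_hub import hf_hub_download

def load_modules_config(model_name_or_path, token=None):
    # Local paths are read directly; for Hub repositories, only modules.json is
    # fetched into the git-style cache instead of the entire repository.
    if os.path.isdir(model_name_or_path):
        modules_json_path = os.path.join(model_name_or_path, "modules.json")
    else:
        modules_json_path = hf_hub_download(model_name_or_path, "modules.json", token=token)
    with open(modules_json_path, encoding="utf-8") as fIn:
        return json.load(fIn)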

Additional changes

As required by huggingface_hub, we now use token instead of use_auth_token. If use_auth_token is still provided, then token = use_auth_token is set and a warning is emitted, i.e. a soft deprecation.
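
In code, the pattern is roughly the following (a sketch; resolve_token is a hypothetical helper name, not the actual implementation):

import warnings

def resolve_token(token=None, use_auth_token=None):
    # Soft deprecation: keep accepting the old argument, map it onto the new
    # one, and warn about the upcoming removal.
    if use_auth_token is not None:
        warnings.warn(
            "The `use_auth_token` argument is deprecated and will be removed; "
            "please use `token` instead.",
            FutureWarning,
        )
        if token is None:
            token = use_auth_token
    return token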

  • Tom Aarsen
tomaarsen Refactor model loading: no full repo download
4bf9e994
tomaarsen Add simple test regarding efficient loading
31646a9e
tomaarsen Replace use_auth_token with token in docstring
a1a1cd75
tomaarsen changed the title from "Refactor model loading - no more unnecessary file downloads" to "`[refactor]` model loading - no more unnecessary file downloads" · 1 year ago
bwanglzu commented on 2023-11-14
sentence_transformers/SentenceTransformer.py
  :param modules: This parameter can be used to create custom SentenceTransformer models from scratch.
  :param device: Device (like 'cuda' / 'cpu') that should be used for computation. If None, checks if a GPU can be used.
  :param cache_folder: Path to store models. Can also be set by the SENTENCE_TRANSFORMERS_HOME environment variable.
- :param use_auth_token: HuggingFace authentication token to download private models.
+ :param token: HuggingFace authentication token to download private models.
bwanglzu · 1 year ago

It seems use_auth_token is still in the constructor, so there's no need to delete the docstring.

tomaarsen · 1 year ago · πŸ‘ 1

I've deleted the docstring because use_auth_token will be softly deprecated. I'd rather not have deprecated arguments in the docstring.

bwanglzu commented on 2023-11-14
sentence_transformers/SentenceTransformer.py
  device: Optional[str] = None,
  cache_folder: Optional[str] = None,
- use_auth_token: Union[bool, str, None] = None
+ token: Optional[Union[bool, str]] = None,
+ use_auth_token: Optional[Union[bool, str]] = None,
bwanglzu · 1 year ago (edited)

Maybe Optional is not needed? A default of False would be better; also, I'm not quite sure why str is needed.

tomaarsen · 1 year ago

Optional is needed whenever None is a valid value to pass, which it is here. I've kept str because token/use_auth_token is passed directly to transformers and huggingface_hub, which use Optional[Union[str, bool]] as the type:

https://github.com/huggingface/transformers/blob/f1185a4a73a03d238afce1b40456588d22520dd2/src/transformers/modeling_utils.py#L2303

Note also that you can pass any of:

  • str: the token to use as HTTP bearer authorization for remote files.
  • True: use the token generated when running huggingface-cli login (stored in ~/.huggingface).
  • False: do not use any token.
  • None: same as True (so not the same as False).
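
For example (the model names below are placeholders):

from sentence_transformers import SentenceTransformer

SentenceTransformer("some-org/private-model", token="hf_...")  # explicit token string
SentenceTransformer("some-org/private-model", token=True)      # token from `huggingface-cli login`
SentenceTransformer("some-org/public-model", token=False)      # send no token at all
SentenceTransformer("some-org/public-model")                   # token=None, behaves like True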
bwanglzu · 1 year ago · ❀ 1

I see, thanks for the clarification!

bwanglzu commented on 2023-11-14
sentence_transformers/util.py
+ def is_sentence_transformer_model(model_name_or_path: str, token: Optional[Union[bool, str]] = None) -> bool:
+     if os.path.exists(model_name_or_path):
+         return os.path.exists(os.path.join(model_name_or_path, "modules.json"))
bwanglzu · 1 year ago

Not sure if modules.json is guaranteed to be there; maybe sentence_bert_config.json is better? https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/blob/main/sentence_bert_config.json

tomaarsen · 1 year ago

I think modules.json must always be there, while sentence_bert_config.json is only usually there. For example, this code makes me believe that there may be models that don't use sentence_bert_config.json:

# Old classes used other config names than 'sentence_bert_config.json'
for config_name in ['sentence_bert_config.json', 'sentence_roberta_config.json', 'sentence_distilbert_config.json',
                    'sentence_camembert_config.json', 'sentence_albert_config.json',
                    'sentence_xlm-roberta_config.json', 'sentence_xlnet_config.json']:
    sbert_config_path = os.path.join(input_path, config_name)
    if os.path.exists(sbert_config_path):
        break

And I can find some older models that don't even use transformers but do still have modules.json: https://huggingface.co/sentence-transformers/average_word_embeddings_levy_dependency/tree/main

I appreciate you looking into this though!
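
For reference, a minimal sketch of the check being discussed (illustrative only; this version assumes plain hf_hub_download error handling, whereas the PR later switched to a load_file_path helper):

import os

from huggingface_hub import hf_hub_download
from huggingface_hub.utils import EntryNotFoundError

def is_sentence_transformer_model(model_name_or_path, token=None):
    # Local directories: check for modules.json on disk.
    if os.path.exists(model_name_or_path):
        return os.path.exists(os.path.join(model_name_or_path, "modules.json"))
    # Hub repositories: try to fetch only modules.json, nothing else.
    try:
        hf_hub_download(model_name_or_path, "modules.json", token=token)
        return True
    except EntryNotFoundError:
        return False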

bwanglzu · 1 year ago

I see, that makes sense. To be frank, the way it currently handles configs is not so beautiful lol, maybe a better design is needed later :)

tomaarsen · 1 year ago (edited) · πŸ‘ 1

I agree completely! I've already started brainstorming some options, though I think my main idea would be infeasible due to some breaking changes here and there. It involves making SentenceTransformer a subclass of PreTrainedModel rather than of nn.Sequential. Then it could use more of the functionality from transformers, e.g. load_in_8bit, PEFT, etc.

The modules.json and sentence_bert_config.json would be removed in favor of placing that information inside config_sentence_transformers.json, and the other folders (e.g. ..._Pooling, ..._Dense or ..._Normalize) could be removed as well. Their configuration would live inside that single config file, config_sentence_transformers.json, and the weights (e.g. for Dense) would be stored natively by transformers via save_pretrained, because SentenceTransformer would then be a special subclass of PreTrainedModel.

My primary concern is models that don't use transformers, but those are few and far between.

I'd love your thoughts on this!

bwanglzu · 1 year ago

I'm not as familiar as you with the transformers source code; let me read a bit of the PreTrainedModel class and get back to you with some proper thoughts.

bwanglzu commented on 2023-11-14

Left some very minor comments. Do you think it makes sense, at some point, to refactor the tests to pytest? I personally find it much more effective than unittest.

tomaarsen · 1 year ago · πŸš€ 1

I also prefer pytest. I would indeed like to fully refactor the tests and heavily improve them; the current coverage is quite low for my taste. Thanks for the review, by the way!

  • Tom Aarsen
Sirri69 · 1 year ago

Somebody, for the love of god, please merge this and update PyPI.

tomaarsen Prevent crash if internet is down
e1ca4083
tomaarsen Merge branch 'master' into feat/efficient_loading
7618a4f2
Sirri69 · 1 year ago

THANK YOU

tomaarsen · 1 year ago · πŸ˜„ 1

@Sirri69 I'm on it πŸ˜‰ Give it a few days.

I made updates to introduce better support when the Internet is unavailable. We can now run the following script under various settings:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode("This is a test sentence", normalize_embeddings=True)
print(embeddings.shape)

These are now the outputs under the various settings:

  • Cache + Internet: (384,)
  • Cache + No Internet: (384,)
  • No Cache + Internet: only the required files are downloaded, then (384,):

modules.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 349/349 [00:00<?, ?B/s]
config_sentence_transformers.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 116/116 [00:00<?, ?B/s]
README.md: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 10.6k/10.6k [00:00<?, ?B/s]
sentence_bert_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 53.0/53.0 [00:00<?, ?B/s]
config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 612/612 [00:00<?, ?B/s]
pytorch_model.bin: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 90.9M/90.9M [00:06<00:00, 14.9MB/s]
tokenizer_config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 350/350 [00:00<?, ?B/s]
vocab.txt: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 232k/232k [00:00<00:00, 1.36MB/s]
tokenizer.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 466k/466k [00:00<00:00, 4.97MB/s]
special_tokens_map.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 112/112 [00:00<00:00, 90.1kB/s]
1_Pooling/config.json: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 190/190 [00:00<?, ?B/s]
(384,)

  • No Cache + No Internet: OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

This is exactly what I would hope to get.
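
(Side note, an assumption about standard Hugging Face tooling rather than something this PR adds: the "No Cache/Cache + No Internet" cases can be simulated by setting the offline environment variables before the imports:)

import os

# Simulate being offline; these must be set before huggingface_hub/transformers are imported.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

from sentence_transformers import SentenceTransformer

# Loads from the cache if previously downloaded, raises an OSError otherwise.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")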

cc: @nreimers as we discussed this.

  • Tom Aarsen
tomaarsen Use load_file_path in "is_sbert_model"
f26ba94d
tomaarsen Merge branch 'master' of https://github.com/UKPLab/sentence-transform…
a00482f2
tomaarsen Merge branch 'master' into feat/efficient_loading
033bf6d5
tomaarsen Merge branch 'master' into feat/efficient_loading
255e828d
tomaarsen merged 331549c0 into master 1 year ago
tomaarsen deleted the feat/efficient_loading branch 1 year ago
peiyangL · 289 days ago · πŸ‘ 1

@tomaarsen

Hi, I appreciate this update to support model loading without an internet connection.

However, I find that loading the model is very slow without an internet connection. My testing code is as follows:

import time
start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu')
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)

The output is as follows:

# without internet
<All keys matched successfully>
(1, 768)
time: 376.90756702423096

# with internet
<All keys matched successfully>
(1, 768)
time: 15.75501823425293

Additionally, I found that adding the local_files_only=True parameter speeds up model loading without an internet connection, but it is still quite slow.

import time
start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu', local_files_only=True)
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)

# output:
# <All keys matched successfully>
# (1, 768)
# time: 145.69492316246033
