Internalise the NomicBERT model (#43067)
* Create NomicBERT skeleton
* Implement modular_nomic_bert
Co-authored-by: Felix Arkle <felixarkle@icloud.com>
* Implement convert_nomic_bert_to_hf
* Complete nomic_bert conversion script key mappings and fix NomicBertSelfAttention signature
* Create nomic bert documentation
Implemented descriptions for the main nomic bert documentation and
debugged modular_nomic_bert
* Remove redundancies and improve documentation for nomic_bert
Co-authored-by: Felix Arkle <felixarkle@icloud.com>
* Update dependencies for nomic_bert
Add einops to setup and add availability checks for a more graceful exit
if it is not available
* Fix nomic_bert attention mechanism
The previous version overrode BERT, leading to `forward_unimplemented`
* Correct past_key_value to past_key_values in NomicBertSelfAttention
* Implement cache_position into NomicBertSelfAttention
* Implement transpose_for_scores in NomicBertSelfAttention
* Fix NomicBertSelfAttention
Remove code which broke the encoder-only assumption
* Add kwargs to NomicBertSelfAttention and ignore non-encoder logic
* Add past_key_values to NomicBertSelfAttention output
* Alter head dimension logic for NomicBertSelfAttention
Alter logic so smaller hidden dimensions are still computed correctly
and not lost
* Attempt to reflect hidden dim in NomicBertSelfAttention output shape
* Attempt to format output head shape for NomicBertSelfAttention
* Add is_decoder check to NomicBertSelfAttention
Although NomicBERT is an encoder-only model, BertGeneration also requires it
to have decoder capabilities
* Update NomicBertSelfAttention to handle dynamic cache
* Update layer_idx to be a valid integer in NomicBertSelfAttention
* Alter output value size for NomicBertSelfAttention
* Improve seq_len_offset and past key robustness for NomicBertSelfAttention
* Implement left-padded batch inference for NomicBERT
* Explicitly add helper functions to NomicBert
* Fix dynamic cache issues within modular_nomic_bert
* Debug key errors of modular_nomic_bert
* Attempt to prevent NoneType TypeError in modular_nomic_bert
* Fix cache use in modular_nomic_bert
* Remove past_key_value tensor logic in modular_nomic_bert
Only use instances of Cache objects over tensor objects when dealing
with past_key_value
* Convert legacy tensor format to Cache in modular_nomic_bert
* Implement reorder_cache for nomic_bert_model method
* Add None-type safety to modular_nomic_bert
* Fix potential crop-on-None issues for modular_nomic_bert
* Fix RoPE offset misuse in modular_nomic_bert
* Use seqlen_offset for cached decoding in NomicBertSelfAttention
* Handle any attention mask mismatches in NomicBertSelfAttention
* Update pretrain paths in test_modeling_nomic_bert
* Make test_modeling_nomic_bert more specific to nomic bert
* Prioritise position_ids over offset in NomicBertSelfAttention
* Remove NoneType error from NomicBertEncoder
* Fix "1 is not greater than 1" error in modular_nomic_bert
* Update test suite to be nomic bert suitable
Co-authored-by: Felix Arkle <felixarkle@icloud.com>
* Update NomicBERT to comply with new Hugging Face standards
* Add dates to nomic_bert.md
* Resolve RoPE issues with large decoder inputs for NomicBERT
* Update nomic_bert.md and remove einops dependency from setup.py
* Update RoPE for NomicBERT
- Update RoPE to be consistent with llama
- Add the BertTokenizer to NomicBERT
- Remove some redundant features within modular_nomic_bert.py
* Adapt NomicBERT to be more consistent with bert
- Adapted `NomicBertSelfAttention`, `NomicBertAttention`,
`NomicBertIntermediate`, `NomicBertLayer`, `NomicBertEncoder`,
`NomicBertModel`
* Update NomicBertModel to create positionEmbeddings
* Make RoPE parameters for NomicBERT more robust
Fallback to defaults in init if RoPE parameters are None
* Resolved syntax error in calling NomicBertAttention
* Update NomicBertSelfAttention
Make NomicBertSelfAttention more consistent with BertSelfAttention,
taking RoPE elements to be consistent with LlamaAttention
* Update test_modeling_nomic_bert
Remove tests relating to decoder logic
* Resolve repo consistency errors within NomicBERT
* Update eager_attention_forward in NomicBERT
* Resolve attention mask errors in NomicBERT
* Update convert_nomic_bert_to_hf
Utilise conversion_mapping
* Reformat NomicBERT to meet repo standards
* Implement recommendations into modular_nomic_bert
* Update NomicBert.md and convert_nomic_bert_to_hf
* Fix issues found in modular_nomic_bert
* Update test_modeling_nomic_bert to use nomic-embed-text-v1.5
Was originally nomic-bert-2048
* Alter modular_nomic_bert to avoid legacy naming collisions
* Refactor NomicBERT to be more consistent with HF standards
* Refactor Nomic Bert
* Resolve NomicBert testEager value error
* Alter NomicBertSelfAttention to inherit from LlamaAttention
* Update docs date for nomic_bert.md
* Add missing embedding dropout to Nomic Bert
* Reformat Nomic Bert config to be consistent with new standards
* Refactor nomic bert to inherit more from Jina V3
* fixup jina v3 (doesn't use token type ids it seems) and inherit more fixups for nomic bert
* Update nomic_bert.md
Update examples to no longer cross-reference BERT in their tokenizer
* quick fixups
* fix
* fixes to align with original model
* fixups
* fix
* remove conversion script, revert changes in conversion mapping
* a10 numbers with remote model
* fixup all other classes and conversion
* fix
* new hub revision
* v1 tests, same code - slightly different config (on the hub)
* fix wrong defaults
* numbers didn't change on a10
* fix warning
* update docs
* update docs per Tom's review
---------
Co-authored-by: Felix Arkle <felixarkle@icloud.com>
Co-authored-by: felixarkle <72893613+Bumsparkle@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>