Internalise the NomicBERT model (#43067)

Commit

50 days ago

Internalise the NomicBERT model (#43067)

* Created nomicBert skeleton
* Implement modular_nomic_bert
  Co-authored-by: Felix Arkle <felixarkle@icloud.com>
* Implement convert_nomic_bert_to_hf
* Complete nomic_bert conversion script key mappings and fix NomicBertSelfAttention signature
* Create nomic bert documentation
  Implemented descriptions for the main nomic bert documentation and debugged modular_nomic_bert
* Remove redundancies and improve documentation for nomic_bert
  Co-authored-by: Felix Arkle <felixarkle@icloud.com>
* Update dependencies for nomic_bert
  Add einops to setup and add availability checks for a more graceful exit if not available
* Fix nomic_bert attention mechanism
  Previous version overwrote bert, leading to forward_unimplemented
* Correct past_key_value to past_key_values in NomicBertSelfAttention
* Implement cache_position into NomicBertSelfAttention
* Implement transpose_for_scores in NomicBertSelfAttention
* Fix nomicBertSelfAttention
  Remove code which broke the encoder-only assumption
* Add kwargs to NomicBertSelfAttention and ignore non-encoder logic
* Add past_key_values to NomicBertSelfAttention output
* Alter head dimension logic for NomicBertSelfAttention
  Alter logic so smaller hidden dimensions are still computed correctly and not lost
* Attempt to reflect hidden dim in NomicBertSelfAttention output shape
* Attempt to format output head shape for NomicBertSelfAttention
* Add is_decoder check to NomicBertSelfAttention
  Although NomicBERT is an encoder-only model, BertGeneration also requires it to have decoder capabilities
* Update NomicBertSelfAttention to handle dynamic cache
* Update layer_idx to be a valid integer in NomicBertSelfAttention
* Alter output value size for NomicBertSelfAttention
* Improve seq_len_offset and past key robustness for NomicBertSelfAttention
* Implement left-padded batch inference for Nomic_Bert
* Explicitly add helper functions to NomicBert
* Fix dynamic cache issues within modular_nomic_bert
* Debug key errors of modular_nomic_bert
* Attempt to prevent typeError noneType in modular_nomic_bert
* Fix cache use in modular_nomic_bert
* Remove past_key_value tensor logic in modular_nomic_bert
  Only use instances of Cache objects over tensor objects when dealing with past_key_value
* Convert legacy tensor format to Cache in modular_nomic_bert
* Implement reorder_cache for nomic_bert_model method
* Add none type safety to modular_nomic_bert
* Fix potential crop on none issues for modular_nomic_bert
* Fix RoPE offset misuse in modular_nomic_bert
* Use seqlen_offset for Cached decoding in NomicBertSelfAttention
* Handle any attention mask mismatches in NomicBertSelfAttention
* Update pretrain paths in test_modeling_nomic_bert
* Make test_modeling_nomic_bert more specific to nomic bert
* Prioritise position_ids over offset in NomicBertSelfAttention
* Remove noneType error from NomicBertEncoder
* Fix 1 is not greater than 1 error in modular_nomic_bert
* Update test suite to be nomic bert suitable
  Co-authored-by: Felix Arkle <felixarkle@icloud.com>
* Update NomicBERT to comply with new hugging face standards
* Add dates to nomic_bert.md
* Resolve RoPE issues with large decoder inputs for NomicBERT
* Update nomic_bert.md and remove einops dependency from setup.py
* Update RoPE for NomicBERT
  - Update RoPE to be consistent with llama
  - Add the BertTokenizer to NomicBERT
  - Remove some redundant features within modular_nomic_bert.py
* Adapt NomicBERT to be more consistent with bert
  - Adapted `NomicBertSelfAttention`, `NomicBertAttention`, `NomicBertIntermediate`, `NomicBertLayer`, `NomicBertEncoder`, `NomicBertModel`
* Update NomicBertModel to create positionEmbeddings
* Make RoPE parameters for NomicBERT more robust
  Fall back to defaults in init if RoPE parameters are None
* Resolved syntax error in calling NomicBertAttention
* Update NomicBertSelfAttention
  Make NomicBertSelfAttention more consistent with BertSelfAttention, taking RoPE elements to be consistent with LlamaAttention
* Update test_modeling_nomic_bert
  Remove tests relating to decoder logic
* Resolve repo consistency errors within NomicBERT
* Update eager_attention_forward in NomicBERT
* Resolve attention mask errors in NomicBERT
* Update convert_nomic_bert_to_hf
  Utilise conversion_mapping
* Reformat NomicBERT to meet repo standards
* Implement recommendations into modular_nomic_bert
* Update NomicBert.md and convert_nomic_bert_to_hf
* Fix issues found in modular_nomic_bert
* Update test_modeling_nomic_bert to use nomic-embed-text-v1.5
  Was originally nomic-bert-2048
* Alter modular_nomic_bert to avoid legacy naming collisions
* Refactor NomicBERT to be more consistent with HF standards
* Refactor Nomic Bert
* Resolve NomicBert testEager value error
* Alter NomicBertSelfAttention to inherit from LlamaAttention
* Update docs date for nomic_bert.md
* Add missing embedding dropout to Nomic Bert
* Reformat Nomic Bert config to be consistent with new standards
* Refactor nomic bert to inherit more from Jina V3
* fixup jina v3 (doesn't use token type ids it seems) and inherit more fixups for nomic bert
* Update nomic_bert.md
  Update examples to no longer cross reference bert in their tokenizer
* quick fixups
* fix
* fixes to align with original model
* fixups
* fix
* remove conversion script, revert changes in conversion mapping
* a10 numbers with remote model
* fixup all other classes and conversion
* fix
* new hub revision
* v1 tests, same code - slightly different config (on the hub)
* fix wrong defaults
* numbers didn't change on a10
* fix warning
* update docs
* update docs per Tom's review

---------

Co-authored-by: Felix Arkle <felixarkle@icloud.com>
Co-authored-by: felixarkle <72893613+Bumsparkle@users.noreply.github.com>
Co-authored-by: vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

References

#43067 - Internalise the NomicBERT model

Author

ed22699

Parents

4932e972

transformers a594e09e - Internalise the NomicBERT model (#43067)
