transformers
4fd7f1a0 - Add Nemotron 3.5 ASR Streaming (#46565)

Commit
3 days ago
Add Nemotron 3.5 ASR Streaming (#46565)

* add ParakeetForRNNT
* ParakeetForRNNT config
* generation mixin for RNNT
* conversion script update
* auto mappings
* tests
* draft PR
* add chunking logic to the feature extractor
* on top of #46331
* better streaming design
* RNNT as the main class
* rnn-t as main config
* nit
* wip: streaming encoder cache (pre-merge snapshot)
* prop changes to modular
* nit
* update test with reproducers
* add RNN-T loss
* fix generate
* correct loss function usage
* processing handle ctc/rnnt/tdt diffs
* update doc + fix pipeline
* working commit
* cleaner
* nits
* fix
* fix
* nit
* add nemotron asr processor
* tmp commit
* update loss reduction to match NeMo
* update expected value
* loss reduction
* loss reduction to tdt loss
* conversion update
* fix
* update checkpoint
* nit
* nit
* fix tdt loss
* fix loss
* fix loss test
* make
* use correct checkpoint
* AutoModelForRNNT in auto.md
* add reproducible tests + other small fixes
* fix
* add revision
* nit
* add loss
* fix
* nit
* better formulated forward
* add groups to VoxtralRealtimeCausalConv1d
* updates
* init commit
* clean up
* add nemotron_asr to mapping for pipeline
* update doc
* use inheritance on generate
* draft streaming
* nits
* fixes
* update num_lookahead_tokens API
* add streaming latency
* nit
* improve doc
* update
* doc update
* cleaning generation loop
* fix
* use hub checkpoints
* update tests
* NemotronAsrConfig update
* update doc
* NemotronAsr -> NemotronAsrStreaming
* update doc
* update license
* nit
* test update
* make
* simplify a bit modular
* remove cached_property
* nit
* improve comment
* refactor NemotronAsrStreamingEncoderSubsamplingConv2D
* address comment
* add all_masked_rows as a kwarg
* rename fixtures
* check-repo
* Migrate Nemotron3_5Asr onto NemotronAsrStreaming; reuse its generation mixin

  The base model was renamed nemotron_asr -> nemotron_asr_streaming (NemotronAsr* -> NemotronAsrStreaming*) and evolved (Parakeet-based RNN-T generation, refactored encoder subsampling, stateful processor num_lookahead API). Repoint Nemotron3_5Asr onto it:

  - generation: Nemotron3_5AsrGenerationMixin now subclasses NemotronAsrStreamingGenerationMixin and only overrides generate() to stash prompt_ids (the offline encode and every streaming chunk read it via get_audio_features). ~370 lines removed.
  - config/modeling/processing/feature-extraction: inherit the NemotronAsrStreaming* classes; forward returns the streaming cache fields (encoder_past_key_values, padding_cache).
  - processor __call__ re-based on the new API (default_num_lookahead_tokens via set_num_lookahead_tokens; dropped streaming_latency_ms).
  - conversion: updated encoder subsampling weight mapping (conv_in + depthwise/pointwise).
  - re-added auto registrations (wiped by the rename); encoder reuses NemotronAsrStreamingEncoder.

* updates
* use nemotron asr feature extractor
* update
* correct auto mappings
* simplify
* correct type
* fix
* update
* update tests
* update
* nit
* nits
* fix typing
* NemotronAsrStreamingProcessor without modular
* update proc
* clean a bit
* nits
* nit
* tests updates
* add default_prompt_id
* update doc
* doc
* update default
* not necessary
* remove revision
* remove _prompt_ids from modeling
* auto in proc clearer
* make
* nit
* not necessary
* fix generate num_lookahead_tokens
* make style
* unused field
* fixes
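
The generation change in the "Migrate Nemotron3_5Asr" entry (a child mixin that only overrides generate() to stash prompt_ids, which the encode step then reads) can be illustrated with a minimal sketch. Every class, method signature, and attribute below is a hypothetical stand-in written for illustration; none of it is taken from the transformers codebase, and the actual mixins may be structured differently.

```python
# Hypothetical stand-ins: these classes and signatures are NOT the
# transformers API; they only illustrate the override pattern.


class BaseStreamingGenerationMixin:
    """Stand-in for the parent mixin that owns the decoding loop."""

    def generate(self, chunks, **kwargs):
        # The parent loop encodes each chunk through get_audio_features.
        return [self.get_audio_features(chunk) for chunk in chunks]

    def get_audio_features(self, chunk):
        return {"chunk": chunk}


class PromptedGenerationMixin(BaseStreamingGenerationMixin):
    """Stand-in for the child mixin: it only stashes the prompt ids."""

    _stashed_prompt_ids = None  # assumed attribute name, for illustration

    def generate(self, chunks, prompt_ids=None, **kwargs):
        # Stash the prompt so every encode call in the parent loop sees it.
        self._stashed_prompt_ids = prompt_ids
        try:
            return super().generate(chunks, **kwargs)
        finally:
            # Clear the stash so it cannot leak into a later call.
            self._stashed_prompt_ids = None

    def get_audio_features(self, chunk):
        features = super().get_audio_features(chunk)
        features["prompt_ids"] = self._stashed_prompt_ids
        return features


outputs = PromptedGenerationMixin().generate(["chunk-0", "chunk-1"], prompt_ids=[7])
```

In this sketch the child class adds no decoding logic of its own, which is the kind of reuse that would account for a large line-count reduction like the one the entry reports.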