Add Nemotron 3.5 ASR Streaming (#46565)
* add ParakeetForRNNT
* ParakeetForRNNT config
* generation mixin for RNNT
* conversion script update
* auto mappings
* tests
* draft PR
* add chunking logic to the feature extractor
* on top of #46331
* better streaming design
* RNNT as the main class
* rnn-t as main config
* nit
* wip: streaming encoder cache (pre-merge snapshot)
* prop changes to modular
* nit
* update test with reproducers
* add RNN-T loss
* fix generate
* correct loss function usage
* processing: handle CTC/RNN-T/TDT differences
* update doc + fix pipeline
* working commit
* cleaner
* nits
* fix
* fix
* nit
* add nemotron asr processor
* tmp commit
* update loss reduction to match NeMo
* update expected value
* loss reduction
* loss reduction to tdt loss
* conversion update
* fix
* update checkpoint
* nit
* nit
* fix tdt loss
* fix loss
* fix loss test
* make
* use correct checkpoint
* AutoModelForRNNT in auto.md
* add reproducible tests + other small fixes
* fix
* add revision
* nit
* add loss
* fix
* nit
* better formulated forward
* add groups to VoxtralRealtimeCausalConv1d
* updates
* init commit
* clean up
* add nemotron_asr to mapping for pipeline
* update doc
* use inheritance on generate
* draft streaming
* nits
* fixes
* update num_lookahead_tokens API
* add streaming latency
* nit
* improve doc
* update
* doc update
* cleaning generation loop
* fix
* use hub checkpoints
* update tests
* NemotronAsrConfig update
* update doc
* NemotronAsr -> NemotronAsrStreaming
* update doc
* update license
* nit
* test update
* make
* simplify a bit modular
* remove cached_property
* nit
* improve comment
* refactor NemotronAsrStreamingEncoderSubsamplingConv2D
* address comment
* add all_masked_rows as a kwarg
* rename fixtures
* check-repo
* Migrate Nemotron3_5Asr onto NemotronAsrStreaming; reuse its generation mixin
The base model was renamed nemotron_asr -> nemotron_asr_streaming (NemotronAsr* ->
NemotronAsrStreaming*) and evolved (Parakeet-based RNN-T generation, refactored encoder
subsampling, stateful processor num_lookahead API). Repoint Nemotron3_5Asr onto it:
- generation: Nemotron3_5AsrGenerationMixin now subclasses NemotronAsrStreamingGenerationMixin
and only overrides generate() to stash prompt_ids (the offline encoding pass and every streaming
chunk read them via get_audio_features). ~370 lines removed.
- config/modeling/processing/feature-extraction: inherit the NemotronAsrStreaming* classes;
forward returns the streaming cache fields (encoder_past_key_values, padding_cache).
- processor __call__ re-based on the new API (default_num_lookahead_tokens via
set_num_lookahead_tokens; dropped streaming_latency_ms).
- conversion: updated encoder subsampling weight mapping (conv_in + depthwise/pointwise).
- re-added auto registrations (wiped by the rename); encoder reuses NemotronAsrStreamingEncoder.
* updates
* use nemotron asr feature extractor
* update
* correct auto mappings
* simplify
* correct type
* fix
* update
* update tests
* update
* nit
* nits
* fix typing
* NemotronAsrStreamingProcessor without modular
* update proc
* clean a bit
* nits
* nit
* test updates
* add default_prompt_id
* update doc
* doc
* update default
* not necessary
* remove revision
* remove _prompt_ids from modeling
* auto in proc clearer
* make
* nit
* not necessary
* fix generate num_lookahead_tokens
* make style
* unused field
* fixes