Add VibeVoice ASR #43625

ebezzam wants to merge 66 commits into huggingface:main from ebezzam:vibevoice_asr
ebezzam Add vibevoice tokenizer files.
d4b2855c
ebezzam Address style tests.
598f1712
ebezzam Revert to expected outputs previously computed on runner.
fc16b6c8
ebezzam Enable encoder output test.
2a3d0751
ebezzam Update expected output from runner
00e168f6
ebezzam Add note on expected outputs
773f84dd
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' of github.com:ebezzam/tra…
a9704222
ebezzam remove code link and better init
5025b519
ebezzam Update src/transformers/models/vibevoice_acoustic_tokenizer/modular_v…
be050f14
ebezzam Update src/transformers/models/vibevoice_acoustic_tokenizer/modular_v…
81ed3d2e
ebezzam Update src/transformers/models/vibevoice_acoustic_tokenizer/modular_v…
a3da244b
ebezzam Update src/transformers/models/vibevoice_acoustic_tokenizer/modular_v…
372a0dd7
ebezzam modular
f9e342b9
ebezzam Same changes to decoder layers.
314b5593
ebezzam Update src/transformers/models/vibevoice_acoustic_tokenizer/modular_v…
c9944b04
ebezzam doc nits
e4b486bf
ebezzam Use decoder_depths for decoder!
aa568863
ebezzam Merge branch 'main' into vibevoice_acoustic_tokenizer
01a0089b
ebezzam Doc nits
236bcdff
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' of github.com:ebezzam/tra…
08efdb6b
ebezzam Nits
4802517a
ebezzam Trim feature extraction for tensor only usage.
53cb9233
ebezzam Merge branch 'main' into vibevoice_acoustic_tokenizer
fd05c4c2
ebezzam Merge branch 'main' into vibevoice_acoustic_tokenizer
77c42a68
ebezzam Start files for ASR
d0387ff3
ebezzam Add cache logic to encoder.
59029808
ebezzam Nit
d1b0905c
ebezzam Revert to previous sampling approach.
4ff855ab
ebezzam Nits
683fe81c
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' into vibevoice_asr
8ccf8829
ebezzam Passing equivalence test
3e7604d2
ebezzam added the New model label
ebezzam added the Audio label
ebezzam Fix for chat template to use sampling rate other than 16kHz
350e4934
ebezzam commented on 2026-01-30
ebezzam Better logic for vae sampling?
c8349411
ebezzam More standard conversion script.
35f20fd6
ebezzam Revert to sample flag
e298a792
ebezzam Nits
1413ce40
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' into vibevoice_asr
95b33dc3
ebezzam Make style
c70521b5
ebezzam Better modular and cleanup.
d11c4f53
ebezzam update asr docs
8eb24840
ebezzam Fix GLM docstring
fa1b05a1
ebezzam Merge branch 'main' into vibevoice_acoustic_tokenizer
b96f948a
ebezzam Docs, cleanup, nits.
9eb54f3c
ebezzam Nit
c034ab51
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' into vibevoice_asr
b51314a2
ebezzam Cleaner modular and nits
d29fc6f6
ebezzam Nits
8ee01180
ebezzam Nit
6d113acb
ebezzam Skip parallelism
1465b606
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' into vibevoice_asr
d18b0419
ebezzam Update docs.
81db6c27
HuggingFaceDocBuilderDev
ebezzam Finish integration tests, and nits
fc383395
ebezzam Repo checks
b2105c85
ebezzam doc nits
a03abe5b
ebezzam Doc nits
a2143c16
ebezzam Remove bad file
73ca1ec9
ebezzam commented on 2026-02-04
ebezzam requested a review from eustlb 2 days ago
ebezzam Skip testing of encoder.
8360f811
ebezzam Shift cache creation to when it's used.
27510f0f
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' into vibevoice_asr
fb4c32f2
ebezzam Merge branch 'main' into vibevoice_acoustic_tokenizer
f4bc0733
ebezzam Shift cache creation to where it's used.
98a99dcf
ebezzam Updated checkpoint path
ec8e64e5
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' of github.com:ebezzam/tra…
d3556cb9
ebezzam Merge branch 'vibevoice_acoustic_tokenizer' into vibevoice_asr
a7351d6b
ebezzam Merge branch 'main' into vibevoice_asr
84b89c84
github-actions
ebezzam commented on 2026-02-06
ebezzam Processor nit
7edf3044

