vad : add initial Voice Activity Detection (VAD) support #3065
danbev force pushed from 0f2fe06f to b10e6ddd 254 days ago
danbev force pushed from 4aea8b3a to 3cca1a23 253 days ago
danbev force pushed from 5758650f to 9f0ed3d8 251 days ago
danbev force pushed from 63d3fe24 to ebc79f8a 248 days ago
danbev force pushed from b59768bb to 798695fc 248 days ago
danbev marked this pull request as ready for review 248 days ago
danbev force pushed from 798695fc to 6b56b7df 244 days ago
ggerganov force pushed from 5a6236e9 to 60d561b8 237 days ago
danbev commented on 2025-05-09
vad : add initial Voice Activity Detection (VAD) support
871da0bf
examples : add VAD parameters to CLI [no ci]
24901683
ci : add job to test VAD
eb23253b
vad : map timestamps to original audio
59252c2b
squash! vad : add initial Voice Activity Detection (VAD) support [no ci]
37a36a33
vad : extract VAD processing to a separate function
033c0ce2
vad : add TODOs to optimize segment access [no ci]
028481e9
vad : only use CPU backend for VAD processing [no ci]
fc7ebf20
tests : fix strcmp assert and use beam search
3276232e
vad : dont reshape stft_forward_basis tensor
abc05c5c
vad : use ggml_row_size() and rename hdim_bytes to hdim_size
0e18ceba
vad : remove unnecessary ggml_cont
9bf1b4b3
vad : fix typo in log message
dc529950
vad : don't use left leaning ref for segment
2b057733
vad : use std::vector<float> instead float pointers
44bdef1b
vad : enable GPU support for VAD but default to false
27eb59bd
vad : use kebab-case and not snake_case for VAD options
643a91bf
vad : add h_state and c_state to whisper_vad_state
e4d43072
vad : always initialize filtered_n_samples to 0
94c3aba8
vad : use orig timestamp for first segment
e70e4861
vad : fix buffers and enable GPU support by default
436baeb7
vad : fix use_gpu assert in test-vad.cpp
eb2c83ee
vad : remove unnecessary reserve [no ci]
47c8f02d
vad : add probs to whisper_vad_state
327cdaee
vad : add timing of vad processing [no ci]
bf2b0df9
danbev force pushed from 50337e26 to bf2b0df9 237 days ago
vad : force GPU off for now
243e0dba
vad : minor style and naming changes
65c421de
vad : minor style
cae38fda
vad : remove obsolete whisper_vad_free_speech
cd953ebf
vad : refactor whiser_vad_params API
f42e6e47
vad : simplify whisper_vad_timestamps_from_probs()
4ff858ba
vad : refactor whisper_vad_timestamps_from_probs to use C++
13a75177
ggerganov force pushed from 66bc585e to 13a75177 236 days ago
vad : make whisper_vad_timestamps oblique in API
3bcc44c2
vad : rename whisper_vad_speech to whisper_vad_probs
5543c80c
vad : move whisper_vad_segment to whisper.cpp
8b6f19cf
vad : make segments vector a std::vector
7625ba16
vad : use std::vector for segments in whisper_vad_timestamps_from_probs
b0b2f9b4
vad : rename pcmf32 parameters to samples [no ci]
20fe0b35
vad : remove n_segments from struct whisper_vad_timestamps
f2123105
vad : rename whisper_vad_timestamps to whisper_vad_segments [no ci]
4c7fe00c
vad : remove whisper_vad_probs struct [no ci]
dc541f93
vad : remove whisper_vad_state struct
163ad538
vad : remove window_size_samples from VAD params
810981f4
vad : clarify VAD CLI options [no ci]
050038ca
docs : add VAD section to README.md [no ci]
3cff6587
squash! docs : add VAD section to README.md [no ci]
acc8747d
vad : minor rename
7aac6eca
ggerganov approved these changes on 2025-05-12
squash! docs : add VAD section to README.md [no ci]
41c20100
vad : fix cli option names [no ci]
67f0fd40
danbev merged e41bc5c6 into master 234 days ago