1dc0f762 - Add leading silence detection and removal logic (#17648)

Add leading silence detection and removal logic (#17648)

Closes #17614

Summary of the issue:
Some voices output leading silence before the actual speech. Removing it shortens the delay between a keypress and the user hearing audio, making the voices more responsive.

Description of user facing changes:
Users may find the voices more responsive. All voices using NVDA's WavePlayer are affected, including eSpeak NG, OneCore, SAPI 5, and some third-party voice add-ons. Only leading silence is affected; silence between sentences or at punctuation marks is not changed, although this may depend on how the voice uses WavePlayer.

Description of development approach:
A header-only library, silenceDetect.h, was added in nvdaHelper/local. It supports most wave formats (8/16/24/32-bit integer and 32/64-bit float) and uses a simple algorithm: check each sample to see whether it falls outside the threshold range (currently hard-coded to +/- 1/2^10, i.e. 0.0009765625). It uses templates and requires the C++20 standard.

WasapiPlayer in wasapi.cpp is updated to handle silence. A new member function, startTrimmingLeadingSilence (exported as wasPlay_startTrimmingLeadingSilence), sets or clears the isTrimmingLeadingSilence flag. When isTrimmingLeadingSilence is true, the next chunk fed in has its leading silence removed; once non-silence is detected, the flag is reset to false. Therefore, startTrimmingLeadingSilence should be called each time a new utterance is about to be spoken.

In nvwave.py, startTrimmingLeadingSilence() is called when: the player is initialized; the player is stopped; idle is called; or _idleCheck determines that the player is idle. Voices usually call idle when an utterance is completed, so that audio ducking works correctly, so idle is used here to mark the starting point of the next utterance.
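The per-sample threshold check and the isTrimmingLeadingSilence flag behaviour described above can be sketched as follows. This is a simplified illustration for 16-bit PCM only; the real implementation is C++20 template code in silenceDetect.h and wasapi.cpp that handles many sample formats, and the Player class here is a toy stand-in, not NVDA's actual API.

```python
import struct

# Threshold from the description above: +/- 1/2**10 of full scale.
THRESHOLD = 1.0 / 2**10  # 0.0009765625


def find_first_nonsilent(data: bytes) -> int:
    """Return the byte offset of the first 16-bit sample outside the
    threshold range, or len(data) if the whole chunk is silence."""
    sampleCount = len(data) // 2
    samples = struct.unpack(f"<{sampleCount}h", data[: sampleCount * 2])
    for i, s in enumerate(samples):
        if abs(s) / 32768.0 > THRESHOLD:
            return i * 2
    return len(data)


class Player:
    """Toy model of the isTrimmingLeadingSilence flag in WasapiPlayer."""

    def __init__(self):
        # Trimming is armed at initialization, as described above.
        self.isTrimmingLeadingSilence = True

    def startTrimmingLeadingSilence(self, enable: bool = True):
        self.isTrimmingLeadingSilence = enable

    def feed(self, data: bytes) -> bytes:
        """Return the chunk with leading silence removed while trimming
        is armed; disarm trimming once non-silence is found."""
        if self.isTrimmingLeadingSilence:
            data = data[find_first_nonsilent(data):]
            if data:  # non-silence detected: stop trimming
                self.isTrimmingLeadingSilence = False
        return data
```

Because the flag is cleared on the first non-silent sample, silence arriving mid-utterance (between sentences or at punctuation) passes through untouched, matching the behaviour described above.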
If a voice doesn't use idle this way, this logic may misbehave. As long as the synthesizer uses idle as intended, its code doesn't need to be modified to benefit from this feature.

Leading silence can also be introduced by a BreakCommand at the beginning of a speech sequence, so WavePlayer checks the speech sequence first; if it begins with a BreakCommand, the leading silence is not trimmed for that utterance. To check the exact speech sequence that is about to be spoken, a new extension point, pre_synthSpeak, is added to synthDriverHandler; it is invoked just before SpeechManager calls getSynth().speak(). The existing pre_speech is called before SpeechManager processes and queues the speech sequence, so pre_synthSpeak is needed to provide a more accurate sequence.

When the purpose of a WavePlayer is not SPEECH, it does not trim leading silence by default, because of the way playWaveFile works (it calls idle after every chunk). Users of WavePlayer can still enable or disable automatic trimming by calling enableTrimmingLeadingSilence, or initiate trimming manually for the next audio section by calling startTrimmingLeadingSilence.

Other approaches that may be worth considering (but have not been implemented):
- Put the silence detection/removal logic in a separate module instead of in WavePlayer. The drawback is that every voice synthesizer module would need to be modified to use it.
- Use another audio library, such as PyDub, to detect/remove silence.
- Add a setting to turn this feature on or off.
- Add a public API function to allow a synthesizer to opt out of this feature.
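The BreakCommand check described above can be sketched as a handler on the pre_synthSpeak extension point. This is an illustration only: BreakCommand and DemoPlayer are simplified stand-ins for NVDA's speech.commands.BreakCommand and WavePlayer, and the handler body is an assumption about the logic, not the actual nvwave.py code.

```python
class BreakCommand:
    """Stand-in for speech.commands.BreakCommand (a deliberate pause)."""

    def __init__(self, time: int):
        self.time = time  # pause length in milliseconds


class DemoPlayer:
    """Minimal stand-in exposing the startTrimmingLeadingSilence API."""

    def __init__(self):
        self.isTrimmingLeadingSilence = False

    def startTrimmingLeadingSilence(self, enable: bool = True):
        self.isTrimmingLeadingSilence = enable


def onPreSynthSpeak(player, speechSequence):
    """Handler such as might run on pre_synthSpeak, just before
    getSynth().speak(): keep the leading silence when the utterance
    deliberately starts with a BreakCommand, otherwise arm trimming."""
    startsWithBreak = bool(speechSequence) and isinstance(
        speechSequence[0], BreakCommand
    )
    player.startTrimmingLeadingSilence(not startsWithBreak)
```

Checking the sequence at pre_synthSpeak rather than pre_speech matters because, as noted above, pre_speech fires before SpeechManager processes and queues the sequence, so only pre_synthSpeak sees the exact sequence the synthesizer will receive.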