Dissertation or habilitation
Danish Stød and Automatic Speech Recognition
Stød is a prosodic feature of spoken Danish that can distinguish lexemes. This distinction can also identify word class and has the potential to improve the performance of automatic speech recognisers for spoken Danish. The manifestation of stød is highly variable and may be perceptual in nature, because in some cases stød can be heard yet is not visible in a spectrogram. This variability is the primary reason why there is currently no agreed-upon acoustic or phonetic definition of stød. The working definitions of stød are “. . . a kind of creaky voice” (Grønnum, 2005) and “stød is not just creak” (Hansen, 2015). In the present work, we investigate whether stød can be exploited in automatic speech recognition (ASR). To exploit stød without an acoustic or phonetic definition, we need an (almost) zero-knowledge, data-driven approach that rests on a number of assumptions, which we investigate before conducting ASR experiments. We assume that stød can be detected in audio input using acoustic features. To detect stød, we need to identify the features that signal it, which requires annotated data. To select the right features, the stød annotation must be reliable and accurate. We therefore conduct a reliability study of stød annotation with inter-annotator agreement measures, rank acoustic features for stød detection by feature importance using a forest of randomised decision trees, and experiment with stød detection as both a binary and a multi-class classification task. The experiments identify a set of features important for stød detection and confirm that stød can be detected in audio. Lastly, we model stød in automatic speech recognition and show that significant improvements in word error rate can be gained simply by annotating stød in the phonetic dictionary, at the expense of decoding speed.
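The reliability study mentioned above relies on chance-corrected agreement between annotators. As a minimal sketch, assuming two annotators who each label the same syllables as `stød` or `none` (the labels, data and the function name `cohens_kappa` are hypothetical, and the dissertation may use other agreement coefficients), Cohen's kappa can be computed as follows:

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two annotators who labelled the same items (hypothetical helper)."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: proportion of items given identical labels.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[label] * cb[label] for label in set(ca) | set(cb)) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical annotations of six syllables by two annotators.
annotator_a = ["stød", "none", "stød", "none", "stød", "none"]
annotator_b = ["stød", "none", "none", "none", "stød", "none"]
print(round(cohens_kappa(annotator_a, annotator_b), 3))  # 0.667
```

Here the annotators agree on five of six syllables (0.833), but half of that agreement is expected by chance, so kappa is noticeably lower than raw agreement.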
Extending the acoustic feature vectors with pitch-related features and other voice-quality features also yields significant performance improvements on both read-aloud and spontaneous speech. This extension also increases decoding speed, to the point of exceeding the speed of the baseline in which stød is not modelled.
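The ASR results above are reported as word error rate (WER): the word-level edit distance between a reference transcript and the recogniser's hypothesis, divided by the number of reference words. The sketch below is a generic dynamic-programming implementation; the function name and the example transcripts are hypothetical and not taken from the dissertation.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical output where the recogniser confuses "hund" (dog) with "hun" (she).
print(word_error_rate("hun så en hund", "hun så en hun"))  # 0.25
```

One substitution in a four-word reference gives a WER of 0.25; a lower WER after adding stød to the phonetic dictionary would indicate fewer such confusions.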
- ISBN: 9788793483132
- Language: English
- Bibliographic citation: PhD Series; No. 24.2016
- Classification: Management
- Event: Intellectual creation
- Who: Kirkedal, Andreas Søeborg
- Event: Publication
- Who: Copenhagen Business School (CBS)
- Where: Frederiksberg
- When: 2016
- Last update: 10.03.2025, 11:41 AM CET
- Data provider: ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft