Buchbeitrag

Transcription Bottleneck of Speech Corpus Exploitation

While written corpora can be exploited without any linguistic annotations, speech corpora need at least a basic transcription to be of any use for linguistic research. The basic annotation of speech data usually consists of time-aligned orthographic transcriptions. To answer phonetic or phonological research questions, phonetic transcriptions are needed as well. However, manual annotation is very time-consuming and requires considerable skill and near-native competence. Therefore it can take years of speech corpus compilation and annotation before any analyses can be carried out. In this paper, approaches that address the transcription bottleneck of speech corpus exploitation are presented and discussed, including crowdsourcing the orthographic transcription, automatic phonetic alignment, and query-driven annotation. Currently, query-driven annotation and automatic phonetic alignment are being combined and applied in two speech research projects at the Institut für Deutsche Sprache (IDS), whereas crowdsourcing the orthographic transcription still awaits implementation.

Transcription Bottleneck of Speech Corpus Exploitation

Urheber*in: Brinckmann, Caren

Urheberrechtsschutz

0
/
0

Sprache
Englisch

Thema
Gesprochene Sprache
Korpus <Linguistik>
Lautschrift
Annotation
Germanische Sprachen; Deutsch

Ereignis
Geistige Schöpfung
(wer)
Brinckmann, Caren
Ereignis
Veröffentlichung
(wer)
Bozen : Europäische Akademie
(wann)
2017-12-13

URN
urn:nbn:de:bsz:mh39-68329
Letzte Aktualisierung
06.03.2025, 09:00 MEZ

Datenpartner

Dieses Objekt wird bereitgestellt von:
Leibniz-Institut für Deutsche Sprache - Bibliothek. Bei Fragen zum Objekt wenden Sie sich bitte an den Datenpartner.

Objekttyp

  • Buchbeitrag

Beteiligte

  • Brinckmann, Caren
  • Bozen : Europäische Akademie

Entstanden

  • 2017-12-13

Ähnliche Objekte (12)