Conference paper

What do we need to know about an unknown word when parsing German

We propose a new type of subword embedding designed to provide more information about unknown compounds, a major source of OOV words in German. We present an extrinsic evaluation where we use the compound embeddings as input to a neural dependency parser and compare the results to the ones obtained with other types of embeddings. Our evaluation shows that adding compound embeddings yields a significant improvement of 2% LAS over using word embeddings when no POS information is available. When adding POS embeddings to the input, however, the effect levels out. This suggests that it is not the missing information about the semantics of the unknown words that causes problems for parsing German, but the lack of morphological information for unknown words. To augment our evaluation, we also test the new embeddings in a language modelling task that requires both syntactic and semantic information.

Authors: Do, Bich-Ngoc; Rehbein, Ines; Frank, Anette

Attribution 4.0 International

Language
English

Subject
German
Compound
Automatic speech recognition
Language

Event
Intellectual creation
(who)
Do, Bich-Ngoc
Rehbein, Ines
Frank, Anette
Event
Publication
(who)
Stroudsburg PA, USA : The Association for Computational Linguistics
(when)
2018-10-02

URN
urn:nbn:de:bsz:mh39-80244
Last updated
6 March 2025, 09:00 CET

Data partner

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have questions about this object, please contact the data partner.

Object type

  • Conference paper

Contributors

  • Do, Bich-Ngoc
  • Rehbein, Ines
  • Frank, Anette
  • Stroudsburg PA, USA : The Association for Computational Linguistics

Created

  • 2018-10-02
