Conference paper

Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited

This paper contributes to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Principal Components Analysis. Finally, we present a qualitative evaluation of a set of 100 sentences from the TüBa-D/Z, manually annotated in both the TIGER and the TüBa-D/Z annotation schemes, and show that even the existence of a parallel subcorpus does not support a straightforward and easy comparison of the two annotation schemes.

Author: Rehbein, Ines; van Genabith, Josef

Attribution - NonCommercial - NoDerivatives 4.0 International

Language
English

Subject
Corpus <linguistics>
Syntactic analysis
Annotation
Language

Event
Intellectual creation
(who)
Rehbein, Ines
van Genabith, Josef
Event
Publication
(who)
Tartu : Northern European Association for Language Technology
(when)
2017-01-13

URN
urn:nbn:de:bsz:mh39-57822
Last updated
06.03.2025, 09:00 CET

Data partner

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have questions about the object, please contact the data partner.

Object type

  • Conference paper

Contributors

  • Rehbein, Ines
  • van Genabith, Josef
  • Tartu : Northern European Association for Language Technology

Created

  • 2017-01-13