Conference paper

How to Compare Treebanks

Recent years have seen increasing interest in developing standards for linguistic annotation, with a focus on the interoperability of resources. This effort, however, requires a thorough knowledge of the advantages and disadvantages of linguistic annotation schemes, so that the flaws and weaknesses of existing encoding schemes are not imported into the new standards. This paper addresses the question of how to compare syntactically annotated corpora and gain insights into the usefulness of specific design decisions. We present an exhaustive evaluation of two German treebanks with crucially different encoding schemes. We evaluate three different parsers trained on the two treebanks and compare results using EVALB, the Leaf-Ancestor metric, and a dependency-based evaluation. Furthermore, we present TePaCoC, a new testsuite for the evaluation of parsers on complex German grammatical constructions. The testsuite provides a carefully designed error classification, which enables us to compare output from parsers trained on treebanks with different encoding schemes and provides insights into the impact of treebank annotation schemes on specific constructions such as PP attachment or non-constituent coordination.

Authors: Kübler, Sandra; Maier, Wolfgang; Rehbein, Ines; Versley, Yannick

In copyright

Language
English

Subject
Corpus <linguistics>
Syntactic analysis
Language

Event
Intellectual creation
(who)
Kübler, Sandra
Maier, Wolfgang
Rehbein, Ines
Versley, Yannick
Event
Publication
(who)
Paris : European Language Resources Association
(when)
2017-01-09

URN
urn:nbn:de:bsz:mh39-57520
Last updated
06.03.2025, 09:00 CET

Data partner

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. For questions about the object, please contact the data partner.

Object type

  • Conference paper

Contributors

  • Kübler, Sandra
  • Maier, Wolfgang
  • Rehbein, Ines
  • Versley, Yannick
  • Paris : European Language Resources Association

Created

  • 2017-01-09
