Conference paper
Why is it so difficult to compare treebanks? TIGER and TüBa-D/Z revisited
This paper is a contribution to the ongoing discussion on treebank annotation schemes and their impact on PCFG parsing results. We provide a thorough comparison of two German treebanks: the TIGER treebank and the TüBa-D/Z. We use simple statistics on sentence length and vocabulary size, and more refined methods such as perplexity and its correlation with PCFG parsing results, as well as a Principal Components Analysis. Finally, we present a qualitative evaluation of a set of 100 sentences from the TüBa-D/Z, manually annotated in the TIGER as well as in the TüBa-D/Z annotation scheme, and show that even the existence of a parallel subcorpus does not support a straightforward and easy comparison of both annotation schemes.
- Language
-
English
- Subject
-
Corpus <Linguistics>
Syntactic analysis
Annotation
Language
- Event
-
Intellectual creation
- (who)
-
Rehbein, Ines
van Genabith, Josef
- Event
-
Publication
- (who)
-
Tartu : Northern European Association for Language Technology
- (when)
-
2017-01-13
- URN
-
urn:nbn:de:bsz:mh39-57822
- Last update
-
06.03.2025, 9:00 AM CET
Data provider
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.
Object type
- Conference paper
Associated
- Rehbein, Ines
- van Genabith, Josef
- Tartu : Northern European Association for Language Technology
Time of origin
- 2017-01-13