Article
A comparable Wikipedia corpus: from wiki syntax to POS tagged XML
To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data has been annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research in various linguistic topics.
- Language
-
English
- Subject
-
Corpus <Linguistics>
Wikipedia
Contrastive grammar
Language
- Event
-
Intellectual creation
- (who)
-
Bubenhofer, Noah
Haupt, Stefanie
Schwinn, Horst
- Event
-
Publication
- (who)
-
Hamburg : Universität Hamburg
- (when)
-
2016-08-22
- URN
-
urn:nbn:de:bsz:mh39-51897
- Last update
-
06.03.2025, 9:00 AM CET
Data provider
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.
Object type
- Article
Associated
- Bubenhofer, Noah
- Haupt, Stefanie
- Schwinn, Horst
- Hamburg : Universität Hamburg
Time of origin
- 2016-08-22