Article

A comparable Wikipedia corpus: from wiki syntax to POS tagged XML

To build a comparable Wikipedia corpus of German, French, Italian, Norwegian, Polish and Hungarian for contrastive grammar research, we used a set of XSLT stylesheets to transform the MediaWiki annotations to XML. Furthermore, the data has been annotated with word class information using different taggers. The outcome is a corpus with rich metadata and linguistic annotation that can be used for multilingual research on various linguistic topics.

Authors: Bubenhofer, Noah; Haupt, Stefanie; Schwinn, Horst

In copyright

Language
English

Subject
Corpus <Linguistics>
Wikipedia
Contrastive grammar
Language

Event
Intellectual creation
(who)
Bubenhofer, Noah
Haupt, Stefanie
Schwinn, Horst
Event
Publication
(who)
Hamburg : Universität Hamburg
(when)
2016-08-22

URN
urn:nbn:de:bsz:mh39-51897
Last update
06.03.2025, 9:00 AM CET

Data provider

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.

Object type

  • Article
