Book chapter

Treebanking user-generated content: a UD based overview of guidelines, corpora and unified recommendations

This article presents a discussion of the main linguistic phenomena which cause difficulties in the analysis of user-generated texts found on the web and in social media, and proposes a set of annotation guidelines for their treatment within the Universal Dependencies (UD) framework of syntactic analysis. Given on the one hand the increasing number of treebanks featuring user-generated content, and its somewhat inconsistent treatment in these resources on the other, the aim of this article is twofold: (1) to provide a condensed, though comprehensive, overview of such treebanks, based on available literature, along with their main features and a comparative analysis of their annotation criteria, and (2) to propose a set of tentative UD-based annotation guidelines, to promote consistent treatment of the particular phenomena found in these types of texts. The overarching goal of this article is to provide a common framework for researchers interested in developing similar resources in UD, thus promoting cross-linguistic consistency, which is a principle that has always been central to the spirit of UD.

Creator: Sanguinetti, Manuela; Bosco, Cristina; Cassidy, Lauren; Cetinoglu, Özlem; Cignarella, Alessandra Teresa; Lynn, Teresa; Rehbein, Ines; Ruppenhofer, Josef; Seddah, Djamé; Zeldes, Amir

Attribution 4.0 International

Language
English

Subject
World Wide Web
Annotation
Applied linguistics
Social Media
Database system
Syntax tree
Language

Event
Intellectual creation
(who)
Sanguinetti, Manuela
Bosco, Cristina
Cassidy, Lauren
Cetinoglu, Özlem
Cignarella, Alessandra Teresa
Lynn, Teresa
Rehbein, Ines
Ruppenhofer, Josef
Seddah, Djamé
Zeldes, Amir
Event
Publication
(who)
Dordrecht [et al.] : Springer
Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)
(when)
2022-04-05

URN
urn:nbn:de:bsz:mh39-110002
Last update
06.03.2025, 9:00 AM CET

Data provider

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.