Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications

Abstract: In this paper, we describe our effort to create a new corpus for evaluating the detection and linking of so-called survey variables in social science publications (e.g., "Do you believe in Heaven?"). The task is to recognize survey variable mentions in a given text, disambiguate them, and link them to the corresponding variable in a knowledge base. Because there are generally hundreds of candidate variables to link to, and because mentions can take a wide variety of forms, this is a challenging NLP task. The contribution of our work is the first gold standard corpus for the variable detection and linking task. We describe the annotation guidelines and the annotation process. The resulting corpus is multilingual (German and English) and includes manually curated word and phrase alignments. Moreover, it includes text samples that could not be assigned to any variable, denoted as negative examples. Based on the new dataset, we conduct an evaluation of several state-of-the-art text class

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online resource
Language
English
Notes
Published version
Peer reviewed
In: Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC). 2018. ISBN 979-10-95546-00-9

Classification
Language, linguistics

Event
Publication
(where)
Mannheim
(when)
2018
Creator
Zielinski, Andrea
Mutschke, Peter
Contributor
European Language Resources Association (ELRA)

URN
urn:nbn:de:0168-ssoar-57723-2
Rights
Open Access; access to the object is unrestricted.
Last update
15 August 2025, 7:26 AM CEST

Data provider

This object is provided by:
Deutsche Nationalbibliothek. If you have any questions about the object, please contact the data provider.

Associated

Time of origin

  • 2018
