Conference paper

Towards a Gold Standard Corpus for Variable Detection and Linking in Social Science Publications

In this paper, we describe our effort to create a new corpus for evaluating the detection and linking of so-called survey variables (e.g., "Do you believe in Heaven?") in social science publications. The task is to recognize survey variable mentions in a given text, disambiguate them, and link them to the corresponding variable within a knowledge base. Because there are generally hundreds of candidate variables to link to, and because mentions can take a wide variety of forms, this is a challenging NLP task. The contribution of our work is the first gold standard corpus for the variable detection and linking task. We describe the annotation guidelines and the annotation process. The resulting corpus is multilingual (German and English) and includes manually curated word and phrase alignments. It also includes text samples that could not be assigned to any variable, which serve as negative examples. Based on the new dataset, we evaluate several state-of-the-art text classification and textual similarity methods. The annotated corpus is made available along with an open-source baseline system for variable mention identification and linking.

Authors: Zielinski, Andrea; Mutschke, Peter

Attribution - NonCommercial - NoDerivatives 4.0 International

ISBN
979-10-95546-00-9
Language
English
Notes
Status: Published version; peer reviewed
11th International Conference on Language Resources and Evaluation (LREC). Miyazaki (Japan), 2018

Published in
Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC)

Subject
News media, journalism, publishing
Literature, rhetoric, literary criticism
Information science
Literary studies, language studies, linguistics
Social science
Publication
Data
Algorithm
Computational linguistics

Event
Intellectual creation
(who)
Zielinski, Andrea
Mutschke, Peter
Event
Publication
(who)
European Language Resources Association (ELRA)
(where)
Germany
(when)
2018

URN
urn:nbn:de:0168-ssoar-57723-2
Rights information
GESIS - Leibniz-Institut für Sozialwissenschaften. Bibliothek Köln
Last updated
21.06.2024, 16:26 CEST

Data partner

This object is provided by:
GESIS - Leibniz-Institut für Sozialwissenschaften. Bibliothek Köln. If you have questions about the object, please contact the data partner.

Object type

  • Conference paper

Contributors

  • Zielinski, Andrea
  • Mutschke, Peter
  • European Language Resources Association (ELRA)

Created

  • 2018
