Conference paper

A harmonised testsuite for POS tagging of German social media data

We present a testsuite for POS tagging German web data. Our testsuite provides the original raw text as well as the gold tokenisations and is annotated for parts-of-speech. The testsuite includes a new dataset for German tweets, with a current size of 3,940 tokens. To increase the size of the data, we harmonised the annotations in already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments, showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community.

Authors: Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor

Protected by copyright

Language: English

Subject: Corpus <linguistics>
German
Social software
Language

Event: Intellectual creation

(who): Rehbein, Ines
Ruppenhofer, Josef
Zimmermann, Victor

Event: Publication

(who): Vienna, Austria : Austrian academy of sciences

(when): 2018-09-20

URN: urn:nbn:de:bsz:mh39-79318

Last updated: 2025-03-06, 09:00 CET

Data partner

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have questions about this object, please contact the data partner.

Similar objects (12)

Conference paper

A New Resource for German Causal Language

Conference paper

There’s no Data like More Data? Revisiting the Impact of Data Size on a Classification Task

Conference paper

Detecting the boundaries of sentence-like units on spoken German

Conference paper

Evaluating the Impact of Coder Errors on Active Learning

Conference paper

Semantic frames as an anchor representation for sentiment analysis

Conference paper

Catching the common cause: extraction and annotation of causal relations and their participants

Conference paper

Yes we can!? Annotating the senses of English modal verbs

Conference paper

Improving Sentence Boundary Detection for Spoken Language Transcripts

Conference paper

Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates

Conference paper

Bringing Active Learning to Life

Conference paper

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Conference paper

Fine-grained Named Entity Annotations for German Biographic Interviews
