Conference paper

A harmonised testsuite for POS tagging of German social media data

We present a testsuite for POS tagging of German web data. The testsuite provides the original raw text together with gold tokenisations and part-of-speech annotations. It includes a new dataset of German tweets, currently 3,940 tokens in size. To increase the size of the data, we harmonised the annotations of already existing web corpora, based on the Stuttgart-Tübingen Tag Set. The current version of the corpus has an overall size of 48,344 tokens of web data, around half of it from Twitter. We also present experiments showing how different experimental setups (training set size, additional out-of-domain training data, self-training) influence the accuracy of the taggers. All resources and models will be made publicly available to the research community.
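As a rough illustration of the two steps the abstract mentions, harmonising tags across corpora and measuring tagger accuracy, the following minimal sketch may help. It is not the authors' code: the mapping entries, the source tag names `NOUN_COMMON` and `NOUN_PROPER`, and the function names `harmonise` and `accuracy` are all assumptions made for this example. Only the target tags `NN`, `NE`, and `VVFIN` are actual Stuttgart-Tübingen Tag Set labels.

```python
from typing import Dict, List, Tuple

TaggedToken = Tuple[str, str]  # (token, POS tag)

# Hypothetical mapping from corpus-specific tags to a shared inventory.
# The source tag names are invented for illustration and are not taken
# from the paper or from any of the corpora it harmonises.
TAG_MAP: Dict[str, str] = {
    "NOUN_COMMON": "NN",
    "NOUN_PROPER": "NE",
}


def harmonise(tagged: List[TaggedToken], tag_map: Dict[str, str]) -> List[TaggedToken]:
    """Rewrite each tag via tag_map; tags without an entry are kept unchanged."""
    return [(token, tag_map.get(tag, tag)) for token, tag in tagged]


def accuracy(gold: List[TaggedToken], predicted: List[TaggedToken]) -> float:
    """Token-level accuracy: share of positions where the predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same number of tokens")
    if not gold:
        return 0.0
    correct = sum(g_tag == p_tag for (_, g_tag), (_, p_tag) in zip(gold, predicted))
    return correct / len(gold)
```

Token-level accuracy assumes that gold and predicted sequences share the same tokenisation, which is one reason a testsuite with gold tokenisations makes results easier to compare.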

Creator: Rehbein, Ines; Ruppenhofer, Josef; Zimmermann, Victor

In copyright

Language: English

Subject: Corpus <linguistics>
German
Social software
Language

Event: Intellectual creation

(who): Rehbein, Ines
Ruppenhofer, Josef
Zimmermann, Victor

Event: Publication

(who): Vienna, Austria : Austrian Academy of Sciences

(when): 2018-09-20

URN: urn:nbn:de:bsz:mh39-79318

Last update: 06.03.2025, 9:00 AM CET

Data provider

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.


Object type

Conference paper

Associated

Rehbein, Ines
Ruppenhofer, Josef
Zimmermann, Victor
Vienna, Austria : Austrian Academy of Sciences

Time of origin

2018-09-20

Other Objects (12)

Conference paper

A New Resource for German Causal Language

Conference paper

There’s no Data like More Data? Revisiting the Impact of Data Size on a Classification Task

Conference paper

Detecting the boundaries of sentence-like units on spoken German

Conference paper

Evaluating the Impact of Coder Errors on Active Learning

Conference paper

Semantic frames as an anchor representation for sentiment analysis

Conference paper

Catching the common cause: extraction and annotation of causal relations and their participants

Conference paper

Yes we can!? Annotating the senses of English modal verbs

Conference paper

Improving Sentence Boundary Detection for Spoken Language Transcripts

Conference paper

Who is we? Disambiguating the referents of first person plural pronouns in parliamentary debates

Conference paper

Bringing Active Learning to Life

Conference paper

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Conference paper

Fine-grained Named Entity Annotations for German Biographic Interviews


