Conference paper

Mining corpora of computer-mediated communication: analysis of linguistic features in Wikipedia talk pages using machine learning methods

Machine learning methods offer great potential for automatically investigating large amounts of data in the humanities. Our contribution to the workshop reports on ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de), in which we apply machine learning methods to the analysis of large corpora in language-focused research on computer-mediated communication (CMC). At the workshop, we will discuss first results from training a Support Vector Machine (SVM) to classify selected linguistic features in talk pages of the German Wikipedia corpus in DeReKo, provided by the IDS Mannheim. We will investigate different representations of the data that make complex syntactic and semantic information available to the SVM. The results are intended to support both corpus-based research on CMC and the annotation of linguistic features in CMC corpora.

Authors: Beißwenger, Michael; Lüngen, Harald; Margaretha, Eliza; Pölitz, Christian

Attribution - ShareAlike 4.0 International (CC BY-SA 4.0)

Language: English

Subject

Corpus <Linguistics>

Event

Intellectual creation

(who)

Beißwenger, Michael
Lüngen, Harald
Margaretha, Eliza
Pölitz, Christian

(when)

2014-11-03

Event

Publication

(who)

Hildesheim : Universität Hildesheim

URN: urn:nbn:de:gbv:hil2-opus-2893

Last updated: 14.09.2023, 08:26 CEST

Data partner

Leibniz-Institut für Deutsche Sprache - Bibliothek



Similar objects (12)

Conference paper

Text type structure and logical document structure

Article

Building linguistic corpora from Wikipedia articles and discussions

DEREKO - Das Deutsche Referenzkorpus. Schriftkorpora der deutschen Gegenwartssprache am Institut für Deutsche Sprache in Mannheim

Monograph

Mining corpora of computer-mediated communication: Analysis of linguistic features in Wikipedia talk pages using machine learning methods

Conference paper

CMC Corpora in DeReKo

Book chapter

New German words: detection and description

Conference paper

Reply relations in CMC: types and annotation

Types and annotation of reply relations in computer-mediated communication

Article

DEREKO - Das Deutsche Referenzkorpus. Schriftkorpora der deutschen Gegenwartssprache am Institut für Deutsche Sprache in Mannheim

Book chapter

Integrating corpora of computer-mediated communication in CLARIN-D: Results from the curation project ChatCorpus2CLARIN

Book chapter

Reply relations in CMC: types and annotation

Building linguistic corpora from Wikipedia articles and discussions


