Conference paper

Mining corpora of computer-mediated communication: analysis of linguistic features in Wikipedia talk pages using machine learning methods

Machine learning methods offer great potential for the automatic investigation of large amounts of data in the humanities. Our contribution to the workshop reports on ongoing work in the BMBF project KobRA (http://www.kobra.tu-dortmund.de), in which we apply machine learning methods to the analysis of large corpora in language-focused research on computer-mediated communication (CMC). At the workshop, we will discuss first results from training a Support Vector Machine (SVM) to classify selected linguistic features in talk pages of the German Wikipedia corpus in DeReKo (Deutsches Referenzkorpus), provided by the IDS Mannheim. We will also investigate different representations of the data that integrate complex syntactic and semantic information into the SVM's input. The results are intended to support both corpus-based research on CMC and the annotation of linguistic features in CMC corpora.
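The abstract does not specify the kernel, the feature representation, the toolkit, or which linguistic features are classified, so the following is only a minimal sketch under stated assumptions, not the project's actual setup. It assumes a linear SVM trained with Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss (a stand-in chosen so the example needs only the Python standard library), plain bag-of-words features instead of the richer syntactic and semantic representations mentioned above, and a hypothetical binary task ("does a talk-page post contain an emoticon?") with invented toy posts. The names `featurize`, `score`, `train_linear_svm`, and `predict` are illustrative.

```python
import random
from collections import Counter


def featurize(text):
    """Sparse bag-of-words vector (lowercased whitespace tokens) plus a constant bias feature."""
    vec = {tok: float(n) for tok, n in Counter(text.lower().split()).items()}
    vec["__bias__"] = 1.0
    return vec


def score(w, x):
    """Dot product <w, x> over the sparse features present in x."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(examples, lam=0.1, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss.

    examples: list of (text, label) pairs with label in {-1, +1}.
    Simplification: the bias feature is regularised like any other weight.
    """
    data = [(featurize(text), y) for text, y in examples]
    rng = random.Random(seed)  # fixed seed for reproducible shuffling
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)           # Pegasos step size
            violated = y * score(w, x) < 1  # hinge loss is active inside the margin
            for f in w:                     # gradient step for the L2 regulariser
                w[f] *= 1.0 - eta * lam
            if violated:                    # subgradient step for the hinge loss
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    """Return +1 or -1 depending on the side of the learned hyperplane."""
    return 1 if score(w, featurize(text)) >= 0 else -1


# Hypothetical toy data: +1 = post contains an emoticon, -1 = it does not.
train = [
    ("danke für den hinweis :-)", 1),
    ("das sehe ich anders ;-)", 1),
    ("gute idee :-)", 1),
    ("ok, erledigt ;-)", 1),
    ("danke für den hinweis", -1),
    ("das sehe ich anders", -1),
    ("gute idee", -1),
    ("ok, erledigt", -1),
]
w = train_linear_svm(train)
print(predict(w, "klingt gut :-)"), predict(w, "klingt gut"))
```

Because every training post without an emoticon has a twin that adds one, the only tokens that separate the classes are `:-)` and `;-)`, so the learned weights concentrate on them. Richer representations, such as part-of-speech tags or dependency relations, could be added as further sparse features in `featurize` without changing the training loop.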


Author(s): Beißwenger, Michael; Lüngen, Harald; Margaretha, Eliza; Pölitz, Christian

License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)


Language
English

Subject
Corpus (linguistics)

Event
Intellectual creation
(who)
Beißwenger, Michael
Lüngen, Harald
Margaretha, Eliza
Pölitz, Christian
(when)
2014-11-03
Event
Publication
(who)
Hildesheim : Universität Hildesheim

URN
urn:nbn:de:gbv:hil2-opus-2893
Last updated
14.09.2023, 08:26 CEST


