Book chapter

Creating an extensible, levelled study corpus of Russian

In this paper, we present first results of training a classifier that assigns Russian texts to levels of difficulty. To classify texts into two levels of difficulty, we considered both surface-oriented features adopted from readability assessment and more linguistically informed, positional features. This text classification is the main focus of our Levelled Study Corpus of Russian (LeStCoR), in which we aim to build a corpus adapted for language learning purposes by selecting simpler texts for beginner second language learners and more complex texts for advanced learners. The most discriminative feature in our pilot study was a lexical feature that approximates how accessible the vocabulary is to the second language learner, measured as the proportion of familiar words in a text. The best feature setting achieved an accuracy of 0.91 on a pilot corpus of 209 texts.
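The lexical feature named in the abstract, the proportion of familiar words in a text, might be sketched as follows. This is only an illustration, not the authors' implementation: the function names, the regex tokenizer, and the tiny familiar-word set are all assumptions, and the sketch matches surface forms only, whereas a realistic setup for an inflected language such as Russian would probably lemmatize first.

```python
import re

# Hypothetical tokenizer: lowercases the text and keeps runs of Cyrillic letters.
# (Assumption for illustration; the paper's preprocessing is not described here.)
def tokenize(text: str) -> list[str]:
    return re.findall(r"[а-яё]+", text.lower())

# Share of tokens that appear in a learner's known vocabulary.
# `familiar_vocab` is a hypothetical set of word forms assumed to be familiar.
def familiar_word_proportion(text: str, familiar_vocab: set[str]) -> float:
    tokens = tokenize(text)
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    known = sum(1 for token in tokens if token in familiar_vocab)
    return known / len(tokens)

if __name__ == "__main__":
    vocab = {"мама", "раму"}  # toy vocabulary, invented for this example
    print(familiar_word_proportion("Мама мыла раму.", vocab))
```

A value close to 1 would suggest a text whose vocabulary is largely accessible to the learner; how such a value feeds into the two-level classification is up to the classifier and is not shown here.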

Creator: Batinić, Dolores; Birzer, Sandra; Zinsmeister, Heike

In copyright

Language
English

Subject
Russian
Corpus <linguistics>
Language

Event
Intellectual creation
(who)
Batinić, Dolores
Birzer, Sandra
Zinsmeister, Heike
Event
Publication
(who)
Bochum : Ruhr-Universität Bochum
(when)
2017-02-27

URN
urn:nbn:de:bsz:mh39-59235
Last update
06.03.2025, 9:00 AM CET

Data provider

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.

Object type

  • Book chapter

Associated

  • Batinić, Dolores
  • Birzer, Sandra
  • Zinsmeister, Heike
  • Bochum : Ruhr-Universität Bochum

Time of origin

  • 2017-02-27
