Book chapter

Towards a multilingual dictionary of discourse markers. Automatic extraction of units from parallel corpus

This paper presents a project to build a multilingual dictionary of discourse markers (DMs). In the first stage, which consisted of compiling the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method for building a taxonomy that automatically organises the markers into clusters. The result is an extensive, corpus-driven list of headwords. We present a prototype of the dictionary's microstructure in the form of a standard XML database and describe the procedure for automatically filling in most of its fields (e.g., the type of DM and the equivalents in other languages) before human intervention.

Author: Renau, Irene; Nazar, Rogelio

Attribution - ShareAlike 4.0 International

Language
English

Subject
Corpus (linguistics)
Lexicography
Electronic dictionary
Discourse marker
Multilingual dictionary
English, Old English

Event
Intellectual creation
(who)
Renau, Irene
Nazar, Rogelio
Event
Publication
(who)
Mannheim : IDS-Verlag
Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)
(when)
2022-08-18

URN
urn:nbn:de:bsz:mh39-111830
Last update
06.03.2025, 9:00 AM CET

Data provider

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.

Object type

  • Book chapter

Associated

  • Renau, Irene
  • Nazar, Rogelio
  • Mannheim : IDS-Verlag
  • Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Time of origin

  • 2022-08-18
