Conference paper

Challenges in the Alignment, Management and Exploitation of Large and Richly Annotated Multi-Parallel Corpora

The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence and word alignment across multiple languages. Additionally, linguistic annotation such as part-of-speech tagging, lemmatisation, chunking, and dependency parsing facilitates precise querying of linguistic properties and can be used to extend word alignment to sub-sentential groups. Such highly interconnected data is stored in a relational database to allow for efficient retrieval and linguistic data mining, which may include the statistics-based selection of good example sentences. The varying information needs of contrastive linguists require a flexible linguistic query language for ad hoc searches. Such queries in the format of generalised treebank query languages will be automatically translated into SQL queries.
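
As a rough illustration of the relational storage and SQL retrieval mentioned in the abstract, the following minimal sketch stores annotated tokens and word alignments in an in-memory SQLite database and looks up the lemmas aligned to a given source lemma. The table names (`token`, `word_alignment`), the column names, the function names and the sample data are all hypothetical and chosen only for this example; the record does not describe the authors' actual schema or query translation.

```python
import sqlite3


def build_demo_corpus() -> sqlite3.Connection:
    """Create a tiny in-memory corpus (hypothetical schema, made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE token (
            id          INTEGER PRIMARY KEY,
            sentence_id INTEGER NOT NULL,
            lang        TEXT    NOT NULL,
            position    INTEGER NOT NULL,
            form        TEXT    NOT NULL,
            lemma       TEXT    NOT NULL,
            pos         TEXT    NOT NULL
        );
        CREATE TABLE word_alignment (
            source_token INTEGER NOT NULL REFERENCES token(id),
            target_token INTEGER NOT NULL REFERENCES token(id)
        );
        """
    )
    # Two aligned sentences: English "The houses" and German "Die Häuser".
    tokens = [
        (1, 1, "en", 1, "The", "the", "DET"),
        (2, 1, "en", 2, "houses", "house", "NOUN"),
        (3, 2, "de", 1, "Die", "die", "DET"),
        (4, 2, "de", 2, "Häuser", "Haus", "NOUN"),
    ]
    conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?, ?, ?, ?)", tokens)
    conn.executemany("INSERT INTO word_alignment VALUES (?, ?)", [(1, 3), (2, 4)])
    return conn


def aligned_lemmas(conn, lemma, pos, source_lang, target_lang):
    """Return target-language lemmas aligned to source tokens with this lemma and POS."""
    rows = conn.execute(
        """
        SELECT t.lemma
        FROM token AS s
        JOIN word_alignment AS a ON a.source_token = s.id
        JOIN token AS t ON t.id = a.target_token
        WHERE s.lemma = ? AND s.pos = ? AND s.lang = ? AND t.lang = ?
        ORDER BY t.id
        """,
        (lemma, pos, source_lang, target_lang),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    corpus = build_demo_corpus()
    print(aligned_lemmas(corpus, "house", "NOUN", "en", "de"))
```

A query written in a treebank-style language would, under this assumed schema, be compiled into a join of this kind: one `token` alias per queried node and one `word_alignment` join per alignment link.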

Author: Graën, Johannes; Clematide, Simon

Attribution - NonCommercial - NoDerivatives 4.0 International

Language
English

Subject
Corpus <linguistics>
Annotation
Database system
Linguistics

Event
Intellectual creation
(who)
Graën, Johannes
Clematide, Simon
Event
Publication
(who)
Mannheim : Institut für Deutsche Sprache
(when)
2015-07-02

URN
urn:nbn:de:bsz:mh39-38348
Last updated
06.03.2025, 09:00 CET

Data partner

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have questions about this object, please contact the data partner.

Object type

  • Conference paper

Contributors

  • Graën, Johannes
  • Clematide, Simon
  • Mannheim : Institut für Deutsche Sprache

Created

  • 2015-07-02
