Conference paper

Addressing Cha(lle)nges in Long-Term Archiving of Large Corpora

This paper addresses the long-term archiving of large corpora. It focuses on three aspects specific to language resources: (1) the removal of resources for legal reasons, (2) the versioning of (unchanged) objects in constantly growing resources, especially where objects can be part of multiple releases as well as of different collections, and (3) the conversion of data to new formats for digital preservation. The paper explains why language resources may have to be changed and why formats may need to be converted. As a solution, it suggests the use of an intermediate proxy object called a signpost. The approach is exemplified using the corpora of the Leibniz Institute for the German Language in Mannheim, namely the German Reference Corpus (DeReKo) and the Archive for Spoken German (AGD).

Authors: Arnold, Denis; Fisseni, Bernhard; Kamocki, Paweł; Schonefeld, Oliver; Kupietz, Marc; Schmidt, Thomas

Attribution-NonCommercial 4.0 International

Language
English

Subject
Corpus <linguistics>
Long-term archiving
Right of use
File format
Language

Event
Intellectual creation
(who)
Arnold, Denis
Fisseni, Bernhard
Kamocki, Paweł
Schonefeld, Oliver
Kupietz, Marc
Schmidt, Thomas
Event
Publication
(who)
Paris : European Language Resources Association
(when)
2020-05-12

URN
urn:nbn:de:bsz:mh39-98129
Last updated
2025-03-06, 09:00 CET

Data partner

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have questions about the object, please contact the data partner.

Object type

  • Conference paper

Contributors

  • Arnold, Denis
  • Fisseni, Bernhard
  • Kamocki, Paweł
  • Schonefeld, Oliver
  • Kupietz, Marc
  • Schmidt, Thomas
  • Paris : European Language Resources Association

Created

  • 2020-05-12
