Conference paper

Count-based and predictive language models for exploring DeReKo

We present the use of count-based and predictive language models for exploring language use in the German Reference Corpus DeReKo. For collocation analysis along the syntagmatic axis, we employ traditional association measures based on co-occurrence counts as well as predictive association measures derived from the output weights of skip-gram word embeddings. For inspecting the semantic neighbourhood of words along the paradigmatic axis, we visualize the high-dimensional word embeddings in two dimensions using t-distributed stochastic neighbour embedding (t-SNE). Together, these visualizations provide a complementary, exploratory approach to analysing very large corpora in addition to corpus querying. Moreover, we discuss count-based and predictive models with respect to scalability and maintainability in very large corpora.

Creator: Fankhauser, Peter; Kupietz, Marc

Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)

Language
English

Subject
Corpus <linguistics>
German
Collocation
Syntagm
Association measure
Paradigm
Language

Event
Intellectual creation
(who)
Fankhauser, Peter
Kupietz, Marc
Event
Publication
(who)
Paris : European Language Resources Association (ELRA)
Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)
(when)
2022-07-01

URN
urn:nbn:de:bsz:mh39-111107
Last updated
06.03.2025, 09:00 CET

Data partner

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have questions about the object, please contact the data partner.

Object type

  • Conference paper

Contributors

  • Fankhauser, Peter
  • Kupietz, Marc
  • Paris : European Language Resources Association (ELRA)
  • Mannheim : Leibniz-Institut für Deutsche Sprache (IDS)

Created

  • 2022-07-01
