Preprint
The impact of lacking metadata and data truncation for the measurement of cultural and linguistic change using the Google Ngram datasets
As a result of legal restrictions, the Google Ngram Corpora datasets (a) are not accompanied by any metadata about the texts that make up the corpora and (b) are truncated so that the author of a text cannot be inferred indirectly from its n-grams. This article discusses some of the consequences of this strategy.
- Language
-
German
- Subject
-
Language change
Cultural change
Language statistics
Corpus <linguistics>
Data structure
Metadata
Linguistics
- Event
-
Intellectual creation
- (who)
-
Koplenig, Alexander
- Event
-
Publication
- (who)
-
Mannheim : Institut für Deutsche Sprache
- (when)
-
2014-10-17
- URN
-
urn:nbn:de:bsz:mh39-31557
- Last update
-
06.03.2025, 9:00 AM CET
Data provider
Leibniz-Institut für Deutsche Sprache - Bibliothek. If you have any questions about the object, please contact the data provider.
Object type
- Preprint
Associated
- Koplenig, Alexander
- Mannheim : Institut für Deutsche Sprache
Time of origin
- 2014-10-17