Conference paper
Deduplication in large web corpora
This paper addresses several questions about the deduplication process in large-scale web-crawled corpora. It introduces an experiment based on eight corpora from the Aranea family and presents first results.
- Language
-
English
- Subject
-
Corpus <linguistics>
Language
- Event
-
Intellectual creation
- (who)
-
Benko, Vladimír
- Event
-
Publication
- (who)
-
Mannheim : Leibniz-Institut für Deutsche Sprache
- (when)
-
2019-07-04
- URN
-
urn:nbn:de:bsz:mh39-90221
- Last update
-
06.03.2025, 9:00 AM CET
Data provider
Leibniz-Institut für Deutsche Sprache - Bibliothek
Object type
- Conference paper
Associated
- Benko, Vladimír
- Mannheim : Leibniz-Institut für Deutsche Sprache
Time of origin
- 2019-07-04