Conference paper

Deduplication in large web corpora

Our paper attempts to answer several questions related to the deduplication process in large-scale web-crawled corpora. An experiment based on eight corpora from the Aranea family is introduced, and first results are presented.

Author: Benko, Vladimír

License
Creative Commons Attribution 4.0 International (CC BY 4.0)

Language
English

Subject
Corpus <Linguistics>
Language

Event
Intellectual creation
(who)
Benko, Vladimír
Event
Publication
(who)
Mannheim : Leibniz-Institut für Deutsche Sprache
(when)
2019-07-04

URN
urn:nbn:de:bsz:mh39-90221
Last update
06.03.2025, 9:00 AM CET

Data provider

This object is provided by:
Leibniz-Institut für Deutsche Sprache - Bibliothek.

Object type

  • Conference paper