Article

High dimensional, robust, unsupervised record linkage

We develop a technique for record linkage on high dimensional data, where the two datasets may not have any common variable, and there may be no training set available. Our methodology is based on sparse, high dimensional principal components. Since large and high dimensional datasets are often prone to outliers and aberrant observations, we propose a technique for estimating robust, high dimensional principal components. We present theoretical results validating the robust, high dimensional principal component estimation steps, and justifying their use for record linkage. Some numeric results and remarks are also presented.

Language
English

Published in
Journal: Statistics in Transition New Series ; ISSN: 2450-0291 ; Volume: 21 ; Year: 2020 ; Issue: 4 ; Pages: 123-143 ; New York: Exeley

Subject
record linkage
principal components
high dimensional
robust

Event
Intellectual creation
(who)
Bera, Sabyasachi
Chatterjee, Snigdhansu
Event
Publication
(who)
Exeley
(where)
New York
(when)
2020

DOI
doi:10.21307/stattrans-2020-034
Last updated
10 March 2025, 11:43 CET

Data partner

This object is provided by:
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have questions about this object, please contact the data partner.

Object type

  • Article

Contributors

  • Bera, Sabyasachi
  • Chatterjee, Snigdhansu
  • Exeley

Created

  • 2020
