Article
High dimensional, robust, unsupervised record linkage
We develop a technique for record linkage on high dimensional data, where the two datasets may not have any common variable, and there may be no training set available. Our methodology is based on sparse, high dimensional principal components. Since large and high dimensional datasets are often prone to outliers and aberrant observations, we propose a technique for estimating robust, high dimensional principal components. We present theoretical results validating the robust, high dimensional principal component estimation steps, and justifying their use for record linkage. Some numeric results and remarks are also presented.
- Language
-
English
- Bibliographic citation
-
Journal: Statistics in Transition New Series ; ISSN: 2450-0291 ; Volume: 21 ; Year: 2020 ; Issue: 4 ; Pages: 123-143 ; New York: Exeley
- Subject
-
record linkage
principal components
high dimensional
robust
- Event
-
Intellectual creation
- (who)
-
Bera, Sabyasachi
Chatterjee, Snigdhansu
- Event
-
Publication
- (who)
-
Exeley
- (where)
-
New York
- (when)
-
2020
- DOI
-
doi:10.21307/stattrans-2020-034
- Handle
- Last update
-
10.03.2025, 11:43 AM CET
Data provider
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have any questions about the object, please contact the data provider.
Object type
- Article
Associated
- Bera, Sabyasachi
- Chatterjee, Snigdhansu
- Exeley
Time of origin
- 2020