Working paper
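O desafio do pareamento de grandes bases de dados: Mapeamento de métodos de record linkage probabilístico e diagnóstico de sua viabilidade empírica [The challenge of matching large databases: a mapping of probabilistic record linkage methods and an assessment of their empirical feasibility]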
This paper assesses the predictive performance of probabilistic record linkage algorithms for integrating large real-world databases. It evaluates how the choice of blocking key, string metric functions, and phonetic-code matching algorithms affects prediction quality and computational complexity. It also surveys the main deterministic and probabilistic record linkage methods, recent advances that combine them with machine learning techniques, and the main packages and implementations available in the open-source R language. The results can provide heuristics for integrating administrative records at the national level and have potential value for the formulation and evaluation of public policies.
- Language
-
Portuguese
- Bibliographic citation
-
Series: Texto para Discussão ; No. 2420
- Classification
-
Economics
Model Evaluation, Validation, and Selection
Large Data Sets: Modeling and Analysis
Miscellaneous Mathematical Tools
Data Collection and Data Estimation Methodology; Computer Programs: General
Data Collection and Data Estimation Methodology; Computer Programs: Other Computer Software
- Subject
-
pairs linking
blocking
administrative records
Big Data
- Event
-
Intellectual creation
- (who)
-
Peng, Yaohao
Mation, Lucas Ferreira
- Event
-
Publication
- (who)
-
Instituto de Pesquisa Econômica Aplicada (IPEA)
- (where)
-
Brasília
- (when)
-
2018
- Handle
- Last update
-
10.03.2025, 11:43 AM CET
Data provider
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have any questions about the object, please contact the data provider.
Object type
- Working paper
Associated
- Peng, Yaohao
- Mation, Lucas Ferreira
- Instituto de Pesquisa Econômica Aplicada (IPEA)
Time of origin
- 2018