Article

Confidence bands for a distribution function with merged data from multiple sources

We consider nonparametric estimation of a distribution function when data are collected from multiple overlapping data sources. Main statistical challenges include (1) heterogeneity of data sets, (2) unidentified duplicated records across data sets, and (3) dependence due to sampling without replacement from a data source. The proposed estimator is computable without identifying duplication but corrects bias from duplicated records. We show the uniform consistency of the proposed estimator over the real line and its weak convergence to a Gaussian process. Based on these asymptotic properties, we propose a simulation-based confidence band that enjoys asymptotically correct coverage probability. The finite sample performance is evaluated through a simulation study. A Wilms tumor example is provided.

Language
English

Bibliographic citation
Journal: Statistics in Transition New Series ; ISSN: 2450-0291 ; Volume: 21 ; Year: 2020 ; Issue: 4 ; Pages: 144-158 ; New York: Exeley

Subject
confidence band
data integration
Gaussian process

Event
Intellectual creation
(who)
Saegusa, Takumi
Event
Publication
(who)
Exeley
(where)
New York
(when)
2020

DOI
doi:10.21307/stattrans-2020-035
Last update
10.03.2025, 11:44 AM CET

Data provider

This object is provided by:
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have any questions about the object, please contact the data provider.

Object type

  • Article

Associated

  • Saegusa, Takumi
  • Exeley

Time of origin

  • 2020
