Compression-decompression of multivariate data via maximum entropy resampling and applications to inference
Abstract: Individual Person Data (IPD) typically consists of repeated independent observations of a multi-dimensional dependent record; imagine, for example, collecting a multivariate medical record (age, height, health status, etc.) on several unrelated patients. IPD sharing is crucial for scientific advancement, that is, for experimental validation, evidence pooling, and reliable statistical inference. While IPD disclosure is feasible in principle, it is sometimes difficult or impossible in practice. If IPD is not available, researchers still try to recover the original information from disclosed IPD syntheses. For instance, meta-analysis often focuses on the appraisal and combination of disclosed regression slopes. This is sometimes equivalent to performing the original pooled IPD regression, but generally it is not. The implicit question is how much information about the original IPD, and about IPD inference, the IPD syntheses convey. The general opinion is that non-negligible information loss should occur. Here we propose a new paradigm by which appraisal of certain IPD summaries, namely IPD marginal moments and the correlation matrix, seems generally to entail small information loss at both the data and the inferential level. The idea is to reconstruct the original IPD from the above summaries only, and to recover an original IPD inference from the reconstructed IPD. We argue that this approach is well founded in an information-theoretic sense, which seems not to have been fully acknowledged in the literature so far. The reconstruction method is based on maximum entropy (MaxEnt) resampling, where the basic MaxEnt formalism is extended to include record dependence with the aid of copula theory. We argue that the Gaussian copula with given moment-based MaxEnt marginals and correlation matrix equals the multivariate MaxEnt distribution, from which stochastic simulations of the original IPD are drawn.
By an extension of the renowned Gibbs conditioning principle, there are strong hints that the Gaussian copula used is asymptotically equal to the true IPD generating mechanism, given summaries of its empirical distribution. We verify such claims experimentally. So far this seems to be one of the strongest arguments for an objective method of IPD reconstruction from IPD summaries only. Next we build a MaxEnt bootstrap estimator by using the proposed MaxEnt joint distribution as a plug-in approximation for the IPD generating process, under conditions on its empirical summaries. We give hints of MaxEnt bootstrap consistency and argue for good predictive properties of a bootstrap average. Experimental assessments suggest that the MaxEnt bootstrap does recover key features of an IPD inference distribution, or ensemble. We show this in practice for commonly performed IPD inferences such as Generalized Linear Model and proportional hazards Cox regression parameter estimates, or Breslow/Nelson-Aalen type cumulative hazard estimates. The proposed method could find natural applications in IPD anonymization, distributed network computing, and research reproduction and synthesis (meta-analysis), where no original IPD but only key IPD summaries can be made available. This work seems to suggest a new standard for IPD summary reporting and general IPD inference recovery, by which IPD information loss can be substantially limited.
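The resampling idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the two example variables and all numeric values are hypothetical, and the supplied correlation matrix is used directly as the Gaussian copula parameter, which is a simplifying assumption. The marginals rely on two standard facts: the MaxEnt distribution on the real line with given mean and variance is the normal distribution, and the MaxEnt distribution on the positive half-line with given mean is the exponential distribution.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def gaussian_copula_sample(corr, quantile_fns, n_records, seed=0):
    """Draw synthetic records from a Gaussian copula with the given marginals.

    corr         -- correlation matrix, used here as the copula parameter (assumption)
    quantile_fns -- one inverse CDF per variable (the MaxEnt marginals)
    """
    rng = random.Random(seed)
    lower = cholesky(corr)
    std_normal = NormalDist()
    dim = len(corr)
    records = []
    for _ in range(n_records):
        # Independent standard normals, then correlate them via the Cholesky factor.
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        correlated = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        # Map to uniforms; clamp away from 0 and 1 so the quantile functions stay finite.
        u = [min(max(std_normal.cdf(c), 1e-12), 1.0 - 1e-12) for c in correlated]
        records.append([q(p) for q, p in zip(quantile_fns, u)])
    return records


# Hypothetical summaries: age with mean 50 and standard deviation 10 (MaxEnt: normal),
# length of stay with mean 5 on [0, inf) (MaxEnt: exponential), correlation 0.6.
age_quantile = NormalDist(mu=50.0, sigma=10.0).inv_cdf
stay_quantile = lambda p: -5.0 * math.log1p(-p)
corr = [[1.0, 0.6], [0.6, 1.0]]

records = gaussian_copula_sample(corr, [age_quantile, stay_quantile], n_records=20000)
```

Any inference of interest can then be re-run on `records`, or on repeated draws obtained with different seeds, in the spirit of a bootstrap. Note that the Pearson correlation of the simulated variables generally differs somewhat from the copula parameter when the marginals are not normal.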
- Location
-
Deutsche Nationalbibliothek Frankfurt am Main
- Extent
-
Online resource
- Language
-
English
- Notes
-
Universität Freiburg, dissertation, 2018
- Keyword
-
Inference
Entropy
Meta-analysis
- Event
-
Publication
- (where)
-
Freiburg
- (who)
-
Universität
- (when)
-
2018
- Creator
- Associated persons and organizations
- DOI
-
10.6094/UNIFR/16498
- URN
-
urn:nbn:de:bsz:25-freidok-164981
- Rights information
-
Not Open Access; access to the object is unrestricted.
- Last update
-
15.08.2025, 07:37 CEST
Data partner
Deutsche Nationalbibliothek. If you have questions about the object, please contact the data partner.
Contributors
Created
- 2018