Compression-decompression of multivariate data via maximum entropy resampling and applications to inference

Abstract: Individual Person Data (IPD) typically consists of repeated independent observations of a multi-dimensional record with dependent components. Imagine collecting a multivariate medical record (age, height, health status, etc.) on several unrelated patients. IPD sharing is crucial for scientific advancement, that is, for experimental validation, evidence pooling, and reliable statistical inference. While IPD disclosure is feasible in principle, it is sometimes difficult or impossible in practice. If IPD is not available, researchers still try to recover the original information from disclosed IPD syntheses. For instance, in meta-analysis we often focus on the appraisal and combination of disclosed regression slopes. This is sometimes equivalent to performing the original pooled IPD regression, but generally it is not. The implicit question is how much information about the original IPD, and about IPD inference, the IPD syntheses convey. The general opinion is that a non-negligible information loss should occur. Here we propose a new paradigm by which appraisal of certain IPD summaries, namely the IPD marginal moments and correlation matrix, seems to generally entail small information loss at both the data and the inferential level. The idea is to reconstruct the original IPD from the above summaries only, and to recover an original IPD inference from such reconstructed IPD. We argue this approach is well founded in an information-theoretic sense, which seems not to have been fully acknowledged in the literature so far. The reconstruction method is based on maximum entropy (MaxEnt) resampling, where the basic MaxEnt formalism is extended to include record dependence with the aid of copula theory. We argue that the Gaussian copula with given moment-based MaxEnt marginals and correlation matrix equals the multivariate MaxEnt distribution, from which stochastic simulations of the original IPD are drawn. 
By an extension of the renowned Gibbs conditioning principle, there are strong hints that the Gaussian copula used is asymptotically equal to the true IPD generating mechanism, given summaries of its empirical distribution. We verify such claims experimentally. So far this seems to be one of the strongest arguments for an objective method of IPD reconstruction from IPD summaries only. Next we build a MaxEnt bootstrap estimator by using the proposed MaxEnt joint distribution as a plug-in approximation for the IPD generating process, under conditions on its empirical summaries. We give hints of MaxEnt bootstrap consistency and argue for good predictive properties of a bootstrap average. Experimental assessments suggest that the MaxEnt bootstrap does recover key features of an IPD inference distribution, or ensemble. We show this in practice for commonly performed IPD inferences, such as Generalized Linear Model and proportional hazards Cox regression parameters, or Breslow/Nelson-Aalen type cumulative hazard estimates. The proposed method could find natural applications in IPD anonymization, distributed network computing, and research reproduction and synthesis (meta-analysis), where no original IPD but only key IPD summaries can be made available. This work seems to suggest a new standard for IPD summary reporting and general IPD inference recovery, by which IPD information loss can be substantially limited.
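
The resampling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation, and several points are assumptions made only for illustration: each marginal is constrained by its lowest moments only, so that the MaxEnt density has a closed form (normal for a given mean and variance on the real line, exponential for a given mean on the non-negative half-line); the reported correlation matrix is used directly as the Gaussian copula parameter; and all function names (`cholesky`, `normal_quantile`, `exponential_quantile`, `resample_records`) are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the copula transform


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def normal_quantile(mean, sd):
    """Quantile function of the MaxEnt density on the real line for a given mean and variance."""
    return lambda u: mean + sd * STD_NORMAL.inv_cdf(u)


def exponential_quantile(mean):
    """Quantile function of the MaxEnt density on [0, inf) for a given mean."""
    return lambda u: -mean * math.log1p(-u)


def resample_records(n_records, correlation, quantiles, seed=0):
    """Draw synthetic records from a Gaussian copula with the given marginals.

    Assumption: `correlation` is used as the copula (latent) correlation matrix.
    """
    rng = random.Random(seed)
    lower = cholesky(correlation)
    dim = len(quantiles)
    eps = 1e-12  # keep uniforms strictly inside (0, 1)
    records = []
    for _ in range(n_records):
        # Independent standard normals, then correlate them via the Cholesky factor.
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        latent = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(dim)]
        # Map each latent normal to a uniform, then through the marginal quantile.
        uniforms = [min(max(STD_NORMAL.cdf(v), eps), 1.0 - eps) for v in latent]
        records.append([q(u) for q, u in zip(quantiles, uniforms)])
    return records


# Hypothetical summaries: one normal-like and one positive, skewed variable.
records = resample_records(
    5000,
    correlation=[[1.0, 0.6], [0.6, 1.0]],
    quantiles=[normal_quantile(50.0, 10.0), exponential_quantile(2.0)],
)
```

In a bootstrap setting, each call with a different `seed` would yield one synthetic data set on which the inference of interest is refitted. With non-normal marginals, the Pearson correlation of the simulated records generally differs somewhat from the copula parameter, so a careful implementation would calibrate the latter.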

Location
Deutsche Nationalbibliothek Frankfurt am Main
Extent
Online resource
Language
English
Notes
Universität Freiburg, dissertation, 2018

Keywords
Inference
Entropy
Meta-analysis

Event
Publication
(where)
Freiburg
(who)
University
(when)
2018
Creator
Persons and organizations involved

DOI
10.6094/UNIFR/16498
URN
urn:nbn:de:bsz:25-freidok-164981
Rights information
No Open Access; access to the object is unrestricted.
Last updated
15.08.2025, 07:37 CEST

Data partner

This object is provided by:
Deutsche Nationalbibliothek. If you have questions about the object, please contact the data partner.

Created

  • 2018
