Working paper

Robust Learning from Bites for Data Mining

Some methods from statistical machine learning and from robust statistics have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets, say with millions of data points. Secondly, robust and non-parametric confidence intervals for the predictions according to the fitted models are often unknown. Here, we propose a simple but general method to overcome these problems in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed on several processors if available, and can help to reduce the computation time substantially. Our main focus is on robust general support vector machines (SVM) based on minimizing regularized risks. The method offers distribution-free confidence intervals for the median of the predictions. The approach can also be helpful to fit robust estimators in parametric models for huge data sets.
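The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading: split the data into disjoint subsets ("bites"), fit a robust learner on each bite, take the median of the per-bite predictions, and bound that median with order statistics. The Theil-Sen line fit stands in for the robust SVM discussed in the paper, and all function names (`split_into_bites`, `theil_sen_fit`, `median_ci_ranks`, `predict_from_bites`) are hypothetical. The interval is the standard sign-test interval for a median, which assumes the per-bite predictions behave like independent, identically distributed draws.

```python
import math
import random
import statistics


def split_into_bites(data, n_bites, seed=0):
    """Shuffle the data and split it into n_bites disjoint subsets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_bites] for i in range(n_bites)]


def theil_sen_fit(bite):
    """Placeholder robust learner (assumption): Theil-Sen line fit on one bite."""
    slopes = [(y2 - y1) / (x2 - x1)
              for i, (x1, y1) in enumerate(bite)
              for (x2, y2) in bite[i + 1:]
              if x2 != x1]
    slope = statistics.median(slopes)
    intercept = statistics.median(y - slope * x for x, y in bite)
    return lambda x: intercept + slope * x


def median_ci_ranks(n, alpha=0.05):
    """1-based ranks (k, n - k + 1) of the order statistics that bound a
    distribution-free confidence interval for the median.

    k is the largest rank with 2 * P(Bin(n, 1/2) <= k - 1) <= alpha.
    """
    cdf = 0.0
    k = 0
    for j in range(n):
        cdf += math.comb(n, j) * 0.5 ** n  # now equals P(Bin(n, 1/2) <= j)
        if 2 * cdf > alpha:
            break
        k = j + 1
    if k == 0:
        raise ValueError("too few bites for the requested confidence level")
    return k, n - k + 1


def predict_from_bites(data, x_new, n_bites=20, alpha=0.05, fit=theil_sen_fit):
    """Median of the per-bite predictions at x_new, plus its interval."""
    bites = split_into_bites(data, n_bites)
    preds = sorted(fit(bite)(x_new) for bite in bites)
    k, upper = median_ci_ranks(n_bites, alpha)
    return statistics.median(preds), (preds[k - 1], preds[upper - 1])
```

Each bite is fitted independently, so the loop inside `predict_from_bites` could be handed to separate processes, and the bite size can be chosen to fit in memory. The number of bites limits the attainable confidence level: with very few bites no order-statistic interval reaches 95%, which is why `median_ci_ranks` raises an error in that case.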

Language
English

Published in
Series: Technical Report ; No. 2006,03

Subject
Breakdown point
convex risk minimization
data mining
distributed computing
influence function
logistic regression
robustness
scalability

Event
Intellectual creation
(who)
Christmann, Andreas
Steinwart, Ingo
Hubert, Mia
Event
Publication
(who)
Universität Dortmund, Sonderforschungsbereich 475 - Komplexitätsreduktion in Multivariaten Datenstrukturen
(where)
Dortmund
(when)
2006

Handle
Last updated
20.09.2024, 08:22 CEST

Data partner

This object is provided by:
ZBW - Deutsche Zentralbibliothek für Wirtschaftswissenschaften - Leibniz-Informationszentrum Wirtschaft. If you have questions about this object, please contact the data partner.
