Online Public Access Catalogue (OPAC)
Library, Documentation and Information Science Division

“A research journal serves that narrow borderland which separates the known from the unknown”
- P. C. Mahalanobis


Robust Matrix Factorization using the Density Power Divergence and its Applications / Subhrajyoty Roy

Material type: Text
Publication details: Kolkata: Indian Statistical Institute, 2025
Description: xvi, 214 pages
DDC classification:
  • 23rd ed. SA.13 R888
Contents:
Background -- Robust Singular Value Decomposition -- Robust Principal Component Analysis -- Rank Estimation -- Breakdown Analysis of Minimum Super Divergence Estimator -- Breakdown Analysis of Minimum Generalized Alpha-Beta Divergence Estimator -- Conclusion and Future Scopes
Production credits:
  • Guided by Prof. Ayanendranath Basu & Prof. Abhik Ghosh
Dissertation note: Thesis (Ph.D.) - Indian Statistical Institute, 2025. Includes bibliography.

Summary: In the modern era of big data, high-dimensional datasets are becoming increasingly common across a range of disciplines, including machine learning, natural language processing, finance, and genomics. Extracting meaningful information from these datasets often requires uncovering low-dimensional structures hidden within the data. Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) are widely used matrix factorization techniques for this purpose. However, the traditional methods for computing them are extremely sensitive to outliers; even a single aberrant observation can lead to highly imprecise results. This issue is exacerbated in high-dimensional datasets, where outliers are difficult to detect. Classical robust inference techniques, such as M-estimators, struggle because their breakdown points diminish as the data dimension becomes extremely large.

This thesis addresses these challenges by proposing a novel class of robust matrix factorization techniques based on the minimum density power divergence estimator (MDPDE). The MDPDE, a member of the broader class of minimum divergence estimators, is well known for its robustness and efficiency across diverse applications. Crucially, it offers a dimension-free asymptotic breakdown point, making it particularly well suited to high-dimensional settings. In this work, we leverage this estimator to develop robust versions of SVD and PCA, referred to as rSVDdpd and rPCAdpd, respectively.

The thesis is structured as follows. Chapter 1 provides the necessary background on classical matrix factorization techniques, introduces key concepts related to minimum divergence estimators, particularly the MDPDE, and fixes the notation used throughout the thesis. Chapter 2 presents the novel rSVDdpd algorithm, detailing its theoretical properties, including various equivariance properties, algorithmic convergence and consistency. Through simulation studies, we demonstrate the algorithm's superior robustness compared to existing methods, particularly in high-dimensional settings. We also apply the rSVDdpd algorithm to the problem of video surveillance background modelling, showcasing its real-world applicability.

Chapter 3 extends this methodology to robust PCA, resulting in the rPCAdpd algorithm. We establish its theoretical properties, such as orthogonal equivariance, consistency and asymptotic normality. We also demonstrate that its influence function remains bounded, ensuring its robustness to outliers. Comparative studies with benchmark datasets reveal that rPCAdpd outperforms existing robust PCA algorithms, particularly for high-dimensional data with a low signal-to-noise ratio.

The robust SVD and PCA algorithms introduced in Chapters 2 and 3 require a robust estimate of the rank of the low-dimensional component of the data matrix. To this end, Chapter 4 proposes a new penalized criterion, DICMR. Theoretical results on selection consistency and B-robustness are established, and extensive simulation studies show that DICMR is the best-performing penalized method while also providing competitive performance relative to cross-validation methods and remaining computationally efficient.

A key contribution of this thesis, explored in Chapter 5, is the demonstration that the MDPDE has a dimension-free lower bound on its asymptotic breakdown point. This property makes it uniquely robust in high-dimensional settings, a significant improvement over classical M-estimators.
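For reference, the density power divergence underlying the MDPDE, introduced by Basu, Harris, Hjort and Jones (1998), measures the discrepancy between a data density g and a model density f by

\[
d_\alpha(g, f) \;=\; \int \Big\{ f^{1+\alpha}(x) \;-\; \Big(1 + \tfrac{1}{\alpha}\Big) f^{\alpha}(x)\, g(x) \;+\; \tfrac{1}{\alpha}\, g^{1+\alpha}(x) \Big\}\, dx, \qquad \alpha > 0,
\]

which reduces to the Kullback-Leibler divergence in the limit as α → 0; the tuning parameter α governs the trade-off between robustness (larger α) and asymptotic efficiency (smaller α).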
We further generalize this result in Chapter 6, showing that the dimension-free breakdown point holds for a broader class of estimators known as minimum generalized Alpha-Beta divergence estimators. We derive the necessary and sufficient conditions under which the corresponding divergence measures are well defined and nonnegative, contributing to the theoretical understanding of how novel statistical divergence measures may be generated to enable robust estimation with high-dimensional data.

Chapter 7 concludes the thesis, summarizing the key findings and outlining directions for future research, including potential extensions of the proposed algorithms to other matrix factorization problems and the exploration of further practical applications beyond those demonstrated in the thesis. Overall, this thesis aims to contribute to the field of robust statistics by developing scalable, robust matrix factorization techniques with strong theoretical guarantees and practical relevance in high-dimensional data analysis.
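To make the MDPDE-based fitting idea behind rSVDdpd (Chapter 2) concrete, here is a minimal, hypothetical Python sketch of a robust rank-1 fit in which each cell is down-weighted according to the density power divergence weight for Gaussian errors. The function name, the fixed MAD scale estimate, and the alternating reweighted least-squares updates are illustrative assumptions; this is a sketch of the general principle, not the algorithm or theory developed in the thesis.

```python
import numpy as np

def dpd_rank1(X, alpha=0.3, n_iter=100, tol=1e-8):
    """Hypothetical sketch: robust rank-1 fit X ~ a b^T with DPD-motivated weights.

    Cells are down-weighted by exp(-alpha * r^2 / (2 * sigma^2)), so gross
    outliers contribute almost nothing to the alternating least-squares
    updates. Illustration of the general principle only, not the rSVDdpd
    algorithm of Chapter 2.
    """
    X = np.asarray(X, dtype=float)

    # Warm start from the classical (non-robust) SVD.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    a = np.sqrt(s[0]) * U[:, 0]
    b = np.sqrt(s[0]) * Vt[0]

    # Fixed robust scale estimate (MAD of initial residuals), kept simple here.
    sigma = max(1.4826 * np.median(np.abs(X - np.outer(a, b))), 1e-12)

    prev = np.inf
    for _ in range(n_iter):
        # Update the row factor a using weights from the current residuals.
        R = X - np.outer(a, b)
        W = np.exp(-alpha * R ** 2 / (2 * sigma ** 2))
        a = (W * X) @ b / np.maximum((W * b ** 2).sum(axis=1), 1e-12)

        # Update the column factor b with refreshed weights.
        R = X - np.outer(a, b)
        W = np.exp(-alpha * R ** 2 / (2 * sigma ** 2))
        b = (W * X).T @ a / np.maximum((W.T * a ** 2).sum(axis=1), 1e-12)

        # For fixed sigma, minimizing the empirical DPD objective is equivalent
        # to maximizing the total weight; use the negative total to monitor convergence.
        obj = -np.exp(-alpha * (X - np.outer(a, b)) ** 2 / (2 * sigma ** 2)).sum()
        if abs(prev - obj) < tol:
            break
        prev = obj

    # Repackage as a singular triplet (lambda, u, v).
    lam = np.linalg.norm(a) * np.linalg.norm(b)
    return lam, a / np.linalg.norm(a), b / np.linalg.norm(b)


# Toy example: a rank-1 signal plus noise, contaminated by a few gross outliers.
rng = np.random.default_rng(0)
u0, v0 = rng.normal(size=50), rng.normal(size=40)
X = 5.0 * np.outer(u0, v0) + 0.1 * rng.normal(size=(50, 40))
X[rng.integers(0, 50, 20), rng.integers(0, 40, 20)] += 100.0
lam, u, v = dpd_rank1(X, alpha=0.5)
```

With a moderate α the contaminated cells receive weights near zero, so the recovered singular triplet tracks the uncontaminated rank-1 signal, while α close to zero gives all cells weight near one and recovers the ordinary alternating least-squares fit.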
Holdings
Item type: THESIS
Current library: ISI Library, Kolkata
Call number: SA.13 R888
Status: Available
Notes: E-Thesis. Guided by Prof. Ayanendranath Basu & Prof. Abhik Ghosh
Barcode: TH646
Total holds: 0


Library, Documentation and Information Science Division, Indian Statistical Institute, 203 B T Road, Kolkata 700108, INDIA
Phone no. 91-33-2575 2100, Fax no. 91-33-2578 1412, ksatpathy@isical.ac.in