Novel Outlier Detection and Clustering Methods Based on Cluster Catch Digraphs
| Metadata Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | Ceyhan, Elvan | |
| dc.contributor.author | Shi, Rui | |
| dc.date.accessioned | 2025-11-13T16:11:44Z | |
| dc.date.available | 2025-11-13T16:11:44Z | |
| dc.date.issued | 2025-11-13 | |
| dc.identifier.uri | https://etd.auburn.edu/handle/10415/10042 | |
| dc.description.abstract | We propose a novel family of outlier detection algorithms built upon Cluster Catch Digraphs (CCDs) and their extensions, designed to overcome the limitations of existing methods in handling high-dimensional, heterogeneous, and irregularly shaped data. Our approach introduces Mutual Catch Graphs (MCGs) to enhance the discrimination between inliers and outliers by incorporating local density and geometric structure. Based on CCDs derived from Kolmogorov–Smirnov-type statistics, Ripley’s K function, and Nearest Neighbor Distances (NND), we develop a suite of algorithms (U-MCCD, UN-MCCD, and their shape-adaptive variants SU-MCCD and SUN-MCCD) that adaptively refine cluster boundaries and suppress false detections. These methods are largely parameter-free or require minimal tuning, making them scalable and user-friendly. We also introduce two novel CCD-based outlyingness scores (OSs): the Outbound Outlyingness Score (OOS) and the Inbound Outlyingness Score (IOS). These scores enhance the interpretability of outlier detection results. Both OSs employ graph-, density-, and distribution-based techniques tailored to high-dimensional data with varying cluster shapes and intensities, and we show that IOS is robust to the masking problem. Our results show that the shape-adaptive variants and the two outlyingness scores significantly improve detection accuracy, particularly in high-dimensional or non-uniform settings, where traditional graph- and density-based methods often fail. Finally, we introduce a new clustering method called UN-CCD. It addresses the limitations of RK-CCDs in clustering by using a new variant of the spatial randomness test that is based on the NND instead of Ripley’s K function. The UN-CCD method is particularly effective for high-dimensional datasets, performing comparably to or better than KS-CCDs and RK-CCDs (which rely on a KS-type statistic and Ripley’s K function, respectively) in clustering. For each proposed method, we provide theoretical guarantees for computational complexity, demonstrate robustness through extensive Monte Carlo experiments, and evaluate performance across a wide range of dimensions, cluster configurations, and contamination levels. Keywords: Outlier detection; Outlyingness scores; Graph-based clustering; Cluster catch digraphs; High-dimensional data; Nearest neighbor distance; Spatial randomness test. | en_US |
| dc.rights | EMBARGO_NOT_AUBURN | en_US |
| dc.subject | Mathematics and Statistics | en_US |
| dc.title | Novel Outlier Detection and Clustering Methods Based on Cluster Catch Digraphs | en_US |
| dc.type | PhD Dissertation | en_US |
| dc.embargo.length | MONTHS_WITHHELD:12 | en_US |
| dc.embargo.status | EMBARGOED | en_US |
| dc.embargo.enddate | 2026-11-13 | en_US |
| dc.creator.orcid | 0009-0001-2556-9741 | en_US |
