Speaker
Description
Classification performance is typically assessed empirically, yet its dependence on intrinsic data characteristics is not fully understood. In this study, we examine classification difficulty along two complementary dimensions: local class ambiguity, measured as instance hardness, and global class separability, captured by the Silhouette score.
We generated synthetic datasets with the make_blobs function, controlling data complexity through cluster structure and dispersion. Instance hardness is quantified by the leave-one-out cross-validation (LOOCV) error of a 1-nearest-neighbor classifier, which corresponds to the N3 data complexity measure and captures local overlap. In parallel, the Silhouette score, which originates from cluster analysis, is used to quantify class separation and is interpreted as a proxy for global classification difficulty.
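The two measures described above can be sketched as follows. This is a minimal illustration rather than the study's actual pipeline: it assumes scikit-learn's `make_blobs`, and the cluster centers, sample size, and `cluster_std` values are hypothetical choices made only to contrast a well-separated dataset with a heavily overlapping one.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical fixed centers so that only the dispersion changes between runs.
CENTERS = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])


def n3_error(X, y):
    """Leave-one-out error rate of a 1-nearest-neighbor classifier (N3)."""
    knn = KNeighborsClassifier(n_neighbors=1)
    # Each fold holds out one sample, so every score is either 0 or 1.
    scores = cross_val_score(knn, X, y, cv=LeaveOneOut())
    return 1.0 - scores.mean()


def describe(cluster_std, n_samples=300, seed=0):
    """Return (N3 error, Silhouette score) for one synthetic dataset."""
    X, y = make_blobs(
        n_samples=n_samples,
        centers=CENTERS,
        cluster_std=cluster_std,
        random_state=seed,
    )
    # The Silhouette score is computed on the true class labels,
    # not on labels produced by a clustering algorithm.
    return n3_error(X, y), silhouette_score(X, y)


tight = describe(cluster_std=0.5)  # compact, well-separated classes
wide = describe(cluster_std=5.0)   # strongly overlapping classes
print("tight: N3=%.3f, silhouette=%.3f" % tight)
print("wide:  N3=%.3f, silhouette=%.3f" % wide)
```

Increasing the dispersion should raise the N3 error and lower the Silhouette score, which is the sense in which the two measures describe local ambiguity and global separability.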
We investigated the individual and combined effects of local class ambiguity and global class separability on the accuracy of several classifiers. Results show that local class ambiguity has a consistent negative effect across most classifiers, while global class separability has a positive but model-dependent effect. The interaction between the two is significant for several classifiers, indicating that performance depends on their interplay and not on either factor alone.
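One way to examine such main and interaction effects is a linear model of accuracy with a product term. The abstract does not state which statistical model was used, so the sketch below is an assumption throughout: the dataset grid, the choice of logistic regression as the classifier, and the ordinary least-squares fit are all illustrative, and no significance test is performed.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import silhouette_score
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rows = []
for seed in range(3):                    # hypothetical replicates
    for std in (0.5, 1.5, 3.0, 4.5):     # hypothetical dispersion levels
        X, y = make_blobs(n_samples=120, centers=3,
                          cluster_std=std, random_state=seed)
        # Local class ambiguity: 1-NN leave-one-out error (N3).
        knn = KNeighborsClassifier(n_neighbors=1)
        n3 = 1.0 - cross_val_score(knn, X, y, cv=LeaveOneOut()).mean()
        # Global class separability: Silhouette score on the true labels.
        sil = silhouette_score(X, y)
        # Accuracy of the classifier under study (5-fold cross-validation).
        clf = LogisticRegression(max_iter=1000)
        acc = cross_val_score(clf, X, y, cv=5).mean()
        rows.append((n3, sil, acc))

n3, sil, acc = np.array(rows).T

# Assumed model: acc = b0 + b1*N3 + b2*Sil + b3*(N3*Sil)
design = np.column_stack([np.ones_like(n3), n3, sil, n3 * sil])
coef, *_ = np.linalg.lstsq(design, acc, rcond=None)
print(dict(zip(["intercept", "n3", "silhouette", "interaction"], coef)))
```

The coefficient on the product term indicates how the effect of one measure changes with the level of the other. Judging whether it is significant would require standard errors, for example from a dedicated statistics package.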
We identify three distinct patterns: some classifiers (e.g., kNN, decision trees) are mainly sensitive to local class ambiguity, others (e.g., logistic regression) to global class separability, while a third group (e.g., SVM and boosting methods) depends on both.
| Classification | Both methodology and application |
|---|---|
| Keywords | Classification difficulty, Local class ambiguity, Class separability |