March 28-29, 2025, Virtual Conference
Xiaoying Zeng and Eugene Pinsky, Department of Computer Science, Metropolitan College, Boston University, 1010 Commonwealth Avenue, Boston, MA 02215, USA
This paper presents a comparative analysis of Gaussian Mixture Models (GMMs) and Elliptical Mixture Models (EMMs) for clustering multi-dimensional datasets using the Expectation-Maximization (EM) algorithm. EMMs, which accommodate the covariance structures of general elliptical distributions, exhibit a superior ability to handle complex data patterns, particularly datasets characterized by irregular shapes and heavy tails. By integrating R's statistical tools into Python, this study enhances computational flexibility, making it easier to fit elliptical distributions. Empirical results using metrics such as Weighted Average Purity, Dunn Index, Rand Index, and silhouette score show that EMMs substantially improve clustering accuracy under certain conditions, outperforming GMMs in handling data complexities common in real-world scenarios. This research emphasizes the potential of EMMs as an alternative to traditional GMMs, offering a robust yet equally accessible approach for clustering in machine learning applications.
Gaussian Mixture Models, Elliptical Distribution Mixture Models, Expectation-Maximization algorithm, Clustering, Multidimensional Data.
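The abstract centers on fitting mixture models with the EM algorithm and scoring the resulting partitions with metrics such as the Rand index and silhouette score. The snippet below is a minimal illustrative sketch, not the authors' code: it fits a Gaussian mixture with full covariance matrices using scikit-learn's EM implementation on synthetic data and reports two of the metrics named above. The elliptical-mixture counterpart would be fit analogously, for example through R routines called from Python as the paper describes; that integration is only alluded to here.

# Minimal sketch (not the authors' code): EM-based GMM clustering plus two of
# the evaluation metrics mentioned in the abstract. Data and parameters are
# purely illustrative.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score, adjusted_rand_score

# Synthetic data with known labels so an external index (Rand) can be computed.
X, y_true = make_blobs(n_samples=500, centers=3,
                       cluster_std=[1.0, 2.5, 0.5], random_state=0)

# GMM with full covariances, i.e. ellipsoidal Gaussian components fit by EM.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)

print("silhouette:", silhouette_score(X, labels))
print("adjusted Rand:", adjusted_rand_score(y_true, labels))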
Nahush Bhamre, Pranjal Prasanna Ekhande, and Eugene Pinsky, Department of Computer Science, Metropolitan College, Boston University, 1010 Commonwealth Avenue, Boston, MA 02215, USA
The Naive Bayes (NB) algorithm is widely recognized for its efficiency and simplicity in classification tasks, particularly in domains with high-dimensional data. The Gaussian Naive Bayes (GNB) model assumes a Gaussian distribution for continuous features, an assumption that often limits its applicability to real-world datasets with non-Gaussian characteristics. To address this limitation, we introduce an enhanced Naive Bayes framework that incorporates stable distributions to model feature distributions. Stable distributions, with their flexibility in handling skewness and heavy tails, provide a more realistic representation of diverse data characteristics. This paper details the theoretical integration of stable distributions into the NB algorithm, the implementation process utilizing R and Python, and an experimental evaluation across multiple datasets. Results indicate that the proposed approach offers competitive or superior classification accuracy, particularly when the Gaussian assumption is violated, underscoring its potential for practical applications in diverse fields.
Machine Learning, Naive Bayes Classification, Stable Distributions.
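To make the framework concrete, the following is a minimal, hedged sketch rather than the authors' implementation: a Naive Bayes classifier whose per-class, per-feature densities are fit from an arbitrary scipy.stats family. Passing scipy.stats.norm reproduces Gaussian NB; passing scipy.stats.levy_stable is one stand-in for the stable-distribution variant (the paper itself fits stable laws through R). The class name DistributionNB and the synthetic data are purely illustrative.

# Minimal sketch (not the authors' implementation): Naive Bayes with pluggable
# per-feature density families. stats.levy_stable.fit can be slow on large samples.
import numpy as np
from scipy import stats

class DistributionNB:
    def __init__(self, family=stats.norm):
        self.family = family

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = {c: np.mean(y == c) for c in self.classes_}
        # One fitted parameter tuple per (class, feature) pair.
        self.params_ = {c: [self.family.fit(X[y == c, j])
                            for j in range(X.shape[1])]
                        for c in self.classes_}
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            # Naive Bayes: log prior plus sum of per-feature log densities.
            logp = np.log(self.priors_[c])
            for j, pars in enumerate(self.params_[c]):
                logp = logp + self.family.logpdf(X[:, j], *pars)
            scores.append(logp)
        return self.classes_[np.argmax(np.vstack(scores), axis=0)]

# Tiny usage example with skewed (log-normal) features, where the Gaussian
# assumption is a poor fit; swap in stats.levy_stable to try stable laws.
rng = np.random.default_rng(0)
X = np.vstack([rng.lognormal(0.0, 1.0, (100, 2)),
               rng.lognormal(1.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = DistributionNB(stats.norm).fit(X, y)
print("training accuracy:", np.mean(model.predict(X) == y))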