Elastic Deep Autoencoder for Text Embedding Clustering by an Improved Graph Regularization

Title	Elastic Deep Autoencoder for Text Embedding Clustering by an Improved Graph Regularization
Type	JournalPaper
Keywords	Deep autoencoder; Text clustering; Graph regularization; Text embedding
Year	2024
Journal	Expert Systems with Applications
DOI
Researchers	Fateme Daneshfar, Sayvan Soleymanbaigi, Ali Nafisi, Pedram Yamini

Abstract

Text clustering is the task of grouping information extracted from text into clusters, with applications in recommender systems, sentiment analysis, and more. Deep learning-based methods have become increasingly popular due to their high accuracy in identifying nonlinear structures. They usually consist of two major parts: dimensionality reduction and clustering. Autoencoders are simple unsupervised neural networks that learn low-dimensional representations of data and have shown good performance in dealing with nonlinear features. However, although their Frobenius-norm loss handles Gaussian noise well, they are sensitive to outliers and Laplacian noise. In this paper, a deep autoencoder with an adapted elastic loss for text embedding clustering (EDA-TEC) is proposed. The elastic loss combines the Frobenius norm and the L2,1-norm to account for both types of noise. Additionally, to preserve the geometric structure of the high-dimensional data, a modified graph regularization term based on a weighted cosine similarity measure is used. EDA-TEC also improves clustering results by applying sparsity regularization to the manifold representation of the data. In this joint end-to-end deep learning model, better representations and more accurate text clustering results are achieved on common datasets compared to existing methods.
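
To make the two loss ingredients named in the abstract concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the robust term is the L2,1 norm (sum of row-wise Euclidean norms), that the two terms are mixed by a hypothetical weight `alpha`, and that the graph weights are plain (unweighted, non-negative) cosine similarities rather than the paper's weighted variant. The function names `elastic_loss` and `cosine_graph_regularizer` are illustrative only.

```python
import numpy as np


def elastic_loss(X, X_hat, alpha=0.5):
    """Illustrative elastic reconstruction loss (assumed form, not from the paper).

    alpha (hypothetical) balances the squared Frobenius term, which suits
    Gaussian noise, against the L2,1 term, which is less sensitive to outlier rows.
    """
    R = X - X_hat                                # reconstruction residual, one row per sample
    frob_sq = np.sum(R ** 2)                     # squared Frobenius norm
    l21 = np.sum(np.linalg.norm(R, axis=1))      # sum of row-wise Euclidean norms
    return alpha * frob_sq + (1.0 - alpha) * l21


def cosine_graph_regularizer(X, Z):
    """Illustrative graph regularizer tr(Z^T L Z) with cosine-similarity weights.

    X holds the high-dimensional inputs, Z the low-dimensional codes.
    Assumption: plain cosine similarity clipped at zero; the paper's
    weighting scheme is not described in the abstract.
    """
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)            # unit-length rows, guarding against zero rows
    W = np.clip(Xn @ Xn.T, 0.0, None)            # non-negative cosine similarities
    np.fill_diagonal(W, 0.0)                     # no self-loops
    L = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian
    return np.trace(Z.T @ L @ Z)                 # equals 0.5 * sum_ij W_ij * ||z_i - z_j||^2
```

In a full model these terms would be added to a sparsity penalty and a clustering objective and minimized jointly; the relative weights of the terms are hyperparameters that the abstract does not specify.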
