Abstract
Feature selection is a fundamental preprocessing step in data mining whose goal is to remove irrelevant and/or redundant features from a given dataset. In this paper, we present a clustering-based genetic algorithm for feature selection (CGAFS). The proposed algorithm works in three steps. In the first step, the subset size is determined. In the second step, the features are divided into clusters using the k-means clustering algorithm. In the third step, features are selected using a genetic algorithm with a new clustering-based repair operation. The performance of the proposed method was assessed on five benchmark classification problems and compared with the results obtained from four well-known existing feature selection algorithms. The results show that CGAFS consistently produces better classification accuracies.
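The second and third steps can be illustrated with a minimal sketch. The abstract does not specify the repair operation, so the version below is only an assumption: it treats the subset size from the first step as the number of clusters `k` and repairs a chromosome (a 0/1 mask over features) so that each non-empty cluster contributes exactly one selected feature. The function names `kmeans_features` and `repair` are hypothetical, and the fitness evaluation and the remaining genetic operators are omitted.

```python
import random

def kmeans_features(columns, k, iters=20, seed=0):
    """Cluster features with plain k-means.

    `columns` holds one list of sample values per feature, so the
    features themselves (not the samples) are the points being clustered.
    """
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(columns, k)]
    labels = [0] * len(columns)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, col in enumerate(columns):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(col, centroids[c])),
            )
        # Update step: mean of member columns; an empty cluster keeps its centroid.
        for c in range(k):
            members = [columns[i] for i in range(len(columns)) if labels[i] == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels

def repair(mask, labels, k, rng):
    """Assumed repair rule: exactly one selected feature per non-empty cluster."""
    repaired = list(mask)
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        chosen = [i for i in members if repaired[i]]
        # Keep one already-selected feature if any, otherwise pick one at random.
        keep = rng.choice(chosen if chosen else members)
        for i in members:
            repaired[i] = 1 if i == keep else 0
    return repaired

# Toy data: five features measured on four samples (illustrative values only).
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 2.1, 2.9, 4.2],
    [9.0, 1.0, 8.0, 0.5],
    [8.8, 1.2, 7.9, 0.4],
    [5.0, 5.0, 5.1, 4.9],
]
labels = kmeans_features(columns, k=3)
offspring = [1, 1, 0, 0, 0]  # a chromosome as it might look after crossover
print(repair(offspring, labels, 3, random.Random(1)))
```

In a full genetic algorithm, a repair of this kind would typically be applied to each offspring after crossover and mutation, so that every evaluated chromosome has the target subset size and draws its features from different clusters.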