A novel extreme learning machine based kNN classification method for dealing with big data

Research

Title	A novel extreme learning machine based kNN classification method for dealing with big data
Type	JournalPaper
Keywords	Big data; kNN; ELM; Label matrix; Correctness matrix; Tree
Year	2021
Journal	EXPERT SYSTEMS WITH APPLICATIONS
DOI
Researchers	Amin Shokrzade, Mohsen Ramezani, Fardin Akhlaghian Tab, Mahmud Abdulla Mohammad

Abstract

The kNN algorithm, an effective data mining technique, is widely used for supervised classification. However, previously proposed kNN search methods are not efficient for big data. Because big datasets are generated and expanded daily on various online and offline servers, efficient kNN search methods for such data are needed. Moreover, massive datasets contain more noisy and imperfect samples, which significantly increases the need for a robust kNN search method. In this paper, a new fast and robust kNN search framework is introduced for big datasets. In this method, the group of training samples most relevant to an input sample is detected, and the original kNN method is applied to that group to find the final nearest neighbors. The main goal of the method is to handle big datasets accurately, quickly, and robustly. The training samples of each label are grouped into partitions based on the outputs of several mini-classifiers (ELM classifiers); that is, the behavior of the mini-classifiers is the basis for partitioning the training samples. Each mini-classifier is trained on its own non-overlapping subset of the training set. An index is computed for each partition, and these indices are stored in a tree structure with each partition index in a leaf, so that the partition matching a query can be found quickly. At test time, the outputs of the mini-classifiers for an input sample are used to locate, on the tree, the group of training samples most relevant to it. Experimental results indicate that the proposed method performs better in most cases, and comparably in the remaining cases, on both original and noisy big data problems.
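The abstract describes the pipeline only at a high level, so the following is a minimal sketch of how its pieces could fit together under stated assumptions; it is not the authors' implementation. All of the following are assumptions: each mini-classifier is a standard regularized ELM (random sigmoid hidden layer, output weights β = (HᵀH + I/C)⁻¹HᵀT); a sample's partition key is the tuple of labels predicted by the mini-classifiers; the tree is a trie with one level per mini-classifier, so a leaf holds the union of the per-label partitions that share a key; and an unseen branch at query time falls back to every sample under the deepest matching node. The paper's label and correctness matrices are not reproduced. The names `TinyELM`, `SignatureKNN`, `solve`, `collect` and the parameters `n_models`, `n_hidden`, `c`, `k` are hypothetical.

```python
import math
import random
from collections import Counter

def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

class TinyELM:
    """Hypothetical mini-classifier: regularized ELM with ridge-regression output weights."""
    def __init__(self, n_hidden, n_classes, c=10.0, seed=0):
        self.n_hidden, self.n_classes, self.c = n_hidden, n_classes, c
        self.rng = random.Random(seed)

    def _hidden(self, x):
        out = []
        for ws, b in zip(self.w, self.b):
            # Clip the pre-activation so math.exp cannot overflow.
            z = max(-50.0, min(50.0, sum(w * xi for w, xi in zip(ws, x)) + b))
            out.append(1.0 / (1.0 + math.exp(-z)))
        return out

    def fit(self, xs, ys):
        d, L = len(xs[0]), self.n_hidden
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(L)]
        h = [self._hidden(x) for x in xs]
        t = [[1.0 if y == k else 0.0 for k in range(self.n_classes)] for y in ys]
        # beta = (H^T H + I / C)^{-1} H^T T
        hth = [[sum(r[i] * r[j] for r in h) + (1.0 / self.c if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        htt = [[sum(hr[i] * tr[k] for hr, tr in zip(h, t)) for k in range(self.n_classes)]
               for i in range(L)]
        self.beta = solve(hth, htt)
        return self

    def predict(self, x):
        hx = self._hidden(x)
        scores = [sum(hx[i] * self.beta[i][k] for i in range(self.n_hidden))
                  for k in range(self.n_classes)]
        return max(range(self.n_classes), key=lambda k: scores[k])

def collect(node):
    """Return all training indices stored in the leaves under a trie node."""
    if isinstance(node, list):
        return list(node)
    return [i for child in node.values() for i in collect(child)]

class SignatureKNN:
    """Hypothetical sketch: mini-ELM signatures -> trie of partitions -> exact kNN in one leaf."""
    def __init__(self, n_models=3, n_hidden=8, k=3, seed=0):
        self.n_models, self.n_hidden, self.k, self.seed = n_models, n_hidden, k, seed

    def signature(self, x):
        return tuple(m.predict(x) for m in self.models)

    def fit(self, xs, ys):
        idx = list(range(len(xs)))
        random.Random(self.seed).shuffle(idx)
        n_classes = max(ys) + 1
        self.models = []
        for m in range(self.n_models):
            part = idx[m::self.n_models]  # non-overlapping training subset for model m
            elm = TinyELM(self.n_hidden, n_classes, seed=self.seed + m)
            self.models.append(elm.fit([xs[i] for i in part], [ys[i] for i in part]))
        self.xs, self.ys, self.root = xs, ys, {}
        for i, x in enumerate(xs):  # one trie level per mini-classifier; leaves are index lists
            node, sig = self.root, self.signature(x)
            for s in sig[:-1]:
                node = node.setdefault(s, {})
            node.setdefault(sig[-1], []).append(i)
        return self

    def predict(self, x):
        node = self.root
        for s in self.signature(x):
            if s not in node:
                break  # unseen branch: fall back to every sample under the deepest match
            node = node[s]
        cand = collect(node)
        nearest = sorted(cand, key=lambda i: math.dist(self.xs[i], x))[: self.k]
        return Counter(self.ys[i] for i in nearest).most_common(1)[0][0]
```

For example, `SignatureKNN(n_models=3, k=3).fit(xs, ys).predict(x)` runs exact kNN only over the training samples in the leaf that `x` reaches rather than over the whole training set. The sketch uses pure Python with no third-party dependencies to stay self-contained; a practical version would use vectorized linear algebra for the ELM solve and the distance computations.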
