2024-11-21

Sayed Mohammad Taher Hossaini

Academic rank: Instructor
ORCID:
Education: MSc.
ScopusId: 268
HIndex:
Faculty: Faculty of Agriculture
Address:
Phone: 087 33669095

Research

Title
High-performance soil class delineation via UMAP coupled with machine learning in Kurdistan Province, Iran
Type
JournalPaper
Keywords
Entisols and Inceptisols; Iran; machine learning; soil classes; land surface parameters; legacy soil information; data reduction; random forest; uniform manifold approximation and projection
Year
2024
Journal Geoderma Regional
DOI
Researchers Roholah Taghizade, Kamal Nabiollahi, Sayed Mohammad Taher Hossaini, Nafiseh Kakhani, Maryam Ghebleh-Goydaragh, Thomas Scholten, Brandon Heung, Shadi Amirian-Chakan

Abstract

In response to the demand for spatial soil information to support the sustainable management of soil resources, this study applies a digital soil mapping approach to predict soil classes for a 7000 ha area in Kurdistan Province, Iran. Based on a stratified random sampling design, 91 soil profiles were sited, described, and classified into soil great groups. Environmental covariates used for modeling soil classes included terrain derivatives, remote sensing data, distance-based rasters, and legacy geospatial information (e.g., a geological map). To address multicollinearity among the predictors, three dimensionality reduction techniques were tested: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and the novel Uniform Manifold Approximation and Projection (UMAP). With each method, an initial suite of 160 environmental covariates was reduced to 10, and the reduced set was used to train a Random Forest (RF) model. The most effective model coupled UMAP with RF (RF-UMAP), yielding a kappa index of 0.73 and an overall accuracy of 0.80. Within Kurdistan, topography and parent material were the main soil-forming factors influencing the prediction of the soil classes. Overall, UMAP outperformed PCA and t-SNE. This study demonstrates the value of advanced dimension reduction methods for handling non-linear relationships among predictor variables when using RF.
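The two scores reported in the abstract, overall accuracy and the kappa index, can be illustrated with a minimal sketch. It assumes that the kappa index refers to Cohen's kappa, which the abstract does not state explicitly. The function names `overall_accuracy` and `cohens_kappa` and the class labels "A", "B", "C" are hypothetical and are not taken from the study; the labels are made-up illustration data, not the study's soil great groups or results.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the observed class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = overall_accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal proportions.
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Kappa is undefined when only one class occurs; 0.0 is a convention
        # chosen for this sketch, not a standard result.
        return 0.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical observed and predicted classes for ten validation profiles.
observed = ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"]
predicted = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "A"]

print("overall accuracy:", round(overall_accuracy(observed, predicted), 3))
print("kappa:", round(cohens_kappa(observed, predicted), 3))
```

Kappa is lower than overall accuracy whenever some agreement would be expected by chance, which is why both values are usually reported together for soil class maps.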