Keywords: Entisols and Inceptisols; Iran; machine learning; soil classes; land surface parameters; legacy soil information; data reduction; random forest; uniform manifold approximation and projection
Abstract: In response to the demand for spatial soil information to support the sustainable management of soil resources, this study applies a digital soil mapping approach to predict soil classes for a 7000 ha area in Kurdistan province, Iran. Based on a stratified random sampling design, 91 soil profiles were located, described, and classified into soil great groups. Environmental covariates used for modeling soil classes included terrain derivatives, remote sensing data, distance-based rasters, and legacy geospatial information (e.g., a geological map). To address multicollinearity among the predictors, three dimensionality reduction techniques were tested: principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE), and the novel Uniform Manifold Approximation and Projection (UMAP). With each method, an initial suite of 160 environmental covariates was reduced to 10, and the reduced set was used to train a Random Forest (RF) model. The most effective model coupled UMAP with RF (RF-UMAP) and yielded a kappa index of 0.73 and an overall accuracy of 0.80. Within the Kurdistan study area, topography and parent material were the main soil-forming factors influencing the prediction of the soil classes. Overall, UMAP outperformed PCA and t-SNE. This study demonstrates the value of advanced dimension reduction methods for handling non-linear relationships among predictor variables when using RF.
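The workflow summarized in the abstract (reduce 160 covariates to 10 dimensions, train a Random Forest, report kappa and overall accuracy) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic (generated with `make_classification`), the number of classes, the train/test split, and the hyperparameters are assumptions, and PCA is used as the reducer only because it ships with scikit-learn. Assuming the separate `umap-learn` package is installed, `umap.UMAP(n_components=10)` could be swapped in for the PCA step.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the covariate table: 91 profiles x 160 covariates.
# Many redundant features mimic collinear predictors (an assumption).
X, y = make_classification(
    n_samples=91,
    n_features=160,
    n_informative=12,
    n_redundant=60,
    n_classes=3,              # hypothetical number of soil classes
    n_clusters_per_class=1,
    random_state=0,
)

# Hold out part of the profiles for validation (split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize, reduce to 10 components, then classify with a Random Forest.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),
    RandomForestClassifier(n_estimators=500, random_state=0),
)
model.fit(X_train, y_train)

# The two accuracy measures named in the abstract.
y_pred = model.predict(X_test)
overall_accuracy = accuracy_score(y_test, y_pred)
kappa = cohen_kappa_score(y_test, y_pred)

print(f"overall accuracy: {overall_accuracy:.2f}")
print(f"kappa index:      {kappa:.2f}")
```

Wrapping the reducer and the classifier in one pipeline keeps the reduction fitted on the training profiles only, so the held-out profiles do not leak into the embedding. The scores printed here describe the synthetic data and say nothing about the values reported in the study.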