Correlation and prediction of the solubility of a diverse set of organic compounds in water by QSPR studies based on topological descriptors using PCR and PC-ANN

Title	Correlation and prediction of the solubility of a diverse set of organic compounds in water by QSPR studies based on topological descriptors using PCR and PC-ANN
Type	Presentation
Keywords	Solubility, QSPR, Topological descriptors, PC-ANN
Year	2007
Researchers	Amir Najafi, Bahram Hemmateenejad, Raouf Ghavami

Abstract

The primary goal of a quantitative structure-property relationship (QSPR) study is to identify a set of structure-based numerical descriptors that can be mathematically linked to a property of interest [1,2]. We recently proposed new topological indices for use in QSAR/QSPR studies; they are based on the distance sum and connectivity of the molecular graph and are derived directly from two-dimensional molecular topology [3,4]. These Sh indices promise to be useful parameters in QSPR studies. In this study, we evaluate the ability of these indices to predict the aqueous solubility (-logS) of a large, structurally diverse set of organic compounds. Ten different Sh indices were calculated for each molecule. Linear and nonlinear models were built using principal component regression (PCR) and a feed-forward artificial neural network (ANN) trained with the back-propagation algorithm, respectively. Principal component analysis of the Sh data matrix showed that seven principal components (PCs) explained 99.97% of its variance. The extracted PCs were used as the predictor variables for the PCR and ANN (PC-ANN) models. The PC-ANN model explained 97.63% of the variance in the solubility data, compared with 84.27% for PCR. During ANN training, a cross-validation subset of compounds was used to help find the optimal set of weights and biases and to avoid overtraining the network. The models were validated using leave-one-out cross-validation and the hold-out-a-test-sample (HOTS) procedure. The solubility models constructed with PCR and PC-ANN gave errors comparable to the experimental errors of the solubility data. For the PC-ANN model, the root-mean-square errors (RMSE) of the calibration, prediction, and validation sets were 0.314, 0.450, and 0.314 log units, respectively.
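
The linear workflow described in the abstract (autoscale the descriptors, extract PCs, regress the property on the PC scores, then validate by leave-one-out and by a held-out test sample) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the descriptor matrix is synthetic (the Sh indices themselves are not reproduced here), autoscaling before PCA is assumed, and the helper names (`pc_scores`, `pcr_fit_predict`, `loo_predictions`, `rmse`) and the 60/20 split are hypothetical.

```python
import numpy as np


def pc_scores(X_fit, X_apply, n_pc):
    """Autoscale with the fitting set's statistics and project onto its first n_pc PCs."""
    mean = X_fit.mean(axis=0)
    std = X_fit.std(axis=0, ddof=1)
    Z_fit = (X_fit - mean) / std
    Z_apply = (X_apply - mean) / std
    # Rows of Vt are the PCA loading vectors of the autoscaled matrix.
    _, s, Vt = np.linalg.svd(Z_fit, full_matrices=False)
    explained = float(np.sum(s[:n_pc] ** 2) / np.sum(s ** 2))
    P = Vt[:n_pc].T
    return Z_fit @ P, Z_apply @ P, explained


def pcr_fit_predict(X_fit, y_fit, X_apply, n_pc):
    """Principal component regression: least squares of y on the PC scores plus an intercept."""
    T_fit, T_apply, _ = pc_scores(X_fit, X_apply, n_pc)
    A = np.column_stack([np.ones(len(T_fit)), T_fit])
    coef, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
    return np.column_stack([np.ones(len(T_apply)), T_apply]) @ coef


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def loo_predictions(X, y, n_pc):
    """Leave-one-out: refit PCA and PCR without compound i, then predict compound i."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        preds[i] = pcr_fit_predict(X[keep], y[keep], X[i:i + 1], n_pc)[0]
    return preds


# Hypothetical synthetic data: 80 "compounds", 10 correlated descriptors (NOT the Sh data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(80, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(80, 10))
y = latent @ np.array([1.0, -0.6, 0.4]) + 0.1 * rng.normal(size=80)

n_pc = 7
_, _, frac = pc_scores(X, X, n_pc)
loo_rmse = rmse(y, loo_predictions(X, y, n_pc))

# Hold-out test sample: fit on the first 60 compounds, predict the last 20.
test_rmse = rmse(y[60:], pcr_fit_predict(X[:60], y[:60], X[60:], n_pc))
print(f"variance explained by {n_pc} PCs: {frac:.4f}")
print(f"LOO RMSE: {loo_rmse:.3f}, hold-out RMSE: {test_rmse:.3f}")
```

Note that PCA is refitted inside every leave-one-out fold, so the left-out compound never influences the loadings used to predict it. With all PCs retained, PCR reduces to ordinary least squares on the original descriptors; keeping fewer PCs is what discards the low-variance directions.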

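The nonlinear PC-ANN model can be sketched in the same spirit: PC scores feed a feed-forward network trained by back-propagation, and a separate monitoring ("cross-validation") set is used to keep the weights that generalise best. Every setting here is an assumption made for illustration and is not reported in the abstract: one logistic hidden layer of five units, a linear output, plain full-batch gradient descent on the mean-squared error, the learning rate, the epoch count, the synthetic data, and the hypothetical helper names (`project_to_pcs`, `train_pc_ann`, `predict`).

```python
import numpy as np


def sigmoid(z):
    # Logistic function written via tanh to avoid overflow in exp.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def project_to_pcs(X_fit, X_apply, n_pc):
    """Autoscale with the fitting set's statistics and project onto its first n_pc PCs."""
    mean, std = X_fit.mean(axis=0), X_fit.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd((X_fit - mean) / std, full_matrices=False)
    P = Vt[:n_pc].T
    return ((X_fit - mean) / std) @ P, ((X_apply - mean) / std) @ P


def predict(params, T):
    W1, b1, W2, b2 = params
    return sigmoid(T @ W1 + b1) @ W2 + b2


def train_pc_ann(T_train, y_train, T_mon, y_mon, n_hidden=5, lr=0.05,
                 epochs=3000, seed=0):
    """Full-batch back-propagation; returns the weights with the lowest monitoring-set MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(T_train.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    n = len(y_train)
    best_err, best_params = np.inf, None
    for _ in range(epochs):
        H = sigmoid(T_train @ W1 + b1)                  # hidden activations
        g_out = 2.0 * (H @ W2 + b2 - y_train) / n       # dMSE/d(output)
        g_hid = np.outer(g_out, W2) * H * (1.0 - H)     # back-propagated to hidden layer
        # Gradient-descent updates (new arrays, so stored best_params stay intact).
        W2 = W2 - lr * (H.T @ g_out)
        b2 = b2 - lr * g_out.sum()
        W1 = W1 - lr * (T_train.T @ g_hid)
        b1 = b1 - lr * g_hid.sum(axis=0)
        mon_err = np.mean((predict((W1, b1, W2, b2), T_mon) - y_mon) ** 2)
        if mon_err < best_err:                          # early stopping on the monitoring set
            best_err, best_params = mon_err, (W1, b1, W2, b2)
    return best_params


# Hypothetical synthetic data (NOT the Sh descriptors), with a mild nonlinearity.
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(100, 10))
y = (latent @ np.array([1.0, -0.6, 0.4]) + 0.3 * np.tanh(2 * latent[:, 0])
     + 0.1 * rng.normal(size=100))

# 60 calibration, 20 monitoring, 20 held-out test compounds.
T_cal, T_rest = project_to_pcs(X[:60], X[60:], n_pc=7)
params = train_pc_ann(T_cal, y[:60], T_rest[:20], y[60:80])
test_rmse = float(np.sqrt(np.mean((predict(params, T_rest[20:]) - y[80:]) ** 2)))
baseline_rmse = float(np.sqrt(np.mean((y[80:] - y[:60].mean()) ** 2)))
print(f"PC-ANN hold-out RMSE: {test_rmse:.3f} (mean-only baseline: {baseline_rmse:.3f})")
```

Keeping the weights with the lowest monitoring-set error is one common way to use a cross-validation set against overtraining; the abstract does not say which stopping rule or network architecture was actually used.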