Highly correlating distance-connectivity based topological indices 3: PCR and PC-ANN based prediction of the octanol-water partition coefficient of diverse organic molecules

Research

Title	Highly correlating distance-connectivity based topological indices 3: PCR and PC-ANN based prediction of the octanol-water partition coefficient of diverse organic molecules
Type	JournalPaper
Keywords	Topological indices; quantitative structure–property relationships; QSPR; principal component; principal component regression; artificial neural network; correlation ranking; partition coefficient.
Year	2004
Journal	Internet Electronic Journal of Molecular Design
DOI
Researchers	Mojtaba Shamsipur, Raouf Ghavami, Bahram Hemmateenejad, Hashem Sharghi

Abstract

Motivation. Recently, we proposed a set of new topological indices (Shamsipur indices, Sh) based on the distance sum and connectivity of a molecular graph for use in QSAR/QSPR studies. The aim of this study is to examine the ability of the proposed Sh indices to model the n–octanol/water partition coefficients (logP) of a diverse set of organic compounds by means of principal component regression (PCR) and principal component–artificial neural network (PC–ANN) modeling, combined with two factor selection procedures: eigenvalue ranking (EV) and correlation ranking (CR). Experimental logP values ranging from –0.66 (methanol) to 8.16 (2,2',3,3',4,5,5',6,6'–PCB) were collected from the literature for 379 organic compounds with a wide variety of functional groups containing C, H, N, O, and all halogens.

Method. Ten different Sh indices (Sh1 through Sh10) were calculated for each molecule from different combinations of the connectivity and distance sum vectors. The Sh descriptor data matrix was subjected to principal component analysis to reduce its dimensionality, and the most significant factors, or principal components (PCs), were extracted. Both linear and nonlinear modeling methods (PCR and PC–ANN, respectively) were used to predict the logP of the compound set, which spans several structurally diverse groups (alkanes, alkenes, alkynes, cycloalkanes, cycloalkenes, aliphatic alcohols, ethers, esters, aldehydes, ketones, carboxylic acids, amines, aromatic hydrocarbons, halogenated hydrocarbons and some polychlorinated biphenyls (PCBs)).

Results. Principal component analysis of the Sh data matrix showed that seven PCs explain 99.97% of the variance in the matrix. The extracted PCs were used as the predictor variables (inputs) for the PCR and PC–ANN models. The AN
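The two factor selection procedures named in the abstract can be illustrated with a minimal sketch. It assumes the usual reading of the terms: eigenvalue ranking keeps the PCs with the largest eigenvalues, while correlation ranking keeps the PCs whose scores correlate most strongly with the response. The function names `fit_pcr` and `predict_pcr`, the autoscaling step, and the data are all hypothetical; the data are synthetic random numbers, not Sh indices or measured logP values, and the sketch is not the authors' implementation. The PC–ANN step is not covered.

```python
import numpy as np

def fit_pcr(X, y, n_factors, ranking="EV"):
    """Principal component regression with EV or CR factor selection (illustrative)."""
    # Autoscale the descriptors (an assumption; the abstract does not state the scaling).
    mean, scale = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mean) / scale
    # PCA via SVD: the columns of `scores` are the PC scores.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * s
    eigenvalues = s**2 / (X.shape[0] - 1)
    if ranking == "EV":
        # Eigenvalue ranking: largest explained variance first.
        order = np.argsort(-eigenvalues)
    elif ranking == "CR":
        # Correlation ranking: largest |correlation with y| first.
        r = np.array([np.corrcoef(scores[:, k], y)[0, 1]
                      for k in range(scores.shape[1])])
        order = np.argsort(-np.abs(np.nan_to_num(r)))
    else:
        raise ValueError("ranking must be 'EV' or 'CR'")
    keep = order[:n_factors]
    # Ordinary least squares on the selected scores plus an intercept.
    design = np.column_stack([np.ones(len(y)), scores[:, keep]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "scale": scale, "loadings": Vt[keep].T,
            "coef": coef, "kept": keep}

def predict_pcr(model, X):
    """Project new descriptors onto the kept PCs and apply the regression."""
    Z = (X - model["mean"]) / model["scale"]
    scores = Z @ model["loadings"]
    return model["coef"][0] + scores @ model["coef"][1:]

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data: 10 descriptor columns driven by two latent factors.
# The response depends on the factor that carries LESS descriptor variance.
rng = np.random.default_rng(0)
n = 200
f1, f2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack(
    [f1 + 0.1 * rng.normal(size=n) for _ in range(8)]
    + [f2 + 0.1 * rng.normal(size=n) for _ in range(2)]
)
y = f2 + 0.1 * rng.normal(size=n)

# One-factor models under each ranking scheme.
r2_ev = r_squared(y, predict_pcr(fit_pcr(X, y, 1, "EV"), X))
r2_cr = r_squared(y, predict_pcr(fit_pcr(X, y, 1, "CR"), X))
print(f"R^2 with 1 factor, EV ranking: {r2_ev:.3f}")
print(f"R^2 with 1 factor, CR ranking: {r2_cr:.3f}")
```

On data built this way, the single factor chosen by correlation ranking fits the response far better than the one chosen by eigenvalue ranking, because the highest-variance PC is nearly unrelated to `y`. When all factors are retained, the two rankings give the same model, so the choice only matters for reduced models.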
