University of Kurdistan Research System | Highly correlating distance-connectivity based topological indices 3: PCR and PC-ANN based prediction of the octanol-water partition coefficient of diverse organic molecules

Title	Highly correlating distance-connectivity based topological indices 3: PCR and PC-ANN based prediction of the octanol-water partition coefficient of diverse organic molecules
Research type	Article published in a scientific journal
Keywords	Topological indices; quantitative structure–property relationships; QSPR; principal component; principal component regression; artificial neural network; correlation ranking; partition coefficient.
Abstract	Motivation. Recently, we proposed new topological indices (Shamsipur, or Sh, indices) based on the distance sum and connectivity of a molecular graph for use in QSAR/QSPR studies. The aim of this study is to examine the ability of the proposed Sh indices in a QSPR study of the n–octanol/water partition coefficients (logP) of a diverse set of organic compounds by means of principal component regression (PCR) and principal component–artificial neural network (PC–ANN) modeling, combined with two factor selection procedures: eigenvalue ranking (EV) and correlation ranking (CR). Experimental logP values ranging from –0.66 (methanol) to 8.16 (2,2',3,3',4,5,5',6,6'–PCB) were collected from the literature for 379 organic compounds with a wide variety of functional groups containing C, H, N, O, and all halogens. Method. Ten different Sh indices (Sh1 through Sh10) were calculated for each molecule from different combinations of the connectivity and distance sum vectors. The Sh topological descriptor data matrix was subjected to principal component analysis to reduce its dimensionality, and the most significant factors, or principal components (PCs), were extracted. Both linear and nonlinear modeling methods were employed to predict the logP of an extensive set of organic compounds spanning several structurally diverse groups (alkanes, alkenes, alkynes, cycloalkanes, cycloalkenes, aliphatic alcohols, ethers, esters, aldehydes, ketones, carboxylic acids, amines, aromatic hydrocarbons, halogenated hydrocarbons and some polychlorinated biphenyls (PCBs)). PCR and PC–ANN were used as the linear and nonlinear modeling methods, respectively. Results. Principal component analysis of the Sh data matrix showed that seven PCs could explain 99.97% of the variance in the Sh data matrix. The extracted PCs were used as the predictor variables (inputs) for the PCR and PC–ANN models. 
The AN
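The linear part of the workflow described in the abstract (PCA of a descriptor matrix, followed by PCR with factors chosen by eigenvalue ranking or correlation ranking) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic random numbers rather than the Sh indices or logP values of the study, the function names (`pca_scores`, `rank_components`, `fit_pcr`, `r_squared`) are hypothetical, autoscaling before PCA is assumed, and the PC–ANN step is not shown.

```python
import numpy as np

def pca_scores(X):
    """Autoscale X and return (scores, eigenvalues), largest eigenvalue first."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U * s                        # principal component scores
    eigvals = s**2 / (Z.shape[0] - 1)     # eigenvalues of the correlation matrix
    return scores, eigvals

def rank_components(scores, eigvals, y, method):
    """Order the PCs by eigenvalue (EV) or by |correlation| with y (CR)."""
    if method == "EV":
        return np.argsort(-eigvals)
    if method == "CR":
        yc = y - y.mean()
        # Score columns are already centered, so this is Pearson's r.
        r = (scores.T @ yc) / (np.linalg.norm(scores, axis=0) * np.linalg.norm(yc))
        return np.argsort(-np.abs(r))
    raise ValueError("method must be 'EV' or 'CR'")

def fit_pcr(scores, y, order, k):
    """Least-squares regression of y on the first k PCs of the given order."""
    cols = order[:k]
    A = np.column_stack([np.ones(len(y)), scores[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return cols, coef, A @ coef

def r_squared(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in for a (molecules x 10 descriptors) matrix and a property.
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(60, 10))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=60)

scores, eigvals = pca_scores(X)
k = 3
results = {}
for method in ("EV", "CR"):
    order = rank_components(scores, eigvals, y, method)
    cols, coef, y_hat = fit_pcr(scores, y, order, k)
    results[method] = (cols, r_squared(y, y_hat))
    print(method, "selected PCs:", cols, "R^2 =", round(results[method][1], 4))
```

Because the PC scores are mutually orthogonal, the training R² of a PCR model is the sum of the squared correlations of the selected PCs with the response, so for a fixed number of factors correlation ranking can never give a lower training R² than eigenvalue ranking. That says nothing about predictive performance on new compounds, which has to be judged by validation.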
Researchers	Mojtaba Shamsipur (first author), Raouf Ghavami Zarvan (second author), Bahram Hemmateenejad (third author), Hashem Sharghi (fourth author)

Research details