Mutual information (MI) is a measure of dependency between two random variables. Unlike the coefficient of determination, R², its use is not restricted to linear models; MI can be applied to nonlinear models as well. Most studies that use MI as a dependency measure have focused only on a single distribution of the random variables. In practice, however, we may encounter mixture distributions, which tend to produce two distant groups in the data, particularly in areas such as cluster analysis and data mining. In this situation, estimating MI under a single-distribution assumption instead of the mixture may produce less efficient and misleading results.

Several methods have been proposed for MI estimation, such as kernel density estimators (KDE) (Moon et al., 1995), k-nearest neighbors (KNN) (Kraskov et al., 2004), the Edgeworth approximation of differential entropy (Hulle, 2005), MI carried by rank sequences (Wang et al., 2005), and adaptive partitioning of the XY plane (Cellucci et al., 2005). KDE is a well-known method that is used extensively to estimate MI. Accurate estimation of MI depends heavily on precise estimates of the density functions; a minimal plug-in sketch of a KDE-based estimator is given at the end of this section.

This paper focuses only on mixtures of two bivariate normal distributions. The smaller of the two groups in the mixture distribution can be considered as outliers. Outliers are observations that are markedly different from the bulk of the data or from the pattern set by the majority of the observations. Moon et al. (1995) stated that the ordinary KDE is not suitable for mixture distributions. This may be due to the composition of the data, in which the observations are typically separated into two groups with different locations and scales. In this situation, the computation of the covariance matrix of the density function is affected. It is now evident that the classical mean and classical standard deviation are easily affected by outliers. Outliers are the leading cause of bias.
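For continuous random variables X and Y with joint density f_XY and marginals f_X and f_Y, MI is defined as

I(X; Y) = ∫∫ f_XY(x, y) log[ f_XY(x, y) / (f_X(x) f_Y(y)) ] dx dy.

The sketch below illustrates a KDE plug-in estimator of MI in the spirit of Moon et al. (1995), applied to a simulated mixture of two bivariate normal distributions in which the smaller group acts as outliers. The mixture parameters, the default bandwidth of scipy.stats.gaussian_kde, and the helper name mi_kde are illustrative assumptions, not the procedure developed in this paper.

```python
# A minimal sketch (not the procedure developed in this paper) of a KDE
# plug-in estimator of MI, applied to a simulated mixture of two bivariate
# normals whose smaller component plays the role of outliers.
# Assumptions: scipy's gaussian_kde with its default bandwidth, illustrative
# mixture parameters, and the hypothetical helper name mi_kde.
import numpy as np
from scipy.stats import gaussian_kde

def mi_kde(x, y):
    """Plug-in MI estimate: average log density ratio over the sample points."""
    xy = np.vstack([x, y])
    f_xy = gaussian_kde(xy)   # joint density estimate f_XY
    f_x = gaussian_kde(x)     # marginal density estimate f_X
    f_y = gaussian_kde(y)     # marginal density estimate f_Y
    # I(X;Y) ~ (1/n) * sum_i log[ f_XY(x_i, y_i) / (f_X(x_i) * f_Y(y_i)) ]
    return np.mean(np.log(f_xy(xy)) - np.log(f_x(x)) - np.log(f_y(y)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_out = 500, 50
    # Dominant component: correlated bivariate normal around the origin.
    main = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], n - n_out)
    # Smaller, shifted component: the "outlier" group of the mixture.
    out = rng.multivariate_normal([6, 6], [[1.0, 0.0], [0.0, 1.0]], n_out)
    data = np.vstack([main, out])
    print("KDE plug-in MI estimate:", mi_kde(data[:, 0], data[:, 1]))
```

Because this plug-in estimator evaluates the estimated joint and marginal densities at the observed points, any distortion of the density estimates caused by the outlying group propagates directly into the MI estimate, which motivates the concern with mixtures and outliers discussed above.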