Task scheduling in cloud computing faces several fundamental challenges: conflicting optimization objectives, slow convergence of reinforcement learning algorithms, entrapment in local optima, and a lack of validation on real, large-scale data. This paper introduces a novel two-layer model, built on adaptive feedback, that addresses these issues simultaneously. The proposed architecture combines deep reinforcement learning with metaheuristic optimization: a strategic layer employs a deep Q-network with three fully connected layers to learn optimal scheduling policies, while a tactical layer performs local refinement of solutions using a genetic algorithm hybridized with the whale optimization algorithm. The adaptive feedback mechanism improves both the quality of the final policy and the speed of learning, since the optimized solutions are fed back into the experience replay memory. The proposed method is evaluated on the Google Cluster Traces 2019 dataset, comprising 405,894 records with 21 features each, mapped onto 100 virtual machines. Comparison with four classical baseline algorithms shows that the proposed architecture achieves a 23% reduction in service-level agreement (SLA) violations and a 9% reduction in operating cost and energy consumption, while converging by episode 650, whereas conventional methods require over 1,000 episodes. These findings demonstrate that hybrid architectures with adaptive feedback can optimize conflicting objectives simultaneously. Validation on real data indicates that the method is deployable in commercial data centers and can serve the practical goal of reducing energy consumption and environmental impact.
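The adaptive feedback loop described above can be illustrated with a minimal sketch: transitions refined by the tactical (metaheuristic) layer are pushed back into the DQN's experience replay memory so the strategic layer also learns from them. All names, the greedy stand-in for the GA/WOA refinement step, and the reward bonus are illustrative assumptions, not the paper's actual implementation.

```python
import random
from collections import deque

class FeedbackReplayMemory:
    """Replay buffer that also accepts metaheuristic-refined transitions."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        # Ordinary transition collected by the DQN (strategic layer).
        self.buffer.append((state, action, reward, next_state))

    def push_refined(self, state, action, reward, next_state, bonus=0.1):
        # Transition produced by the tactical layer's local refinement;
        # a small reward bonus (an assumed shaping choice) lets the
        # Q-learning updates reinforce these higher-quality solutions.
        self.buffer.append((state, action, reward + bonus, next_state))

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def refine_assignment(assignment, load):
    # Placeholder for the GA + WOA tactical layer: greedily move one task
    # from the most loaded VM to the least loaded one.
    worst = max(range(len(load)), key=load.__getitem__)
    best = min(range(len(load)), key=load.__getitem__)
    refined = list(assignment)
    for i, vm in enumerate(refined):
        if vm == worst:
            refined[i] = best
            break
    return refined

memory = FeedbackReplayMemory()
state = (0.9, 0.1)           # e.g. normalized utilization of two VMs
assignment = [0, 0, 0, 1]    # task-to-VM mapping proposed by the DQN policy
refined = refine_assignment(assignment, load=[3, 1])
memory.push(state, tuple(assignment), reward=0.5, next_state=(0.8, 0.2))
memory.push_refined(state, tuple(refined), reward=0.5, next_state=(0.6, 0.4))
batch = memory.sample(2)     # mixed batch of raw and refined experience
```

In this sketch the refinement moves one task off the overloaded VM, and the resulting transition enters the same buffer the DQN samples from, which is the essence of the feedback mechanism the abstract credits with faster convergence.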