Recognizing emotions in the Metaverse and in the real world at the same time is an important problem that has received little attention so far and is of considerable interest to many psychologists. In this paper, speech emotion recognition (SER) is performed in both the Metaverse and the real world using a simple machine learning (ML) model, the echo state network (ESN). Because the reservoir used in ESNs has a recurrent structure, ESNs are limited in modeling high-dimensional data, owing to their high memory consumption. In this paper, we present a new structure that enables ESNs to process high-dimensional signals. To reduce the complexity caused by hypercomplex data, we propose an octonion-based nonlinear ESN (ONESN). In addition, two scenarios are designed and analyzed to demonstrate the functionality of the proposed networks.
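To make the memory argument concrete, the following is a minimal sketch of a conventional real-valued leaky ESN. It is not the authors' ONESN, and it does not model octonion arithmetic. All function names (`make_esn`, `run_reservoir`, `weight_count`) and settings (leak rate 0.3, uniform weights in [-0.5, 0.5], scale 0.9) are hypothetical choices for illustration. The usual eigenvalue-based spectral-radius rescaling is replaced here by rescaling the infinity norm, which bounds the spectral radius from above and avoids needing a linear-algebra library. The linear readout, typically trained by ridge regression on the collected states, is omitted.

```python
import math
import random


def make_esn(n_inputs, n_reservoir, scale=0.9, seed=0):
    """Random input and recurrent weights for a leaky ESN (illustrative only).

    The recurrent matrix is rescaled so that its maximum absolute row sum
    (infinity norm) equals `scale` < 1. This bounds the spectral radius by
    `scale`, a conservative stand-in for eigenvalue-based rescaling.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
            for _ in range(n_reservoir)]
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_reservoir)]
         for _ in range(n_reservoir)]
    norm = max(sum(abs(v) for v in row) for row in w)
    w = [[v * scale / norm for v in row] for row in w]
    return w_in, w


def run_reservoir(w_in, w, inputs, leak=0.3):
    """Drive the reservoir with a sequence of input vectors and return every state.

    Update rule: x(t) = (1 - leak) * x(t-1) + leak * tanh(W_in u(t) + W x(t-1)).
    The W x(t-1) term is the recurrence that gives the reservoir its memory.
    """
    n = len(w)
    x = [0.0] * n
    states = []
    for u in inputs:
        pre = [sum(w_in[i][j] * u[j] for j in range(len(u)))
               + sum(w[i][k] * x[k] for k in range(n))
               for i in range(n)]
        x = [(1 - leak) * x[i] + leak * math.tanh(pre[i]) for i in range(n)]
        states.append(x)
    return states


def weight_count(n_inputs, n_reservoir):
    """Stored weights: N*N recurrent weights plus N*D input weights."""
    return n_reservoir * n_reservoir + n_reservoir * n_inputs


if __name__ == "__main__":
    # Toy 13-dimensional input frames (arbitrary values, not real speech features).
    frames = [[0.1 * ((t + j) % 5) for j in range(13)] for t in range(20)]
    w_in, w = make_esn(n_inputs=13, n_reservoir=50)
    states = run_reservoir(w_in, w, frames)
    print(len(states), len(states[0]))  # 20 50
    print(weight_count(13, 50))  # 3150
```

The recurrent matrix alone holds N² weights, so memory grows quadratically with the reservoir size N. This is one way to see why scaling a standard ESN to high-dimensional inputs becomes costly.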