Research into affective computing has become increasingly important with the growing popularity of intelligent systems and human-machine interfaces. In this paper, a speech emotion recognition (SER) system is proposed that introduces new techniques in both its feature-extraction and classification stages. In the feature-extraction stage, the system extracts features from both the speech and glottal-waveform signals, including spectro-temporal features obtained from the Gabor filter bank (GBFB) and the separate Gabor filter bank (SGBFB), which have not previously been applied to SER. In the classification stage, a hierarchical adaptive weighted multilayer extreme learning machine (H-AWELM) is employed. This hybrid classifier consists of two parts: the first performs sparse unsupervised feature learning with a multilayer neural network (NN) built from sparse extreme learning machine autoencoder (ELM-AE) layers, and the second classifies the learned features in the last layer using Tikhonov-regularized least squares (LS). One of the most important issues in multi-class ELM training is how to handle class imbalance. This paper presents a new adaptive weighting method to address this problem, which can be more accurate than existing weighting methods. Finally, the proposed system is evaluated on a well-known emotional speech database. Experimental results demonstrate that the proposed system outperforms state-of-the-art systems.
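To make the classifier structure concrete, the following is a minimal NumPy sketch of the general pipeline the abstract describes: stacked ELM-autoencoder layers for unsupervised feature learning, followed by a class-weighted Tikhonov-regularized LS output layer. It is illustrative only and makes two simplifying assumptions: the sparsity penalty of the paper's sparse ELM-AE is replaced by a plain L2 (ridge) penalty, and the paper's adaptive per-class weights are replaced by common inverse-class-frequency weights from the standard weighted-ELM literature. All function names and parameters here are hypothetical, not from the paper.

```python
import numpy as np

def elm_ae_layer(X, n_hidden, reg=1e-3, seed=0):
    """One ELM autoencoder layer: a random sigmoid projection followed by
    Tikhonov-regularized LS reconstruction weights beta; X is then mapped
    into the learned space via beta^T (the standard ELM-AE stacking rule).
    NOTE: the paper's sparsity penalty is simplified here to plain L2."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))      # random input weights
    b = rng.standard_normal(n_hidden)                    # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))               # hidden activations
    # beta = (H^T H + reg*I)^{-1} H^T X  (ridge / Tikhonov solution)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return X @ beta.T                                    # features for next layer

def weighted_ls_output(H, y, n_classes, reg=1e-2):
    """Final layer: class-weighted Tikhonov-regularized LS,
    beta = (H^T W H + reg*I)^{-1} H^T W T.
    NOTE: inverse-class-frequency weights stand in for the paper's
    adaptive weighting scheme, which is not reproduced here."""
    n = len(y)
    T = -np.ones((n, n_classes))
    T[np.arange(n), y] = 1.0                             # +/-1 one-hot targets
    counts = np.maximum(np.bincount(y, minlength=n_classes), 1)
    w = (1.0 / counts)[y]                                # per-sample weights
    WH = H * w[:, None]                                  # diagonal W applied to H
    beta = np.linalg.solve(H.T @ WH + reg * np.eye(H.shape[1]), WH.T @ T)
    return beta

# Toy usage: X holds speech/glottal feature vectors, y integer emotion labels.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 40))
y = rng.integers(0, 4, size=200)
F = elm_ae_layer(elm_ae_layer(X, 64, seed=1), 64, seed=2)  # two stacked ELM-AE layers
beta = weighted_ls_output(F, y, n_classes=4)
pred = np.argmax(F @ beta, axis=1)                       # predicted emotion classes
```

The per-sample weights shrink the influence of majority-class samples in the LS fit, which is the general mechanism a weighted multilayer ELM uses to counter class imbalance; the paper's adaptive scheme refines how those weights are chosen.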