Ultra-low bit rate speech codec – SJTU Brain-inspired Application Technology Center

We propose an extended least-squares-estimate inverse short-time Fourier transform magnitude (LSE-ISTFTM) speech reconstruction algorithm for MFCC-based low-bit-rate speech coding. The extended LSE-ISTFTM algorithm initializes the speech estimate with a specific signal rather than white noise, and reconstructs voiced and unvoiced frames separately. Pitch frequency and voicing class are estimated with a Gaussian mixture model (GMM) from the magnitude spectrum, which is obtained by inverting the MFCCs. The voicing classification and pitch estimation errors are lower than 1% and 5.62%, respectively. The speech reconstruction results demonstrate that the extended LSE-ISTFTM algorithm is more stable and converges faster than the LSE-ISTFTM algorithm. The speech coding results also show that the proposed algorithm achieves higher speech quality than the classic algorithm.
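For orientation, the baseline LSE-ISTFTM iteration that the proposed algorithm extends can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (stft, lse_istft, lse_istftm), the Hann window, the frame and hop sizes, and the test tone are all hypothetical choices. The initial estimate x0 is left as a parameter. The classic algorithm passes white noise, as in the demo, whereas the extended algorithm would pass its own initial signal, which this summary does not specify. The separate handling of voiced and unvoiced frames is likewise not shown.

```python
import numpy as np

def stft(x, win, hop):
    """Frame the signal, apply the window, and take the real FFT of each frame."""
    n = len(win)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.fft.rfft(win * x[s:s + n]) for s in starts])

def lse_istft(spec, win, hop, length):
    """Least-squares inverse STFT: windowed overlap-add normalized by the summed squared window."""
    n = len(win)
    num = np.zeros(length)
    den = np.zeros(length)
    for m, frame_spec in enumerate(spec):
        s = m * hop
        num[s:s + n] += win * np.fft.irfft(frame_spec, n)
        den[s:s + n] += win ** 2
    # Guard against division by zero where the window sum vanishes (signal edges).
    return num / np.maximum(den, 1e-8)

def lse_istftm(target_mag, win, hop, length, x0, n_iter=50):
    """Iteratively estimate a signal whose STFT magnitude approaches target_mag.

    x0 is the initial signal estimate (an assumption of this sketch:
    white noise reproduces the classic initialization).
    Returns the final estimate and the magnitude error before each update.
    """
    x = np.array(x0, dtype=float)
    errors = []
    for _ in range(n_iter):
        spec = stft(x, win, hop)
        errors.append(float(np.sum((np.abs(spec) - target_mag) ** 2)))
        # Keep the current phase, impose the target magnitude, and resynthesize.
        phase = np.exp(1j * np.angle(spec))
        x = lse_istft(target_mag * phase, win, hop, length)
    return x, errors

# Demo with hypothetical parameters: recover a tone from its STFT magnitude only.
rng = np.random.default_rng(0)
win = np.hanning(256)
hop = 64
reference = np.sin(2 * np.pi * 0.01 * np.arange(4096))
target_mag = np.abs(stft(reference, win, hop))
noise_init = rng.standard_normal(len(reference))  # classic white-noise start
estimate, errors = lse_istftm(target_mag, win, hop, len(reference), noise_init)
print("first error:", errors[0], "last error:", errors[-1])
```

Because the starting point is an argument, the same loop can be used to compare how quickly the magnitude error falls for different initial signals, which is the kind of comparison the convergence results refer to.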

Figure 1. Block diagram of speech reconstruction scheme

Figure 2. PESQ scores of the LSE-ISTFTM and extended LSE-ISTFTM algorithms

Figure 3. Convergence of the LSE-ISTFTM and extended LSE-ISTFTM algorithms

Table 1. Objective test results of speech coding