Robot Audition Group

Introduction

The Robot Audition Group aims to advance research on intelligent speech processing for human-machine interaction and to develop effective algorithms for real-world applications. Its work spans a broad range of related topics, from sound source localization and separation to speech enhancement and recognition. The work focuses on data-driven statistical approaches to achieving a natural human-machine speech interface. Accordingly, accurate and rapid sound source localization, efficient audio noise reduction, and robust speech recognition are the group's primary interests in fundamental research. To meet the demand for large-scale computation, we have also built a platform for running parallel computations on GPUs.

Achievements

Under the leadership of Professor Liu Peilin, we have put forward a high-performance audio compression standard, which has been adopted by China AVS. We have participated in setting the USAC standard within the MPEG framework in cooperation with Huawei Technologies Co., Ltd. We have achieved sub-Nyquist sampling and accurate recovery of audio signals based on compressive sensing, published several SCI- and EI-indexed papers, and applied for several patents in this field. We have also developed vector quantization and amplitude spectrum reconstruction methods using deep neural networks and published two SCI-indexed papers in this field.

Projects

  • Research on MIMO Audio Signal Sensing and Processing Technology Based on Compressive Sensing Theory. Funded by Huawei.
  • Ultra-low bit rate speech codec. Funded by zz.
  • Spatial audio object coding based on redundant dictionaries and compressive sensing. Funded by NSFC.
  • Research on Speech Enhancement Technology of Specific Speakers. Funded by ZTE.

Selected Publications

  • Speech Magnitude Spectrum Reconstruction from MFCC using DNN, Chinese Journal of Electronics, in press
  • An Improved Vector Quantization Method using Deep Neural Network, AEU - International Journal of Electronics and Communications, 2017
  • Jiang, Sumxin, Rendong Yin, and Peilin Liu. “A memory efficient finite-state source coding algorithm for audio MDCT coefficients.” EURASIP Journal on Audio, Speech, and Music Processing 2014.1 (2014): 22.
  • Jiang, Sumxin, Rendong Yin, and Peilin Liu. “Finite-state entropy-constrained vector quantiser for audio modified discrete cosine transform coefficients uniform quantisation.” IET Signal Processing 9.1 (2015): 30-36.
  • Jiang, Sumxin, et al. “Compressive Sensing of Audio Signal via Structured Shrinkage Operators.” IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 97.4 (2014): 923-930.
  • Jia Xiaoli, Jiang Xiaobo, Jiang Sumxin, Liu Peilin. A Reconstruction Algorithm for Speech Compressive Sensing Using Structural Features. Journal of Shanghai Jiao Tong University (Chinese).

Selected Patents

  • A vector quantizer based on deep neural networks, 2016104665183, application filed
  • A method and device for speech signal coding and decoding at an extremely low bitrate, CN201310224360.5