On Improvement of Speech Intelligibility and Quality: A Survey of Unsupervised Single Channel Speech Enhancement Algorithms

  1. Nasir Saleem 1
  2. Muhammad Irfan Khattak 1
  3. Elena Verdú 2
  1. 1 University of Engineering & Technology, Peshawar (Pakistan)
  2. 2 Universidad Internacional de La Rioja, Logroño, Spain (ROR: https://ror.org/029gnnp81)

Journal: IJIMAI

ISSN: 1989-1660

Year of publication: 2020

Volume: 6

Issue: 2

Pages: 78-89

Type: Article

DOI: 10.9781/IJIMAI.2019.12.001


Abstract

Many forms of human communication exist, for instance text-based and nonverbal communication; speech, however, is the most powerful and versatile form for humans. The usefulness of speech signals has led to a variety of speech processing applications, yet the successful use of these applications is significantly degraded in the presence of background noise, which overlaps and masks the target speech. To deal with these background noise distortions, a speech enhancement algorithm at the front end is crucial in order to make noisy speech intelligible and pleasant. Speech enhancement has been an important research and engineering problem for the last couple of decades. In this paper, we present an all-inclusive survey of unsupervised single-channel speech enhancement (U-SCSE) algorithms. A taxonomy-based review of U-SCSE algorithms is presented, and the associated studies on improving intelligibility and quality are outlined. Objective experiments have been performed to evaluate the potential of U-SCSE algorithms in terms of improving speech intelligibility and quality. It is found that unsupervised speech enhancement improves speech quality, but the improvement in speech intelligibility is limited. Finally, several research problems that require further work are identified.
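The taxonomy surveyed below begins with classic spectral-subtractive methods (Boll, 1979, in the reference list). As an illustrative sketch only, not any specific algorithm from the surveyed papers, the following NumPy snippet shows the basic idea: estimate a noise magnitude spectrum from an assumed speech-free lead-in, subtract it frame by frame, and apply a spectral floor to limit musical noise. The frame length, hop, number of noise frames, and floor `beta` are arbitrary choices for the example.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, hop=128,
                         noise_frames=5, beta=0.002):
    """Basic magnitude spectral subtraction (illustrative sketch).

    The noise magnitude spectrum is estimated by averaging the first
    `noise_frames` frames, which are assumed to contain no speech.
    `beta` sets a spectral floor that limits musical noise artifacts.
    """
    window = np.hanning(frame_len)
    # Frame and window the signal (the tail that does not fill a frame is dropped).
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)
    # Average noise magnitude from the assumed speech-free lead-in.
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract, then clamp to a fraction of the noisy magnitude
    # instead of allowing negative (half-wave rectified) values.
    clean_mag = np.maximum(mag - noise_mag, beta * mag)
    # Resynthesize with the noisy phase (standard in spectral subtraction).
    enhanced = np.fft.irfft(clean_mag * np.exp(1j * phase), axis=1)
    # Overlap-add reconstruction.
    out = np.zeros(len(noisy))
    for i, frame in enumerate(enhanced):
        out[i * hop:i * hop + frame_len] += frame
    return out
```

Refinements surveyed in the paper, such as over-subtraction factors, multi-band subtraction, and musical-noise-free iterative variants, replace the fixed subtraction and floor above with adaptive, frequency-dependent rules.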

References

  • Y. Ephraim and D. Malah, “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator,” IEEE Transactions on acoustics, speech, and signal processing, vol. 32, no. 6, pp. 1109-1121, 1984.
  • Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, 1985.
  • Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, 1995.
  • I. Cohen and B. Berdugo, “Noise estimation by minima controlled recursive averaging for robust speech enhancement,” IEEE signal processing letters, vol. 9, no.1, pp. 12-15, 2002.
  • I. Cohen and B. Berdugo, “Speech enhancement for non-stationary noise environments,” Signal processing, vol. 81, no. 11, pp. 2403-2418, 2001.
  • N. Virag, “Single channel speech enhancement based on masking properties of the human auditory system,” IEEE Transactions on speech and audio processing, vol. 7, no. 2, pp. 126-137, 1999.
  • N. Saleem, M. I. Khattak, G. Witjaksono, and G. Ahmad, “Variance based time-frequency mask estimation for unsupervised speech enhancement,” Multimedia Tools and Applications, pp. 1-25, 2019.
  • M. Bahoura and J. Rouat, “Wavelet speech enhancement based on the teager energy operator,” IEEE signal processing letters, vol. 8, no. 1, pp. 10-12, 2001.
  • T. Lotter and P. Vary, “Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model,” EURASIP Journal on Advances in Signal Processing, vol. 2005, no. 7, Article ID 354850, 2005.
  • A. Rezayee and S. Gazor, “An adaptive KLT approach for speech enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 2, pp. 87-95, 2001.
  • D. E. Tsoukalas, J. N. Mourjopoulos, and G. Kokkinakis, “Speech enhancement based on audible noise suppression,” IEEE Transactions on Speech and Audio Processing, vol. 5, no. 6, pp. 497-514, 1997.
  • R. Martin, “Speech enhancement based on minimum mean-square error estimation and supergaussian priors,” IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 845-856, 2005.
  • C. Plapous, C. Marro, and P. Scalart, “Improved signal-to-noise ratio estimation for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, pp. 2098-2108, 2006.
  • I. Cohen, “Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator,” IEEE Signal Processing Letters, vol. 9, no. 4, pp. 113-116, 2002.
  • Y. Hu and P.C. Loizou, “Speech enhancement based on wavelet thresholding the multitaper spectrum,” IEEE Transactions on Speech and Audio processing, vol. 12, no. 1, pp. 59-67, 2004.
  • J. H. Hansen and M. A. Clements, “Constrained iterative speech enhancement with application to speech recognition,” IEEE Transactions on Signal Processing, vol. 39, no. 4, pp. 795-805, 1991.
  • S. Watanabe, M. Delcroix, F. Metze, and J.R. Hershey, Eds., New era for robust speech recognition: exploiting deep learning, Springer, 2017.
  • N. Saleem and T. G. Tareen, “Spectral Restoration based speech enhancement for robust speaker identification,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 5, no. 1, pp. 34-39, 2018.
  • J. Lazar, J. H. Feng, and H. Hochheiser, Research methods in human-computer interaction, Morgan Kaufmann, 2017.
  • N. Saleem, E. Mustafa, A. Nawaz, and A. Khan, “Ideal binary masking for reducing convolutive noise,” International Journal of Speech Technology, vol. 18, no. 4, pp. 547-554, 2015.
  • P. Vary and R. Martin, Digital speech transmission: Enhancement, coding and error concealment, John Wiley & Sons, 2006.
  • P. C. Loizou, Speech enhancement: theory and practice, CRC press, 2007.
  • D. G. Jamieson, R. L. Brennan, and L. E. Cornelisse, “Evaluation of a speech enhancement strategy with normal-hearing and hearing-impaired listeners,” Ear and hearing, vol. 16, no. 3, pp. 274-286, 1995.
  • B. C. Moore, “Speech processing for the hearing-impaired: successes, failures, and implications for speech mechanisms,” Speech communication, vol. 41, no. 1, pp. 81-91, 2003.
  • K. H. Arehart, J. H. Hansen, S. Gallant, and L. Kalstein, “Evaluation of an auditory masked threshold noise suppression algorithm in normal-hearing and hearing-impaired listeners,” Speech Communication, vol. 40, no. 4, pp. 575-592, 2003.
  • Y. Hu and P. C. Loizou, “A comparative intelligibility study of speech enhancement algorithms,” In 2007 IEEE International Conference on Acoustics, Speech and Signal Processing-ICASSP’07, vol. 4, IEEE, 2007, pp. IV-561.
  • Y. Hu and P. C. Loizou, “A comparative intelligibility study of singlemicrophone noise reduction algorithms,” The Journal of the Acoustical Society of America, vol. 122, no. 3, pp. 1777-1786, 2007.
  • S. Gordon‐Salant, “Effects of acoustic modification on consonant recognition by elderly hearing‐impaired subjects,” The Journal of the Acoustical Society of America, vol. 81, no. 4, pp. 1199-1202, 1987.
  • J. B. Allen, “How do humans process and recognize speech?,” IEEE Transactions on speech and audio processing, vol. 2, no. 4, pp. 567-577, 1994.
  • H. Levitt, “Noise reduction in hearing aids: A review,” Journal of Rehabilitation Research and Development, vol. 38, no. 1, pp. 111-122, 2001.
  • G. Kim, Y. Lu, Y. Hu, and P. C. Loizou, “An algorithm that improves speech intelligibility in noise for normal-hearing listeners,” The Journal of the Acoustical Society of America, vol. 126, no. 3, pp. 1486-1494, 2009.
  • G. Kim and P. C. Loizou, “Improving speech intelligibility in noise using environment-optimized algorithms,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 8, pp. 2080-2090, 2010.
  • S. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 27, no. 2, pp. 113-120, 1979.
  • Y. Lu and P. C. Loizou, “A geometric approach to spectral subtraction,” Speech Communication, vol. 50, no. 6, pp. 453-466, 2008.
  • S. Nasir, A. Sher, K. Usman, and U. Farman, “Speech enhancement with geometric advent of spectral subtraction using connected time-frequency regions noise estimation,” Research Journal of Applied Sciences, Engineering and Technology, vol. 6, no. 6, pp. 1081-1087, 2013.
  • K. Paliwal, K. Wójcicki, and B. Schwerin, “Single-channel speech enhancement using spectral subtraction in the short-time modulation domain,” Speech Communication, vol. 52, no. 5, pp. 450-475, 2010.
  • T. Inoue, H. Saruwatari, Y. Takahashi, K. Shikano, and K. Kondo, “Theoretical analysis of musical noise in generalized spectral subtraction based on higher order statistics,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1770-1779, 2010.
  • Y. Zhang and Y. Zhao, “Real and imaginary modulation spectral subtraction for speech enhancement,” Speech Communication, vol. 55, no. 4, pp. 509-522, 2013.
  • R. Miyazaki, H. Saruwatari, T. Inoue, Y. Takahashi, K. Shikano, and K. Kondo, “Musical-noise-free speech enhancement based on optimized iterative spectral subtraction,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 7, pp. 2080-2094, 2012.
  • A. L. Ramos, S. Holm, S. Gudvangen, and R. Otterlei, “A spectral subtraction based algorithm for real-time noise cancellation with application to gunshot acoustics,” International Journal of Electronics and Telecommunications, vol. 59, no. 1, pp. 93-98, 2013.
  • S. M. Ban and H. S. Kim, “Weight-Space Viterbi Decoding Based Spectral Subtraction for Reverberant Speech Recognition,” IEEE Signal Processing Letters, vol. 22, no. 9, pp. 1424-1428, 2015.
  • K. Hu and D. Wang, “Unvoiced speech segregation from nonspeech interference via CASA and spectral subtraction,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1600-1609, 2010.
  • K. Kokkinakis, C. Runge, Q. Tahmina, and Y. Hu, “Evaluation of a spectral subtraction strategy to suppress reverberant energy in cochlear implant devices,” The Journal of the Acoustical Society of America, vol. 138, no. 1, pp. 115-124, 2015.
  • H. T. Hu and C. Yu, “Adaptive noise spectral estimation for spectral subtraction speech enhancement,” IET Signal Processing, vol. 1, no. 3, pp. 156-163, 2007.
  • J. S. Lim and A. V. Oppenheim, “Enhancement and bandwidth compression of noisy speech,” Proceedings of the IEEE, vol. 67, no. 12, pp. 1586-1604, 1979.
  • H. Ding, Y. Soon, S. N. Koh, and C. K. Yeo, “A spectral filtering method based on hybrid wiener filters for speech enhancement,” Speech Communication, vol. 51, no. 3, pp. 259-267, 2009.
  • M. J. Alam and D. O’Shaughnessy, “Perceptual improvement of Wiener filtering employing a post-filter,” Digital Signal Processing, vol. 21, no. 1, pp. 54-65, 2011.
  • I. Almajai and B. Milner, “Visually derived wiener filters for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1642-1651, 2010.
  • M. A. A. El-Fattah, M. I. Dessouky, A. M. Abbas, S. M. Diab, E. S. M. El-Rabaie, W. Al-Nuaimy, ... and F. E. A. El-Samie, “Speech enhancement with an adaptive Wiener filter,” International Journal of Speech Technology, vol. 17, no. 1, pp. 53-64, 2014.
  • B. Xia and C. Bao, “Wiener filtering based speech enhancement with weighted denoising auto-encoder and noise classification,” Speech Communication, vol. 60, pp. 13-29, 2014.
  • K. T. Andersen and M. Moonen, “Robust speech-distortion weighted interframe Wiener filters for single-channel noise reduction,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 1, pp. 97-107, 2017.
  • B. M. Mahmmod, A. R. Ramli, S. H. Abdulhussian, S. A. R. Al-Haddad, and W. A. Jassim, “Low-distortion MMSE speech enhancement estimator based on Laplacian prior,” IEEE Access, vol. 5, pp. 9866-9881, 2017.
  • R. K. Kandagatla and P. V. Subbaiah, “Speech enhancement using MMSE estimation of amplitude and complex speech spectral coefficients under phase-uncertainty,” Speech Communication, vol. 96, pp. 10-27, 2018.
  • H. R. Abutalebi and M. Rashidinejad, “Speech enhancement based on β-order MMSE estimation of Short Time Spectral Amplitude and Laplacian speech modeling,” Speech Communication, vol. 67, pp. 92-101, 2015.
  • T. Gerkmann and M. Krawczyk, “MMSE-optimal spectral amplitude estimation given the STFT-phase,” IEEE Signal Processing Letters, vol. 20, no. 2, pp. 129-132, 2012.
  • M. McCallum and B. Guillemin, “Stochastic-deterministic MMSE STFT speech enhancement with general a priori information,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 7, pp. 1445-1457, 2013.
  • Y. Ephraim and H. L. Van Trees, “A signal subspace approach for speech enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, 1995.
  • K. Hermus and P. Wambacq, “A review of signal subspace speech enhancement and its application to noise robust speech recognition,” EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 045821, 2007.
  • A. Borowicz and A. Petrovsky, “Signal subspace approach for psychoacoustically motivated speech enhancement,” Speech Communication, vol. 53, no. 2, pp. 210-219, 2011.
  • M. Kalantari, S. R. Gooran, and H. R. Kanan, “Improved embedded prewhitening subspace approach for enhancing speech contaminated by colored noise,” Speech Communication, vol. 99, pp. 12-26, 2018.
  • E. V. de Payer, “The subspace approach as a first stage in speech enhancement,” IEEE Latin America Transactions, vol. 9, no. 5, pp. 721-725, 2011.
  • B. Wiem, P. Mowlaee, and B. Aicha, “Unsupervised single channel speech separation based on optimized subspace separation,” Speech Communication, vol. 96, pp. 93-101, 2018.
  • P. Sun, A. Mahdi, J. Xu, and J. Qin, “Speech enhancement in spectral envelop and details subspaces,” Speech Communication, vol. 101, pp. 57-69, 2018.
  • F. Bao and W. H. Abdulla, “A New Ratio Mask Representation for CASA-Based Speech Enhancement,” IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 27, no. 1, pp. 7-19, 2019.
  • X. Wang, F. Bao, and C. Bao, “IRM estimation based on data field of cochleagram for speech enhancement,” Speech Communication, vol. 97, pp. 19-31, 2018.
  • X. Wang, C. Bao, and F. Bao, “A model-based soft decision approach for speech enhancement,” China Communications, vol. 14, no. 9, pp. 11-22, 2017.
  • S. Liang, W. Liu, and W. Jiang, “A new Bayesian method incorporating with local correlation for IBM estimation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 3, pp. 476-487, 2012.
  • A. Narayanan and D. Wang, “A CASA-based system for long-term SNR estimation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 9, pp. 2518-2527, 2012.
  • G. Hu, and D. Wang, “A tandem algorithm for pitch estimation and voiced speech segregation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 8, pp. 2067-2079, 2010.
  • Y. K. Lee and O. W. Kwon, “Application of shape analysis techniques for improved CASA-based speech separation,” IEEE Transactions on Consumer Electronics, vol. 55, no. 1, pp. 146-149, 2009.
  • T. May and T. Dau, “Computational speech segregation based on an auditory-inspired modulation analysis,” The Journal of the Acoustical Society of America, vol. 136, no. 6, pp. 3350-3359, 2014.
  • N. Rehman, C. Park, N. E. Huang, and D. P. Mandic, “EMD via MEMD: multivariate noise-aided computation of standard EMD,” Advances in Adaptive Data Analysis, vol. 5, no. 2, Article ID 1350007, 2013.
  • A. Upadhyay and R. B. Pachori, “Speech enhancement based on mEMD-VMD method,” Electronics Letters, vol. 53, no. 7, pp. 502-504, 2017.
  • K. Khaldi, A. O. Boudraa, and M. Turki, “Voiced/unvoiced speech classification-based adaptive filtering of decomposed empirical modes for speech enhancement,” IET Signal Processing, vol. 10, no. 1, pp. 69-80, 2016.
  • L. Zao, R. Coelho and P. Flandrin, “Speech enhancement with emd and hurst-based mode selection,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 5, pp. 899-911, 2014.
  • M. E. Hamid, M. K. I. Molla, X. Dang, and T. Nakai, “Single channel speech enhancement using adaptive soft-thresholding with bivariate EMD,” ISRN Signal Processing, 2013.
  • N. Chatlani and J. J. Soraghan, “EMD-based filtering (EMDF) of lowfrequency noise for speech enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1158-1166, 2011.
  • K. Khaldi, A. O. Boudraa, A. Bouchikhi, and M. T. H. Alouane, “Speech enhancement via EMD,” EURASIP Journal on Advances in Signal Processing, vol. 2008, no. 1, Article ID 873204, 2008.
  • M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” in ICASSP’79, IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, IEEE, 1979, pp. 208-211.
  • H. Gustafsson, S. E. Nordholm, and I. Claesson, “Spectral subtraction using reduced delay convolution and adaptive averaging,” IEEE Transactions on Speech and Audio Processing, vol. 9, no. 8, pp. 799-807, 2001.
  • S. Kamath and P. Loizou, “A multi-band spectral subtraction method for enhancing speech corrupted by colored noise,” in ICASSP, vol. 4, 2002, pp. IV-4164.
  • Y. Hu and P. C. Loizou, “Speech enhancement based on wavelet thresholding the multitaper spectrum,” IEEE Transactions on Speech and Audio Processing, vol. 12, no. 1, pp. 59-67, 2004.
  • Y. Hu and P. C. Loizou, “A generalized subspace approach for enhancing speech corrupted by colored noise,” IEEE Transactions on Speech and Audio Processing, vol. 11, no. 4, pp. 334-341, 2003.
  • F. Jabloun and B. Champagne, “Incorporating the human hearing properties in the signal subspace approach for speech enhancement,” IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 700-708, 2003.
  • E. H. Rothauser, “IEEE recommended practice for speech quality measurements,” IEEE Trans. on Audio and Electroacoustics, vol. 17, pp. 225-246, 1969.
  • H. G. Hirsch and D. Pearce, “The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions,” In ASR2000-Automatic Speech Recognition: Challenges for the new Millenium ISCA Tutorial and Research Workshop (ITRW), 2000.
  • A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, “Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs,” in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing 2001 (ICASSP’01), vol. 2, 2001, pp. 749-752.
  • C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, “An algorithm for intelligibility prediction of time–frequency weighted noisy speech,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125-2136, 2011.