A nonparametric variable step-size subband adaptive filtering algorithm for acoustic echo cancellation

Acoustic echo cancellation is widely applied in communication and video call systems to suppress the echoes generated by the acoustic coupling between loudspeakers and microphones. In these systems, the speech input to the adaptive filter is often colored and nonstationary, which slows the convergence of the adaptive filter when the NLMS algorithm is used. In this paper, an improved nonparametric variable step-size normalized subband adaptive filtering (NPVSS-NSAF) algorithm is proposed to address this problem. The variable step size is derived by minimizing the sum of the squared Euclidean norms of the differences between the optimal weight vector to be updated and the past estimated weight vectors. Tunable parameters are then eliminated by setting the power of the subband a posteriori error equal to the power of the subband noise. The performance of the proposed algorithm is evaluated by simulation in terms of misalignment and echo return loss enhancement. Experimental results show that the proposed algorithm achieves fast convergence and low misalignment in system identification.


Introduction 
In modern communication systems, such as cellphones, video communication systems, and car phone systems, acoustic echo is unavoidable; it can seriously degrade speech quality and even cause discomfort for users. Improving voice quality is therefore a pressing problem in these systems. Acoustic echo cancellation (AEC) is a typical approach to improving speech quality at the far end and has received wide attention in recent years.
Various adaptive filtering algorithms have been applied to AEC [1][2][3][4][5][6]. The basic principle is to estimate the impulse response of the echo path between the loudspeaker and the microphone and to use this estimate to generate an electronic replica of the real acoustic echo. The echo is then canceled by subtracting the replica from the microphone signal [7]. Widrow et al. proposed the least mean square (LMS) adaptive algorithm [8], which is widely used in echo cancellation because of its simple structure and strong robustness [9]. However, with a fixed step size, the LMS algorithm cannot achieve fast convergence and low misalignment simultaneously. Therefore, many variable step-size normalized least mean square (VSS-NLMS) algorithms have been proposed and used [10][11][12][13][14][15][16].
Although these algorithms offer fast convergence and low misalignment, their convergence rate may decrease significantly when the input signal is colored [17]. To improve convergence for colored inputs, Lee et al. proposed the normalized subband adaptive filtering (NSAF) algorithm [18], which decomposes the input signal into subbands and thereby whitens the subband signals. Although NSAF improves convergence for colored inputs, it shares with NLMS the limitation of a fixed step size, which cannot provide fast convergence and low misalignment at the same time. Shams et al. therefore proposed a VSS-NSAF algorithm with faster convergence and lower misalignment than NSAF [19]. Jae [20] and Ni [21] proposed variable step-size NSAF algorithms derived by minimizing the mean-square deviation between the optimal weight vector and the estimated weight vector at each iteration. Although these algorithms improve the convergence rate and reduce misalignment, they substantially increase the computational complexity. To achieve fast convergence, low misalignment and low computational complexity simultaneously, a variety of variable step-size NSAF algorithms have since been proposed [22][23][24][25][26][27][28].
However, these algorithms introduce parameters that are difficult to tune in practical applications. In this paper, a nonparametric variable step-size normalized subband adaptive filtering (NPVSS-NSAF) algorithm is proposed. The iterative variable step-size formula is obtained by minimizing the sum of the squared Euclidean norms of the differences between the optimal weight vector to be updated and the past estimated weight vectors. A parameter-free step-size factor is then derived by setting the power of the subband a posteriori error equal to the power of the subband noise. Because the NPVSS approach is easy to control in practical applications [29], the proposed algorithm is robust. Computer simulations demonstrate the advantages of the proposed algorithm: NPVSS-NSAF outperforms related algorithms in a nonstationary environment.
The remainder of this paper is organized as follows. Section 2 reviews the system model and related algorithms. Section 3 presents the proposed NPVSS-NSAF algorithm. Section 4 presents simulation results showing that the proposed algorithm outperforms other algorithms in a nonstationary environment. Finally, Section 5 concludes the paper.

System model
In communication and video call systems, the echo picked up by the microphone is caused by reflections of the loudspeaker signal in the surrounding space. The AEC system models the echo path between the loudspeaker and the microphone in order to remove the echo from the microphone signal; the structure is shown in Figure 1 [30]. The microphone signal is
$$d(n) = \mathbf{h}^{T}\mathbf{X}(n) + v(n) = y(n) + v(n), \qquad (1)$$
where $\mathbf{X}(n) = [x(n), x(n-1), \ldots, x(n-L+1)]^{T}$ is the far-end input signal vector, $\mathbf{h} = [h_0, h_1, \ldots, h_{L-1}]^{T}$ is the impulse response of the echo path (the system to be estimated), of length $L$, and the superscript $T$ denotes the transpose of a vector or matrix. $v(n)$ is the near-end signal, which consists of the near-end speech $s(n)$ and the background noise $b(n)$, and $y(n) = \mathbf{h}^{T}\mathbf{X}(n)$ is the output of the echo path, i.e., the true echo signal.
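The output of the adaptive estimator is
$$\hat{y}(n) = \mathbf{W}^{T}(n)\mathbf{X}(n), \qquad (2)$$
where $\mathbf{W}(n) = [w_0(n), w_1(n), \ldots, w_{L-1}(n)]^{T}$ is the weight vector of the adaptive filter, which is updated automatically as the environment changes [31]. The error signal $e(n)$ is obtained by subtracting the filter output from the microphone signal:
$$e(n) = d(n) - \hat{y}(n). \qquad (3)$$
Substituting (2) into (3) gives the a priori estimation error
$$e(n) = d(n) - \mathbf{W}^{T}(n)\mathbf{X}(n), \qquad (4)$$
and the a posteriori estimation error, computed with the updated weights, is
$$\varepsilon(n) = d(n) - \mathbf{W}^{T}(n+1)\mathbf{X}(n). \qquad (5)$$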
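The signal model above (microphone signal as echo plus near-end signal, and the a priori error of the adaptive estimator) can be illustrated with a minimal pure-Python sketch. The helper names `fir_output`, `simulate_echo` and `a_priori_error` are hypothetical, and the echo path `h` is a toy three-tap filter rather than a measured 512-tap room response.

```python
def fir_output(weights, x_buffer):
    """Inner product W^T X(n); x_buffer[0] is x(n), x_buffer[1] is x(n-1), and so on."""
    return sum(w * x for w, x in zip(weights, x_buffer))


def simulate_echo(h, x, v):
    """Microphone signal d(n) = h^T X(n) + v(n), starting from a zero input buffer."""
    buffer = [0.0] * len(h)  # X(n), newest sample first
    d = []
    for n in range(len(x)):
        buffer = [x[n]] + buffer[:-1]  # shift in the new far-end sample
        d.append(fir_output(h, buffer) + v[n])
    return d


def a_priori_error(d_n, w, x_buffer):
    """e(n) = d(n) - W^T(n) X(n).

    Calling it with the updated weights W(n+1) instead gives the
    a posteriori error epsilon(n).
    """
    return d_n - fir_output(w, x_buffer)


if __name__ == "__main__":
    h = [0.5, -0.25, 0.125]                    # toy echo path
    x = [1.0, 0.0, 0.0, 0.0]                   # unit impulse from the far end
    d = simulate_echo(h, x, [0.0] * len(x))    # noise-free microphone signal
    print(d)                                   # prints [0.5, -0.25, 0.125, 0.0]
```

With an impulse as far-end input and no near-end signal, the microphone signal simply replays the echo path, which is why an adaptive filter whose weights equal `h` drives the a priori error to zero.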

VSS-NLMS
The NLMS algorithm is described as follows:
$$\mathbf{W}(n+1) = \mathbf{W}(n) + \mu\,\frac{\mathbf{X}(n)\,e(n)}{\mathbf{X}^{T}(n)\mathbf{X}(n) + c}, \qquad (6)$$
where the step size $\mu$ ($0<\mu\le 1$) is a positive scalar that controls the convergence rate of the algorithm, and $c$ is a small positive constant that avoids division by zero. Because $\mu$ is fixed, NLMS cannot achieve fast convergence and low misalignment simultaneously. A variable step-size algorithm resolves this conflict: it uses a large step size in the initial stage to converge quickly and a small step size near the end of the convergence phase to reach a low misalignment. The variable step-size NLMS (VSS-NLMS) algorithm thus balances the tradeoff between convergence rate and final misalignment. Its update is
$$\mathbf{W}(n+1) = \mathbf{W}(n) + \mu(n)\,\frac{\mathbf{X}(n)\,e(n)}{\mathbf{X}^{T}(n)\mathbf{X}(n) + c}, \qquad (7)$$
where $\mu(n)$ is the variable step size [32].
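A minimal sketch of the NLMS update in pure Python follows; passing a time-varying step size turns it into the generic VSS-NLMS form. The two-stage schedule in the usage example is a hypothetical illustration of "large step first, small step later", not one of the published VSS rules [10]-[16], [32]. The function names `nlms_update` and `identify` are illustrative.

```python
import random


def nlms_update(w, x_buf, e, mu, c=1e-6):
    """One step of W(n+1) = W(n) + mu * X(n) e(n) / (X^T(n) X(n) + c)."""
    energy = sum(x * x for x in x_buf)
    gain = mu * e / (energy + c)
    return [wi + gain * xi for wi, xi in zip(w, x_buf)]


def identify(h, n_samples, step_size, noise_std=0.0, seed=0):
    """Identify the FIR system h from white Gaussian input.

    step_size(n) returns mu(n); a constant function gives plain NLMS.
    """
    rng = random.Random(seed)
    L = len(h)
    w = [0.0] * L
    buf = [0.0] * L
    for n in range(n_samples):
        buf = [rng.gauss(0.0, 1.0)] + buf[:-1]            # X(n)
        d = sum(hi * xi for hi, xi in zip(h, buf)) + rng.gauss(0.0, noise_std)
        e = d - sum(wi * xi for wi, xi in zip(w, buf))    # a priori error e(n)
        w = nlms_update(w, buf, e, step_size(n))
    return w


if __name__ == "__main__":
    h = [0.6, -0.3, 0.2, 0.1]
    w_nlms = identify(h, 2000, lambda n: 0.5)                      # fixed step size
    w_vss = identify(h, 2000, lambda n: 1.0 if n < 200 else 0.1)   # hypothetical schedule
    print(w_nlms, w_vss)
```

With `noise_std > 0`, a large fixed step such as μ = 1.0 converges quickly but settles at a higher misalignment than a small one, which is the tradeoff a variable step size is meant to resolve.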

Subband adaptive filters
The idea of subband filtering originates from subband coding, and subband adaptive filtering has been discussed in [33,34]. In a subband adaptive filter, the input signal is decomposed into multiple parallel channels, and the properties of the subband decomposition allow more efficient signal processing. In addition, the filter bank reduces the correlation of the input signal, and the subband filters operate at a decimated (lower) sampling rate. Therefore, subband adaptive filters can achieve fast convergence with reduced computational complexity. Figure 2 shows the conventional SAF structure for adaptive system identification. The fullband input signal $x(n)$ and desired response $d(n)$ are decomposed into $N$ spectral bands by the analysis filters $H_i(z)$, $i = 0, 1, \ldots, N-1$. The subband signals are then decimated by the same factor $D$ and processed by the adaptive subfilters. Each subfilter computes its own error signal independently, and the subband error signals are minimized by iterative updates. Finally, the synthesis filter bank interpolates and recombines the subband error signals to obtain the fullband error signal $e(n)$. Note that $n$ is the time index of the fullband signals and $k$ is the time index of the decimated subband signals. From Figure 2, the desired signal is
$$d(n) = \mathbf{w}_0^{T}\mathbf{X}(n) + v(n), \qquad (8)$$
where $\mathbf{w}_0$ denotes the tap-weight vector of the unknown system to be estimated. The signals $d(n)$, $\mathbf{X}(n)$ and $v(n)$ are decomposed into $d_i(n)$, $\mathbf{X}_i(n)$ and $v_i(n)$ by $H_i(z)$, $i = 0, 1, \ldots, N-1$. The subband signals $y_i(n)$ and $d_i(n)$ are then decimated to the lower sampling rate to yield $y_{i,D}(k)$ and $d_{i,D}(k)$. With $\mathbf{X}_i(k)$ denoting the ith subband input vector at decimated time $k$, the ith subband output signal is
$$y_{i,D}(k) = \mathbf{X}_i^{T}(k)\mathbf{W}(k). \qquad (9)$$
The update equation of the conventional NSAF is
$$\mathbf{W}(k+1) = \mathbf{W}(k) + \mu\sum_{i=0}^{N-1}\frac{\mathbf{X}_i(k)\,e_{i,D}(k)}{\|\mathbf{X}_i(k)\|^{2}}, \qquad (10)$$
where $\mu$ is a fixed step size and the ith subband error is
$$e_{i,D}(k) = d_{i,D}(k) - \mathbf{X}_i^{T}(k)\mathbf{W}(k). \qquad (11)$$
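The conventional NSAF update can be sketched in pure Python for system identification. This sketch assumes a hypothetical two-band Haar analysis bank (N = D = 2) instead of the cosine-modulated banks typically used in practice, noise-free data, a toy four-tap unknown system `w0`, and a first-order autoregressive (AR(1)) process as the colored input; all helper names are illustrative. The synthesis bank is omitted because only the subband errors are needed to update the weights.

```python
import math
import random

# Hypothetical two-band orthogonal (Haar) analysis bank H_0(z), H_1(z); N = D = 2.
S = math.sqrt(0.5)
ANALYSIS = [[S, S], [S, -S]]


def fir_filter(taps, signal):
    """Causal FIR filtering with zero initial state."""
    return [sum(t * signal[n - j] for j, t in enumerate(taps) if n - j >= 0)
            for n in range(len(signal))]


def regressor(signal, n, L):
    """Vector [s(n), s(n-1), ..., s(n-L+1)], zero-padded before time 0."""
    return [signal[n - j] if n - j >= 0 else 0.0 for j in range(L)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def nsaf_identify(w0, x, mu=0.5, delta=1e-6):
    """Conventional NSAF identifying w0 from noise-free data."""
    L, N = len(w0), len(ANALYSIS)
    d = fir_filter(w0, x)                          # d(n) = w0^T X(n), with v(n) = 0
    x_sub = [fir_filter(h, x) for h in ANALYSIS]   # subband inputs x_i(n)
    d_sub = [fir_filter(h, d) for h in ANALYSIS]   # subband desired signals d_i(n)
    w = [0.0] * L
    for n in range(0, len(x), N):                  # n = kN: decimation by D = N
        regs = [regressor(x_sub[i], n, L) for i in range(N)]        # X_i(k)
        errs = [d_sub[i][n] - dot(w, regs[i]) for i in range(N)]    # e_{i,D}(k) with W(k)
        for x_i, e_i in zip(regs, errs):           # accumulate the sum over subbands
            g = mu * e_i / (dot(x_i, x_i) + delta)
            w = [wj + g * xj for wj, xj in zip(w, x_i)]
    return w


def ar1(n, a=0.9, seed=1):
    """Colored AR(1) input x(n) = a x(n-1) + white Gaussian noise."""
    rng = random.Random(seed)
    out, prev = [], 0.0
    for _ in range(n):
        prev = a * prev + rng.gauss(0.0, 1.0)
        out.append(prev)
    return out


if __name__ == "__main__":
    w0 = [0.6, -0.3, 0.2, 0.1]
    w = nsaf_identify(w0, ar1(6000))
    print(max(abs(a - b) for a, b in zip(w, w0)))   # small residual weight error
```

Because filtering by $H_i(z)$ commutes with the unknown system, each subband desired sample equals $\mathbf{w}_0^{T}\mathbf{X}_i(k)$ exactly in this noise-free setup, so the weights converge to `w0` even though the AR(1) input is strongly correlated.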

Proposed NPVSS-NSAF algorithm
In this section, an improved nonparametric variable step-size NSAF algorithm (NPVSS-NSAF) is proposed to resolve the conflict between fast convergence and low misalignment when the input signal is colored. The variable step size is obtained by minimizing the sum of the squared Euclidean norms of the differences between the optimal weight vector to be updated and the past estimated weight vectors, which prevents large fluctuations of the echo path estimate at low SNR. To further improve the robustness of the VSS-NSAF algorithm, a parameter-free step-size factor is derived by setting the power of the subband noise equal to the power of the subband a posteriori error. The following optimization problem is established:
where $\varphi$ ($0<\varphi<1$) is a weighting factor.
We then have (13). Using Lagrange multipliers, the cost function (14) is formed, where $\lambda_i$ is the Lagrange multiplier of the ith subband. Taking the derivative of (14) with respect to $\mathbf{W}_o$ gives (15), and the result of (15) is set equal to zero. Finally, (13) is written in matrix form.
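Following [12], the power of the subband a posteriori error $\varepsilon_{i,D}(k) = d_{i,D}(k) - \mathbf{X}_i^{T}(k)\mathbf{W}(k+1)$ is set equal to the power of the subband system noise:
$$E\{\varepsilon_{i,D}^{2}(k)\} = \sigma_{v_i,D}^{2},$$
where $\sigma_{v_i,D}^{2}$ is the power of the subband system noise [16]. The power of the subband input signal is denoted $\sigma_{x_i}^{2}$. Then (10) is substituted into (21), and (11) is used to eliminate $\mathbf{W}(k)$.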
Here $\beta$ is the exponential window (forgetting) factor of the recursive power estimates. Note that this estimate can be smaller than $\sigma_{v_i,D}^{2}$. The proposed algorithm can be summarized as follows:
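The following sketch illustrates how a parameter-free, per-subband step size can be driven by the known subband noise power and an exponentially windowed estimate of the subband error power. It is a hypothetical per-subband adaptation of the fullband NPVSS-NLMS rule (step size $1 - \sigma_v/(\zeta + \hat{\sigma}_e)$, set to zero when $\hat{\sigma}_e < \sigma_v$), offered as an illustration only; it is not necessarily the exact expression obtained in this section. The class `SubbandNPVSS`, the function `npvss_nsaf_update`, and the constants `zeta` and `delta` are assumptions.

```python
import math


class SubbandNPVSS:
    """Hypothetical per-subband nonparametric step size, modeled on NPVSS-NLMS."""

    def __init__(self, noise_powers, beta=0.99, zeta=1e-8):
        self.noise_std = [math.sqrt(p) for p in noise_powers]  # sigma_{v_i,D}, assumed known
        self.err_pow = [0.0] * len(noise_powers)                # running error-power estimates
        self.beta = beta                                        # exponential window factor
        self.zeta = zeta                                        # guards against division by zero

    def step_sizes(self, subband_errors):
        """Update each error-power estimate and return mu_i(k) for every subband."""
        mus = []
        for i, e in enumerate(subband_errors):
            # sigma_e^2(k) = beta * sigma_e^2(k-1) + (1 - beta) * e_{i,D}^2(k)
            self.err_pow[i] = self.beta * self.err_pow[i] + (1.0 - self.beta) * e * e
            sigma_e = math.sqrt(self.err_pow[i])
            if sigma_e < self.noise_std[i]:
                mus.append(0.0)  # estimate fell below the noise level: freeze this subband
            else:
                mus.append(1.0 - self.noise_std[i] / (self.zeta + sigma_e))
        return mus


def npvss_nsaf_update(w, regressors, errors, mus, delta=1e-6):
    """W(k+1) = W(k) + sum_i mu_i(k) X_i(k) e_{i,D}(k) / (delta + ||X_i(k)||^2)."""
    w = list(w)
    for x_i, e_i, mu_i in zip(regressors, errors, mus):
        g = mu_i * e_i / (delta + sum(v * v for v in x_i))
        w = [wj + g * xj for wj, xj in zip(w, x_i)]
    return w
```

While the subband error is dominated by residual echo, $\hat{\sigma}_e \gg \sigma_v$ and the step size stays close to 1 for fast convergence; as the filter converges, $\hat{\sigma}_e$ approaches $\sigma_v$ and the step size shrinks toward 0 for low misalignment, without any user-tuned step-size parameter.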

Results and discussion
In this section, simulations in the context of AEC are carried out to analyze the performance of the NPVSS-NSAF algorithm. The simulations consist of two parts: the first uses white Gaussian noise (WGN) as the input signal, and the second uses speech input signals. In both parts, NPVSS-NSAF is compared with the NLMS, SM-NLMS and VSS-NLMS algorithms.

Criteria evaluation
Two objective criteria, normalized misalignment and echo return loss enhancement (ERLE), are used to evaluate the performance of the proposed NPVSS-NSAF algorithm. Both criteria are widely used to assess adaptive filtering algorithms.
The normalized misalignment in dB is given by
$$\mathrm{Mis}(n) = 20\log_{10}\frac{\|\mathbf{h} - \mathbf{W}(n)\|}{\|\mathbf{h}\|}.$$
The closer the adaptive filter coefficients are to the true echo path, the smaller the normalized misalignment and the better the result.
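The normalized misalignment can be computed with a few lines of Python. The function name `normalized_misalignment_db` is hypothetical, and the small `eps` term is an added assumption that keeps the logarithm finite when the estimate matches the echo path exactly.

```python
import math


def normalized_misalignment_db(h, w, eps=1e-12):
    """20 log10(||h - w|| / ||h||) in dB; eps avoids log10(0) for a perfect estimate."""
    num = math.sqrt(sum((hi - wi) ** 2 for hi, wi in zip(h, w)))
    den = math.sqrt(sum(hi * hi for hi in h))
    return 20.0 * math.log10((num + eps) / den)
```

For example, a weight error whose norm is one tenth of the echo-path norm corresponds to a misalignment of about -20 dB.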
ERLE measures the echo attenuation achieved by the adaptive filter and is given in dB by
$$\mathrm{ERLE}(n) = 10\log_{10}\frac{E\{d^{2}(n)\}}{E\{e^{2}(n)\}},$$
where $e(n)$ is the error of the linear adaptive filter. A larger ERLE indicates a better result.
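In practice the expectations in ERLE are replaced by short-time averages. The sketch below assumes non-overlapping blocks of an arbitrarily chosen length (256 samples by default); the function name `erle_db` and the `eps` regularizer are illustrative assumptions.

```python
import math


def erle_db(d, e, block=256, eps=1e-12):
    """Blockwise ERLE = 10 log10(sum d^2 / sum e^2) over non-overlapping blocks."""
    out = []
    for start in range(0, len(d) - block + 1, block):
        pd = sum(v * v for v in d[start:start + block])   # microphone-signal energy
        pe = sum(v * v for v in e[start:start + block])   # residual-error energy
        out.append(10.0 * math.log10((pd + eps) / (pe + eps)))
    return out
```

An error signal whose amplitude is one tenth of the microphone signal yields an ERLE of about 20 dB in every block.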

WGN signal used in simulation
In this subsection, to analyze the convergence rate and misalignment of the proposed algorithm, an acoustic impulse response of length L = 512 is used, all adaptive filters have the same length, and the sampling rate is 8 kHz. The input signal is WGN, and independent white Gaussian noise is added to the output of the echo path at two SNRs, 30 dB and 10 dB. In this experiment, the noise power of the system is known.
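Adding noise at a prescribed SNR amounts to scaling the noise so that the echo-to-noise power ratio matches the target. The sketch assumes SNR is defined as the ratio of echo power to noise power at the microphone; the function name `add_noise_at_snr` is hypothetical. It also returns the scaled noise, whose power can serve as the known system noise power used in this experiment.

```python
import math
import random


def add_noise_at_snr(echo, snr_db, seed=0):
    """Return (echo + noise, noise) with 10 log10(P_echo / P_noise) equal to snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in echo]
    p_echo = sum(v * v for v in echo) / len(echo)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Choose the scale so that scale^2 * p_noise = p_echo / 10^(snr_db / 10).
    scale = math.sqrt(p_echo / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [scale * v for v in noise]
    noisy = [y + s for y, s in zip(echo, scaled)]
    return noisy, scaled
```

Because the scale is computed from the sample powers of the generated signals, the measured SNR of the returned pair equals the target up to floating-point rounding.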
In Figure 3, misalignment learning curves with WGN input at SNR = 30 dB are used to compare NPVSS-NSAF with the other three algorithms (NLMS with step size μ = 1.0, VSS-NLMS and SM-NLMS). Figure 3 shows that NLMS (μ = 1.0) and SM-NLMS converge as fast as NPVSS-NSAF in the initial stage, but in steady state the misalignment of the proposed algorithm is much lower than that of both. Although VSS-NLMS reaches a lower final misalignment than SM-NLMS and NLMS (μ = 1.0), it converges more slowly than they do. The proposed algorithm therefore combines fast initial convergence with a lower steady-state misalignment than all three competing algorithms. To examine the influence of SNR, the experiment is repeated with WGN input at SNR = 10 dB. Figure 4 shows that the proposed algorithm again outperforms the other three algorithms in convergence rate and misalignment at this lower SNR.
Figure 5 shows the ERLE learning curves under the same simulation configuration. The proposed algorithm has a clear advantage in ERLE: the ERLE of NPVSS-NSAF is more than 3 dB higher than that of SM-NLMS, more than 5 dB higher than that of NLMS (μ = 1.0) and more than 8 dB higher than that of VSS-NLMS. In addition, NPVSS-NSAF converges faster than the other three algorithms. These results indicate that, with WGN input, NPVSS-NSAF outperforms the other three algorithms in terms of both misalignment and ERLE. However, WGN input alone is not sufficient to demonstrate the effectiveness of NPVSS-NSAF for echo cancellation, since real speech is colored and nonstationary. Therefore, speech signals are used to test the proposed algorithm in the next subsection.

Speech signals used in simulation
In this subsection, speech signals from the TIMIT database [35] are used to verify the performance of the proposed algorithm in a nonstationary environment. Two speech signals serve as inputs: a far-end speech signal and a near-end speech signal. The far-end speech signal x(n) is spoken by a female speaker and contains the sentence "She always ask an objective question." The near-end speech signal s(n) is spoken by a male speaker and contains the sentence "don't let him eat too many strawberries". Colored noise is added to the original signals at two SNRs (10 dB and 30 dB) to test the improved algorithm with colored input. As in the WGN experiments, an acoustic impulse response of length L = 512 is used, all adaptive filters have the same length as the impulse response, and the sampling rate is 8 kHz. Three conventional adaptive algorithms, NLMS, VSS-NLMS and SM-NLMS, are compared with the proposed algorithm in terms of misalignment and ERLE over a total of 8000 iterations.
In Figure 6, misalignment learning curves with speech input at SNR = 30 dB are used to compare NPVSS-NSAF with the other three algorithms (NLMS with step size μ = 1.0, VSS-NLMS and SM-NLMS) in a nonstationary environment.
The simulation results show that NLMS (μ = 1.0) and SM-NLMS converge as fast as NPVSS-NSAF in the initial stage. In steady state, however, the misalignment of the proposed algorithm is much lower than that of both. Although VSS-NLMS reaches a lower final misalignment than SM-NLMS and NLMS (μ = 1.0), it converges more slowly, and its convergence rate degrades severely when the input signal is colored. With colored input, therefore, the proposed algorithm achieves lower misalignment than NLMS, VSS-NLMS and SM-NLMS, and faster convergence than VSS-NLMS.
Figure 6 Misalignment evaluation of double-talk speech with SNR=30 dB
To examine the influence of SNR, the speech signal from the noisy TIMIT database with SNR = 10 dB is then used as the input signal. Figure 7 shows that the misalignment of all four algorithms increases slightly as the SNR decreases. The proposed algorithm still achieves lower misalignment and faster convergence than the other three algorithms with colored input.
Figure 7 Misalignment evaluation of double-talk speech with SNR=10 dB
Using the same simulation configuration, the ERLE learning curves are shown in Figure 5. NPVSS-NSAF reaches a higher steady-state ERLE than the other three algorithms, i.e., it leaves less residual echo. These results indicate that the proposed algorithm outperforms the other three algorithms.

Conclusions
In this paper, a nonparametric variable step-size subband adaptive filtering algorithm has been proposed, and its performance has been evaluated with WGN, colored noise and speech input signals. To verify its effectiveness, the algorithm was compared with the VSS-NLMS, SM-NLMS and standard NLMS algorithms in terms of two objective criteria, misalignment and ERLE. The experimental results showed that NPVSS-NSAF achieved faster convergence and lower misalignment than the other three algorithms. The proposed algorithm is therefore an effective echo cancellation method that performs well with colored input signals and at low SNR.