26 February 2007

time based pitch tracking in matlab


Today I'll try a basic algorithm based on long term prediction (LTP). See DAFX p348 and further. It is optimized for a singing voice. This optimization is done in the post-processing part. I have to change the main algorithm the same way I did with the frequency based algorithm of the previous post.
Some calculations were needed to make sure the matrix dimensions of the for-loop are not exceeded.
indexMax = ((blocks-1)*K) + N + pre
-> ((blocks-1)*K) < Nx
-> Nx = n1 - pre = n1 - lmax
< n1 - pre + N + pre
= n1 + N
So we need to pad the X-vector with N extra zeros:
X = [X,zeros(1,N)]; % zero-padding until new length
The algorithm, with indicated external functions and default values:
% M-file 9.23
% pitch_tracker_ltp.m
%
% main file for a pitch tracker based on long-term prediction in time domain
% (c) 2002 Florian Keiler

fname='x1.wav';
%n0=2000; %start index
%n1=210000;
[X,Fs]=wavread(fname);
n0=1;
n1=length(X);

K=200; % hop size for time resolution of pitch estimation
N=1024; %block length

% checked pitch range in Hz:
fmin=50;
fmax=800;
b0_thres=.2; % threshold for LTP coeff
p_fac_thres=1.05; % threshold for voiced detection
% deviation of pitch from mean value

%[xin,Fs]=wavread(fname,[n0 n0]); %get Fs
%lag range in samples:
lmin=floor(Fs/fmax);
lmax=ceil(Fs/fmin);
pre=lmax; %number of pre-samples
if n0-pre<1
n0=pre+1;
end
Nx=n1-n0+1; %signal length
blocks=floor(Nx/K);
% blocks=floor(length(X)/K);%number of blocks
Nx=(blocks-1)*K+N;
%[X,Fs]=wavread(fname,[n0-pre n0+Nx]);
X=X(:,1)';
X = [X,zeros(1,N)]; % zero-padding until new length


pitches=zeros(1,blocks);
for b=1:blocks
x=X((b-1)*K+(1:N+pre));
[M, Fp]=find_pitch_ltp(x, lmin, lmax, N, Fs, b0_thres);
if ~isempty(M)
pitches(b)=Fs/M(1); % take candidate with lowest pitch
else
pitches(b)=0;
end
end

%%%% post-processing:
L=9; % number of blocks for mean calculation
if mod(L,2)==0 % L is even
L=L+1;
end
D=(L-1)/2; %delay
h=ones(1,L)./L; % impulse response for mean calculation
% mirror beginning and end for "non-causal" filtering:
p=[pitches(D+1:-1:2), pitches, pitches(blocks-1:-1:blocks-D)]; % 2D samples longer
y=conv(p,h); % length: blocks+2D+2D
pm=y((1:blocks)+2*D); % cut result

Fac=zeros(1,blocks);
idx=find(pm~=0); % dont divide by zero
Fac(idx)=pitches(idx)./pm(idx);
ii=find(Fac<1 & Fac~=0);
Fac(ii)=1./Fac(ii); % all non-zero element are now > 1
% voiced/unvoiced detection:
voiced=Fac~=0 & Fac<p_fac_thres;

T=40; % time in ms for segment lengths
M=round(T/1000*Fs/K); % min. number of blocks in a row
[V,p2]=segmentation(voiced, M, pitches);
p2=V.*p2; % set pitches to zero for unvoiced

%plotting things
figure(1)
clf
time=(0:blocks-1)*K+1; %start sample of blocks
time=time/Fs; %time in seconds
t=(0:length(X)-1)/Fs; % time in sec for original
subplot(211)
plot(t, X)
title('original x(n)')
axis([0 max([t,time]) -1.1*max(abs(X)) 1.1*max(abs(X))])

subplot(212)
idx=find(p2~=0);
plot_split(idx,time, p2)
title('pitch in Hz')
xlabel('time/s rightarrow')
axis([0 max([t,time]) .9*min(p2(idx)) 1.1*max(p2(idx))])


% synthesize sinusoids:
y2=synth_sine_fade(p2,K,Fs);
soundsc(y2, Fs)
wavwrite(.99*y2,Fs,[fname, '_pitch_ltp'])


This is the result with default arguments:
    • n0 = 1
    • n1 = length(X)
    • K = 200
    • N = 1024
    • fmin = 50
    • fmax = 800
    • b0_thres = .2
    • p_fac_thres = 1.05

The parameter K has the same effect as in the previous post.
Already having a very low threshold value I assumed the algorithm would be able to track x2.wav (a musical piece with strong melody) too (as seen in the previous post). But this is not true here. Whether this is inherent to the algorithm itself or due to the post-processing part I cannot tell. With a quick look on the code it seems that for both algorithms tested until now, the post-processing part is exactly the same, so this suggests that the algorithm itself is not fit for tracking musical melodies.
The threshold value has another effect apparently, because when I enlarge it, the results for x2.wav are somewhat better. I might check the exact meaning of the threshold value, but first I'll try some other algorithms, necause it seems that this one is inherently not fitted for our purpose.

No comments: