Audio Novelty And Autocorrelation for Tempo Estimation

Chris Tralie

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import IPython.display as ipd
import librosa
import librosa.display
x, sr = librosa.load("beatles.wav")
hop_length=512
win = 4096
S = librosa.stft(x, hop_length=hop_length, n_fft=win)
S = np.abs(S)
Sdb = librosa.amplitude_to_db(S,ref=np.max)
ipd.Audio(x, rate=sr)
Out[1]:

"Come Together," the above clip, is at 83 beats per minute (bpm)

In [2]:
M = Sdb.shape[0] # How many rows I have (frequency indices)
N = Sdb.shape[1] # How many columns I have (time window indices)
novfn = np.zeros(N-1) # Pre-allocate space to hold some kind of difference between columns
times = np.arange(N-1)*hop_length/sr
diff = Sdb[:, 1::] - Sdb[:, 0:-1]
diff[diff < 0] = 0 # Cut out the differences that are less than 0
novfn = np.sum(diff, axis=0)
plt.plot(times, novfn)
Out[2]:
[<matplotlib.lines.Line2D at 0x7f506d2da450>]

Autocorrelation

Given a signal $x[n]$ with $N$ samples, the autocorrelation $r[k]$ is define as

$r[k] = \sum_{n = 0}^{N-1} x[n]x[n-k]$

In other words, we line up a signal with itself at differents shifts and take the dot product. As shown below, when we shift by a multiple of a period, the signal lines up well with itself and the dot product is large. This leads to a periodic pattern in the autcorrelation itself

Here is a naive $O(N^2)$ algorithm to compute the autocorrelation directly from the definition

In [7]:
def autocorr_slow(x):
    N = len(x)
    r = np.zeros(N)
    for k in range(N):
        corr = 0
        for n in range(N):
            if n-k >= 0:
                corr += x[n]*x[n-k]
        r[k] = corr
    return r

But we can use the fact that the autocorrelation of a signal is the convolution of that signal and its reverse to compute it more quickly

In [22]:
def autocorr(x):
    N = len(x)
    xpad = np.zeros(N*2)
    xpad[0:N] = x
    F = np.fft.fft(xpad)
    FConv = np.real(F)**2 + np.imag(F)**2 # Fourier transform of the convolution of x and its reverse
    return np.real(np.fft.ifft(FConv)[0:N])
r = autocorr(novfn)
times = np.arange(len(r))*hop_length/sr
plt.plot(times, r)
plt.plot(times, 0.8*10**10*np.ones(len(r)))
Out[22]:
[<matplotlib.lines.Line2D at 0x7f50673c8150>]

Now let's look at the shifts of time where the autocorrelation exceeds a certain value, and convert those shifts to beats per minute. What we see is we get our 83 bpm in there, but we also get double that because of the "microbeat." We also get half of that (about 41 bpm), because if it repeats itself every beat, it repeats itself every two beats as well. And by the same reasoning, we get a quarter of it (about 20 bpm)

In [26]:
intervals = times[r > 0.8*10**10]
print(intervals)
[0.         0.02321995 0.34829932 0.37151927 0.39473923 0.71981859
 0.74303855 0.7662585  1.09133787 1.11455782 1.46285714 1.4860771
 1.83437642 1.85759637 2.20589569 2.57741497 2.94893424]
In [25]:
bpm = 60/intervals
print(bpm)
[          inf 2583.984375    172.265625    161.49902344  151.99908088
   83.35433468   80.74951172   78.30255682   54.97839096   53.83300781
   41.015625     40.37475586   32.70866297   32.29980469   27.19983553
   23.27913851   20.34633366]
/home/ctralie/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """Entry point for launching an IPython kernel.
In [ ]: