Find the beats in an MP3 file

27

11

In this challenge, your task is to take an simple recording in mp3 format and find the time offsets of the beats in the file. Two example recordings are here:

https://dl.dropboxusercontent.com/u/24197429/beats.mp3 https://dl.dropboxusercontent.com/u/24197429/beats2.mp3

Here is the third recording with much more noise than the two previous:

https://dl.dropboxusercontent.com/u/24197429/noisy-beats.mp3

For example, the first recording is 65 seconds long and contains exactly (unless I miscounted!) 76 beats. Your job is to devise a program that takes such an mp3 file as input and outputs a sequence of the time offsets in milliseconds of the beats in the file. A beat is defined to occur, of course, when the guitarist playing plucks one or more strings.

Your solution must:

  • Work on any mp3 file of similar "complexity". It can fail on to noisy recordings or to quickly played melodies -- I don't care.
  • Be fairly precise. Tolerance is +/- 50 ms. So if the beat occurs at 1500 ms and your solution reports 1400, then that is unacceptable.
  • Use only free software. Calling ffmpeg is allowed as is using any freely available third party software for your language of choice.

The winning criteria is ability to successfully detect beats despite noise in the supplied files. In case of a tie, the shortest solution wins (length of 3rd party code isn't added to the count).

Björn Lindqvist

Posted 2014-01-15T10:28:06.153

Reputation: 590

I'm voting to close this as unclear because "similar complexity" is not even close to being objective. – Mego – 2016-05-24T06:40:01.693

Why is that unclear? – Björn Lindqvist – 2016-05-24T11:55:03.710

1Even though this looks interesting, this is a contest, you should define winning criteria more precisely than "Correctness". – Fabinout – 2014-01-15T10:32:59.897

ok better now?? – Björn Lindqvist – 2014-01-15T10:38:30.070

18A good contest isolates the part of interest. Here you seem to be interested in beat identification, which is certainly an interesting DSP problem. So why make the programs handle (or outsource) the complexities of the MP3 file format? The question would be improved by taking either RAW (with permitted assumptions about sample rate, bit depth, and endianness) or WAV (similarly). – Peter Taylor – 2014-01-15T10:38:43.620

3The point of the contest is to handle all those pieces. Perhaps that makes it hard to solve it in golfscript if it has a hard time interfacing with mp3's. Never the less, the challenge is well-specified and (afaict) completely on-topic so the negativity is very dismaying. – Björn Lindqvist – 2014-01-15T11:43:12.960

8@BjörnLindqvist You shouldn't take suggestions for improvement to heart. Unless some previous comment has been deleted, I don't see any negative comments here, just suggestions for improvements. – Gareth – 2014-01-16T08:38:09.577

@Gareth I believe OP is worried about the downvotes (post has +3/-2). These occured before the latest edit (source: http://codegolf.stackexchange.com/posts/18543/timeline/).

– John Dvorak – 2014-01-16T10:49:18.630

Answers

6

Python 2.7 492 bytes (beats.mp3 only)

This answer can identify the beats in beats.mp3, but will not identify all notes on beats2.mp3 or noisy-beats.mp3. After the description of my code, I'll go into detail as to why.

This uses PyDub (https://github.com/jiaaro/pydub) to read in the MP3. All other processing is NumPy.

Golfed Code

Takes a single command line argument with the file name. It will output each beat in ms on a separate line.

import sys
from math import *
from numpy import *
from pydub import AudioSegment
p=square(AudioSegment.from_mp3(sys.argv[1]).set_channels(1).get_array_of_samples())
n=len(p)
t=arange(n)/44.1
h=array([.54-.46*cos(i/477) for i in range(3001)])
p=convolve(p,h, 'same')
d=[p[i]-p[max(0,i-500)] for i in xrange(n)]
e=sort(d)
e=d>e[int(.94*n)]
i=0
while i<n:
 if e[i]:
  u=o=0
  j=i
  while u<2e3:
   u=0 if e[j] else u+1
   #u=(0,u+1)[e[j]]
   o+=e[j]
   j+=1
  if o>500:
   print "%g"%t[argmax(d[i:j])+i]
  i=j
 i+=1

Ungolfed Code

# Import stuff
import sys
from math import *
from numpy import *
from pydub import AudioSegment

# Read in the audio file, convert from stereo to mono
song = AudioSegment.from_mp3(sys.argv[1]).set_channels(1).get_array_of_samples()

# Convert to power by squaring it
signal = square(song)
numSamples = len(signal)

# Create an array with the times stored in ms, instead of samples
times = arange(numSamples)/44.1

# Create a Hamming Window and filter the data with it. This gets rid of a lot of
# high frequency stuff.
h = array([.54-.46*cos(i/477) for i in range(3001)])
signal = convolve(signal,h, 'same') #The same flag gets rid of the time shift from this

# Differentiate the filtered signal to find where the power jumps up.
# To reduce noise from the operation, instead of using the previous sample,
# use the sample 500 samples ago.
diff = [signal[i] - signal[max(0,i-500)] for i in xrange(numSamples)]

# Identify the top 6% of the derivative values as possible beats
ecdf = sort(diff)
exceedsThresh = diff > ecdf[int(.94*numSamples)]

# Actually identify possible peaks
i = 0
while i < numSamples:
 if exceedsThresh[i]:
  underThresh = overThresh = 0
  j=i
  # Keep saving values until 2000 consecutive ones are under the threshold (~50ms)
  while underThresh < 2000:
   underThresh =0 if exceedsThresh[j] else underThresh+1
   overThresh += exceedsThresh[j]
   j += 1
  # If at least 500 of those samples were over the threshold, take the maximum one
  # to be the beat definition
  if overThresh > 500:
   print "%g"%times[argmax(diff[i:j])+i]
  i=j
 i+=1

Why I miss notes on the other files (and why they are incredibly challenging)

My code looks at changes in signal power in order to find the notes. For beats.mp3, this works really well. This spectrogram shows how the power is distributed over time (x axis) and frequency (y axis). My code basically collapses the y axis down to a single line. beats.jpeg Visually, it's really easy to see where the beats are. There's a yellow line that tapers off again and again. I highly encourage you to listen to beats.mp3 while you follow along on the spectrogram to see how it works.

Next I'll go to noisy-beats.mp3 (because that's actually easier than beats2.mp3. noisy-beats.png. Once again, see if you can follow along with recording. Most of the lines are fainter, but still there. However, in some spots, the bottom string is still ringing when the quiet notes start. That makes finding them especially hard, because now, you have to find them by changes in frequency (the y axis) rather than just amplitude.

beats2.mp3 is incredibly challenging. Here's the spectrogram beats2.jpeg In the first bit, there some lines, but some notes really bleed over the lines. To reliably identify notes, you'd have to start tracking the pitch of the notes (fundamental and harmonics) and seeing where those change. Once the first bit is working, the second bit is twice as hard as the tempo doubles!

Basically, to reliably identify all of these, I think it takes some fancy note detection code. Seems like this would be a good final project for someone in a DSP class.

Dominic A.

Posted 2014-01-15T10:28:06.153

Reputation: 533

I think this is not allowed, because it doesn't fulfill all the requirments. Good answer, but it needs some work. – Rɪᴋᴇʀ – 2016-02-28T23:38:34.910

Yeah, I'm a bit disappointed this method didn't work out as I hoped. I figure this might help someone else who wants to take a stab at it. If I get some spare time this week, I hope to try a new FFT based approach that should give better results. – Dominic A. – 2016-03-01T02:37:27.533

Okay, cool. Nice job though. – Rɪᴋᴇʀ – 2016-03-01T14:33:11.377