Python 3 - Partial solution (760 742 734 710 705 657 characters)
(Last edit; I promise)
This seems like a really, pretty, very hard problem (especially recognizing where the notes start or end). Automatic transcription of music seems like an open research topic (not that I know anything about it). So here's a partial solution that doesn't do any note segmentation (eg it prints "Twinkle" all at once when it hears the frequency) and probably only works for that specific ogg file:
A=-52
F=44100
C=4096
import pyaudio as P
import array
import scipy.signal as G
import numpy as N
import math
L=math.log
i=0
j=[9,2,0,2,4,5,7,9]
k=[2,4,5,7]
n=j+k+k+j
w="Twinkle, |twinkle, |little |star,\n|How I |wonder |what you |are.\n|Up a|bove the |world so |high,\n|Like a |diamond |in the |sky.\n".split('|')
w+=w[:8]
e=P.PyAudio().open(F,1,8,1,0,None,0,C)
while i<24:
g=array.array('h',e.read(C));b=sum(map(abs,g))/C
if b>0 and 20*L(b/32768,10)>A:
f=G.fftconvolve(g,g[::-1])[C:];d=N.diff(f);s=0
while d[s]<=0:s+=1
x=N.argmax(f[s:])+s;u=f[x-1];v=f[x+1]
if int(12*L(((u-v)/2/(u-2*f[x]+v)+x)*F/C/440,2))==n[i]+15:print(w[i],end='',flush=1);i+=1
This requires...
Change the A=-52 (minimum amplitude) on the top line depending on your microphone, ammount of ambient sound, how loud the song is playing, etc. On my microphone, less than -57 seems to pick up a lot of extraneous noise and more than -49 requires you play it very loud.
This could be golfed a LOT more; I'm sure there are ways to save a bunch of characters on the words array in particular. This is my first non-trivial program in python, so I'm not very familiar with the language yet.
I stole the code for frequency detection via autocorrelation from https://gist.github.com/endolith/255291
Ungolfed:
import pyaudio
from array import array
import scipy.signal
import numpy
import math
import sys
MIN_AMPLITUDE = -52
FRAMERATE = 44100
def first(list):
for i in range(len(list)):
if(list[i] > 0):
return i
return 0
# Based on: https://en.wikipedia.org/wiki/Decibel#Acoustics
def getAmplitude(sig):
total = 0;
elems = float(len(sig))
for x in sig:
total += numpy.abs(x) / elems
if(total == 0):
return -99
else:
return 20 * math.log(total / 32768., 10)
# Based on: https://en.wikipedia.org/wiki/Piano_key_frequencies
def getNote(freq):
return int(12 * math.log(freq / 440, 2) + 49)
# --------------------------------------------------------------------------
# This is stolen straight from here w/ very slight modifications: https://gist.github.com/endolith/255291
def parabolic(f, x):
return 1/2. * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x
def getFrequency(sig):
# Calculate autocorrelation (same thing as convolution, but with
# one input reversed in time), and throw away the negative lags
corr = scipy.signal.fftconvolve(sig, sig[::-1], mode='full')
corr = corr[len(corr)/2:]
# Find the first low point
diffs = numpy.diff(corr)
# Find the next peak after the low point (other than 0 lag). This bit is
# not reliable for long signals, due to the desired peak occurring between
# samples, and other peaks appearing higher.
# Should use a weighting function to de-emphasize the peaks at longer lags.
start = first(diffs)
peak = numpy.argmax(corr[start:]) + start
return parabolic(corr, peak) * (FRAMERATE / len(sig))
# --------------------------------------------------------------------------
# These are the wrong keys (ie it is detecting middle C as an A), but I'm far too lazy to figure out why.
# Anyway, these are what are detected from the Wikipedia .ogg file:
notes = [73, 66, 64, 66, 68, 69, 71, 73, 66, 68, 69, 71, 66, 68, 69, 71 ]
words = ["Twinkle, ", "twinkle, ", "little ", "star,\n", "How I ", "wonder ", "what you ", "are.\n", "Up a", "bove the ", "world so ", "high,\n", "Like a ", "diamond ", "in the ", "sky.\n"]
notes += notes[:8]
words += words[:8]
pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels = 1, rate = FRAMERATE, input = True, frames_per_buffer = 4096)
idx = 0
while(idx < len(notes)):
# Read signal
sig = array('h', stream.read(4096))
if(getAmplitude(sig) > MIN_AMPLITUDE):
note = getNote(getFrequency(sig))
if(note == notes[idx]):
sys.stdout.write(words[idx])
sys.stdout.flush()
idx += 1
2Is there any way to relax "must be printed before note ends?" With 1/16 second notes, even if you dedicate 3/4 of that time to sampling, you only have ~47ms of sound to work with. That gives pretty murky frequency resolution for mid-range notes. – Geobits – 2014-07-14T16:34:24.397
@Geobits Good point; I removed that rule. – Ypnypn – 2014-07-14T16:36:27.653
1This is the first puzzle using audio input that I could find! Congrats! – Not that Charles – 2014-07-14T18:18:36.220
1Is the title mispelled on purpose to differentiate the two twinkles? – Rainbolt – 2014-07-14T19:48:00.690
@Rusher No, it was a bad mistake on my part. Sorry. – Ypnypn – 2014-07-14T21:09:10.503
Generally I hear Twinkle, Twinkle, Little Star played at around 120 BPM at its fastest. Perhaps you could relax the requirement to at least 0.4 seconds of the note being played and 0.1 seconds of the note being released, since most renditions will only go as fast as that. – Joe Z. – 2014-07-14T22:31:13.040
@JoeZ. I relaxed the requirements. – Ypnypn – 2014-07-15T00:51:46.820
Does punctuation include the vertical bars? – Not that Charles – 2014-07-15T01:34:58.347
@Charles No; do not output the vertical bars. – Ypnypn – 2014-07-15T16:08:50.707
1Could we have a link to an audio file for testing? – Calvin's Hobbies – 2014-07-19T04:38:15.630
@Calvin'sHobbies https://en.wikipedia.org/wiki/File:Twinkle_Twinkle_Little_Star_plain.ogg (also linked to in the question)
– Ypnypn – 2014-07-20T15:30:19.803