Print the lyrics to "Twinkle Twinkle Little Star"

24

2

Your goal is to print the lyrics to the song "Twinkle Twinkle Little Star" as each note is played.

The computer's microphone will hear notes. If the pitch (but not necessarily the length) of the note is correct, print the appropriate syllable. Otherwise, do nothing. Each note will be at least half a second long, and there will be a break of at least a quarter of a second between notes.

Use the musical notes provided here, and the following lyrics: (Vertical lines represent syllable breaks.)

Twin|kle, twin|kle, lit|tle star,

How I won|der what you are.

Up a|bove the world so high,

Like a dia|mond in the sky.

Twin|kle, twin|kle, lit|tle star,

How I won|der what you are.

A recording of the music can be found here.

Example

The computer hears a middle C and prints "Twin"

It hears another middle C and prints "kle,"

Then it hears another middle C (wrong note) and does nothing.

Then it hears the G above middle C and prints "twin" and so on.

Rules

  • Punctuation must be as shown.
  • Spacing must be as shown (with spaces and newlines).
  • The whitespace may be printed along with either the previous or the next syllable.

Ypnypn

Posted 2014-07-14T16:17:51.077

Reputation: 10 485

2Is there any way to relax "must be printed before note ends?" With 1/16 second notes, even if you dedicate 3/4 of that time to sampling, you only have ~47ms of sound to work with. That gives pretty murky frequency resolution for mid-range notes. – Geobits – 2014-07-14T16:34:24.397

@Geobits Good point; I removed that rule. – Ypnypn – 2014-07-14T16:36:27.653

1This is the first puzzle using audio input that I could find! Congrats! – Not that Charles – 2014-07-14T18:18:36.220

1Is the title mispelled on purpose to differentiate the two twinkles? – Rainbolt – 2014-07-14T19:48:00.690

@Rusher No, it was a bad mistake on my part. Sorry. – Ypnypn – 2014-07-14T21:09:10.503

Generally I hear Twinkle, Twinkle, Little Star played at around 120 BPM at its fastest. Perhaps you could relax the requirement to at least 0.4 seconds of the note being played and 0.1 seconds of the note being released, since most renditions will only go as fast as that. – Joe Z. – 2014-07-14T22:31:13.040

@JoeZ. I relaxed the requirements. – Ypnypn – 2014-07-15T00:51:46.820

Does punctuation include the vertical bars? – Not that Charles – 2014-07-15T01:34:58.347

@Charles No; do not output the vertical bars. – Ypnypn – 2014-07-15T16:08:50.707

1Could we have a link to an audio file for testing? – Calvin's Hobbies – 2014-07-19T04:38:15.630

@Calvin'sHobbies https://en.wikipedia.org/wiki/File:Twinkle_Twinkle_Little_Star_plain.ogg (also linked to in the question)

– Ypnypn – 2014-07-20T15:30:19.803

Answers

7

Python 3 - Partial solution (760 742 734 710 705 657 characters)

(Last edit; I promise)

This seems like a really, pretty, very hard problem (especially recognizing where the notes start or end). Automatic transcription of music seems like an open research topic (not that I know anything about it). So here's a partial solution that doesn't do any note segmentation (eg it prints "Twinkle" all at once when it hears the frequency) and probably only works for that specific ogg file:

A=-52
F=44100
C=4096
import pyaudio as P
import array
import scipy.signal as G
import numpy as N
import math
L=math.log
i=0
j=[9,2,0,2,4,5,7,9]
k=[2,4,5,7]
n=j+k+k+j
w="Twinkle, |twinkle, |little |star,\n|How I |wonder |what you |are.\n|Up a|bove the |world so |high,\n|Like a |diamond |in the |sky.\n".split('|')
w+=w[:8]
e=P.PyAudio().open(F,1,8,1,0,None,0,C)
while i<24:
 g=array.array('h',e.read(C));b=sum(map(abs,g))/C
 if b>0 and 20*L(b/32768,10)>A:
  f=G.fftconvolve(g,g[::-1])[C:];d=N.diff(f);s=0
  while d[s]<=0:s+=1
  x=N.argmax(f[s:])+s;u=f[x-1];v=f[x+1]
  if int(12*L(((u-v)/2/(u-2*f[x]+v)+x)*F/C/440,2))==n[i]+15:print(w[i],end='',flush=1);i+=1

This requires...

Change the A=-52 (minimum amplitude) on the top line depending on your microphone, ammount of ambient sound, how loud the song is playing, etc. On my microphone, less than -57 seems to pick up a lot of extraneous noise and more than -49 requires you play it very loud.

This could be golfed a LOT more; I'm sure there are ways to save a bunch of characters on the words array in particular. This is my first non-trivial program in python, so I'm not very familiar with the language yet.

I stole the code for frequency detection via autocorrelation from https://gist.github.com/endolith/255291

Ungolfed:

import pyaudio
from array import array
import scipy.signal
import numpy
import math
import sys

MIN_AMPLITUDE = -52
FRAMERATE = 44100

def first(list):
    for i in range(len(list)):
        if(list[i] > 0):
            return i
    return 0

# Based on: https://en.wikipedia.org/wiki/Decibel#Acoustics
def getAmplitude(sig):
    total = 0;
    elems = float(len(sig))
    for x in sig:
        total += numpy.abs(x) / elems
    if(total == 0):
        return -99
    else:
        return 20 * math.log(total / 32768., 10)    

# Based on: https://en.wikipedia.org/wiki/Piano_key_frequencies
def getNote(freq):
    return int(12 * math.log(freq / 440, 2) + 49)

# --------------------------------------------------------------------------
# This is stolen straight from here w/ very slight modifications: https://gist.github.com/endolith/255291
def parabolic(f, x):
    return 1/2. * (f[x-1] - f[x+1]) / (f[x-1] - 2 * f[x] + f[x+1]) + x

def getFrequency(sig):
    # Calculate autocorrelation (same thing as convolution, but with
    # one input reversed in time), and throw away the negative lags
    corr = scipy.signal.fftconvolve(sig, sig[::-1], mode='full')
    corr = corr[len(corr)/2:]

    # Find the first low point
    diffs = numpy.diff(corr)

    # Find the next peak after the low point (other than 0 lag). This bit is
    # not reliable for long signals, due to the desired peak occurring between
    # samples, and other peaks appearing higher.
    # Should use a weighting function to de-emphasize the peaks at longer lags.
    start = first(diffs)
    peak = numpy.argmax(corr[start:]) + start
    return parabolic(corr, peak) * (FRAMERATE / len(sig))
# --------------------------------------------------------------------------

# These are the wrong keys (ie it is detecting middle C as an A), but I'm far too lazy to figure out why.
# Anyway, these are what are detected from the Wikipedia .ogg file:
notes = [73,          66,           64,       66,         68,       69,        71,          73,       66,     68,          69,         71,         66,        68,         69,        71      ] 
words = ["Twinkle, ", "twinkle, ", "little ", "star,\n",  "How I ", "wonder ", "what you ", "are.\n", "Up a", "bove the ", "world so ", "high,\n", "Like a ", "diamond ", "in the ", "sky.\n"]
notes += notes[:8]
words += words[:8]

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels = 1, rate = FRAMERATE, input = True, frames_per_buffer = 4096)
idx = 0
while(idx < len(notes)):
    # Read signal
    sig = array('h', stream.read(4096))
    if(getAmplitude(sig) > MIN_AMPLITUDE):
        note = getNote(getFrequency(sig))
        if(note == notes[idx]):
            sys.stdout.write(words[idx])
            sys.stdout.flush()
            idx += 1

Robert Fraser

Posted 2014-07-14T16:17:51.077

Reputation: 912

I wrote a little syntax help for you. Check lines 14-29 and 80-88. http://pastebin.com/W9XSYwMJ

– seequ – 2014-07-28T04:23:19.020

@Sieg -- Awesome; thanks! Old habits are hard to break; – Robert Fraser – 2014-07-28T07:29:48.553