33
2
Task
Implement a program in minimum bytes of source or binary code that does voice recognition of a voice sample (me saying "yes", "yeah" or "no" in voice or in whisper, plainly or quirkily) based on training samples with maximum accuracy.
The program should read train/yes0.wav
, train/no0.wav
, train/yes1.wav
and so on (there are 400 yeses and 400 noes in training dataset), then start reading inputs/0.wav
, inputs/1.wav
until it fails to find the file, analysing it and outputting "yes" or "no" (or other word for intermediate answer).
If you want, you may pre-train the program instead of reading train/
, but the resulting data table counts towards the score (and beware of overfitting to training samples - they don't overlap with examination ones). Better to include the program used to produce the data table as an addendum in this case.
All sample files are little endian 16-bit stereo WAV files, just from laptop mic, without filtering/noise reduction.
Limits
Forbidden features:
- Using network;
- Trying to reach the answers file
inputs/key
; - Subverting the
runner
program that calculates accuracy; - Using existing recognition libraries. Linking to FFT implementation not allowed: only external math functions taking constant amount of information (like
sin
oratan2
) are allowed; If you want FFT, just add it's implementation to your program source code (it can be multilingual if needed).
Resource limits:
- The program should not take more than 30 minutes of CPU time on my i5 laptop. If it takes more, only output produced in the first 30 minutes are counted and eveything else is assumed a half-match;
- Memory limit: 1GB (including any temporary files);
Tools
The tools/runner
program automatically runs your solution and calculates the accuracy.
$ tools/runner solutions/example train train/key
Accuracy: 548 ‰
It can examine the program using training data or using actual exam data. I'm going to try submitted answers on examination dataset and publish results (accuracy percentage) until I make the dataset public.
Scoring
There are 5 classes of solution depending on accuracy:
- All samples guessed correctly: Class 0;
- Accuracy 950-999: Class 1;
- Accuracy 835-950: Class 2;
- Accuracy 720-834: Class 3;
- Accuracy 615-719: Class 4;
Inside each class, the score is number of bytes the solution takes.
Accepted answer: smallest solution in the best nonempty class.
Links
- Github project with tools: https://github.com/vi/codegolf-jein
- Training dataset: http://vi-server.org/pub/codegolf-jein-train.tar.xz
- Examination dataset is kept private so far, there are checksums (HMAC) available in the Github repository.
All samples should be considered CC-0 (Public Domain), scripts and programs should be considered MIT.
Example solution
It provides very poor quality of recognition, just shows how to read files and output answers
#define _BSD_SOURCE
#include <stdio.h>
#include <assert.h>
#include <endian.h>
#define Nvols 30
#define BASS_WINDOW 60
#define MID_WINDOW 4
struct training_info {
double bass_volumes[Nvols];
double mid_volumes[Nvols];
double treble_volumes[Nvols];
int n;
};
struct training_info yes;
struct training_info no;
static int __attribute__((const)) mod(int n, int d) {
int m = n % d;
if (m < 0) m+=d;
return m;
}
// harccoded to 2 channel s16le
int get_file_info(const char* name, struct training_info *inf) {
FILE* in = fopen(name, "rb");
if (!in) return -1;
setvbuf(in, NULL, _IOFBF, 65536);
inf->n = 1;
fseek(in, 0, SEEK_END);
long filesize = ftell(in);
fseek(in, 128, SEEK_SET);
filesize -= 128; // exclude header and some initial samples
int nsamples = filesize / 4;
double bass_a=0, mid_a=0;
const int HISTSIZE = 101;
double xhistory[HISTSIZE];
int histpointer=0;
int histsize = 0;
//FILE* out = fopen("debug.raw", "wb");
int i;
for (i=0; i<Nvols; ++i) {
int j;
double total_vol = 0;
double bass_vol = 0;
double mid_vol = 0;
double treble_vol = 0;
for (j=0; j<nsamples / Nvols; ++j) {
signed short int l, r; // a sample
if(fread(&l, 2, 1, in)!=1) break;
if(fread(&r, 2, 1, in)!=1) break;
double x = 1/65536.0 * ( le16toh(l) + le16toh(r) );
bass_a += x;
mid_a += x;
if (histsize == HISTSIZE-1) bass_a -= xhistory[mod(histpointer-BASS_WINDOW,HISTSIZE)];
if (histsize == HISTSIZE-1) mid_a -= xhistory[mod(histpointer-MID_WINDOW ,HISTSIZE)];
double bass = bass_a / BASS_WINDOW;
double mid = mid_a / MID_WINDOW - bass;
double treble = x - mid_a/MID_WINDOW;
xhistory[histpointer++] = x;
if(histpointer>=HISTSIZE) histpointer=0;
if(histsize < HISTSIZE-1) ++histsize;
total_vol += bass*bass + mid*mid + treble*treble;
bass_vol += bass*bass;
mid_vol += mid*mid;
treble_vol += treble*treble;
/*
signed short int y;
y = 65536 * bass;
y = htole16(y);
fwrite(&y, 2, 1, out);
fwrite(&y, 2, 1, out);
*/
}
inf->bass_volumes[i] = bass_vol / total_vol;
inf->mid_volumes[i] = mid_vol / total_vol;
inf->treble_volumes[i] = treble_vol / total_vol;
//fprintf(stderr, "%lf %lf %lf %s\n", inf->bass_volumes[i], inf->mid_volumes[i], inf->treble_volumes[i], name);
}
fclose(in);
return 0;
}
static void zerotrdata(struct training_info *inf) {
int i;
inf->n = 0;
for (i=0; i<Nvols; ++i) {
inf->bass_volumes[i] = 0;
inf->mid_volumes[i] = 0;
inf->treble_volumes[i] = 0;
}
}
static void train1(const char* prefix, struct training_info *inf)
{
char buf[50];
int i;
for(i=0;; ++i) {
sprintf(buf, "%s%d.wav", prefix, i);
struct training_info ti;
if(get_file_info(buf, &ti)) break;
++inf->n;
int j;
for (j=0; j<Nvols; ++j) {
inf->bass_volumes[j] += ti.bass_volumes[j];
inf->mid_volumes[j] += ti.mid_volumes[j];
inf->treble_volumes[j] += ti.treble_volumes[j];
}
}
int j;
for (j=0; j<Nvols; ++j) {
inf->bass_volumes[j] /= inf->n;
inf->mid_volumes[j] /= inf->n;
inf->treble_volumes[j] /= inf->n;
}
}
static void print_part(struct training_info *inf, FILE* f) {
fprintf(f, "%d\n", inf->n);
int j;
for (j=0; j<Nvols; ++j) {
fprintf(f, "%lf %lf %lf\n", inf->bass_volumes[j], inf->mid_volumes[j], inf->treble_volumes[j]);
}
}
static void train() {
zerotrdata(&yes);
zerotrdata(&no);
fprintf(stderr, "Training...\n");
train1("train/yes", &yes);
train1("train/no", &no);
fprintf(stderr, "Training completed.\n");
//print_part(&yes, stderr);
//print_part(&no, stderr);
int j;
for (j=0; j<Nvols; ++j) {
if (yes.bass_volumes[j] > no.bass_volumes[j]) { yes.bass_volumes[j] = 1; no.bass_volumes[j] = 0; }
if (yes.mid_volumes[j] > no.mid_volumes[j]) { yes.mid_volumes[j] = 1; no.mid_volumes[j] = 0; }
if (yes.treble_volumes[j] > no.treble_volumes[j]) { yes.treble_volumes[j] = 1; no.treble_volumes[j] = 0; }
}
}
double delta(struct training_info *t1, struct training_info *t2) {
int j;
double d = 0;
for (j=0; j<Nvols; ++j) {
double rb = t1->bass_volumes[j] - t2->bass_volumes[j];
double rm = t1->mid_volumes[j] - t2->mid_volumes[j];
double rt = t1->treble_volumes[j] - t2->treble_volumes[j];
d += rb*rb + rm*rm + rt*rt;
}
return d;
}
int main(int argc, char* argv[])
{
(void)argc; (void)argv;
train();
int i;
int yes_count = 0;
int no_count = 0;
for (i=0;; ++i) {
char buf[60];
sprintf(buf, "inputs/%d.wav", i);
struct training_info ti;
if(get_file_info(buf, &ti)) break;
double dyes = delta(&yes, &ti);
double dno = delta(&no, &ti);
//printf("%lf %lf %s ", dyes, dno, buf);
if (dyes > dno) {
printf("no\n");
++no_count;
} else {
printf("yes\n");
++yes_count;
}
}
fprintf(stderr, "yeses: %d noes: %d\n", yes_count, no_count);
}
Are the test files will be taken out of the training files, or separate samples altogether? – John Dvorak – 2014-09-01T11:59:18.227
5no fft libraries? Why? – John Dvorak – 2014-09-01T12:02:46.070
Can we use WAV libraries (such as Python's
wave
)? – mnbvmar – 2014-09-01T12:11:55.343@mnbvmar, Yes. – Vi. – 2014-09-01T12:13:28.427
1What about built-in FFT functions? What exactly counts as external? Also, what counts as a math library function? Are we allowed to use
sum
, or we need to usefoldl (+) 0
(foldl not being math-specific, and+
not being variadic)? – John Dvorak – 2014-09-01T12:14:03.890@JanDvorak, Separate samples, but from the same larger dataset. Samples were shuffled, first 400 of each kind were marked as training, the rest are examination. – Vi. – 2014-09-01T12:14:35.370
Why is it a half-match if your program ran out of time? Why not a complete miss? – Claudiu – 2014-09-01T12:15:03.117
@Claudiu just assume the program starts guessing rapid-fire in desperation – John Dvorak – 2014-09-01T12:15:34.257
@JanDvorak, You can insert some FFT implementation into your code and optionally golf it down. If allow FFT, then somebody may ask to allow some other transformation, and it would be tricky to put a line between "Just FFT" and "Ready-made recognition library". So the rule "external math libraries should only provide constant input size functions". – Vi. – 2014-09-01T12:15:59.810
@JanDvorak,
fold
is not a math funciton.(+)
takes constant-sized input, so is allowed. – Vi. – 2014-09-01T12:17:10.3571still... you're effectively forbidding
sum
. I guess that's not your intention? – John Dvorak – 2014-09-01T12:17:22.7601What's the exact definition of math functions? Those that are specialised to operate on numbers? What about a generic "sum" function that uses addition for numbers, but concatenation for strings? Is this sum now allowed? – John Dvorak – 2014-09-01T12:18:58.997
You can easily implement
sum
using other functions. "sum" allowed => "sum with coefficients" allowed => "sub with coefficients + filtering at the end" allowed => ... => neuron implementation allowed => ... – Vi. – 2014-09-01T12:19:06.120@JanDvorak, Yes, functions for processing numbers, unrelated to input/output. – Vi. – 2014-09-01T12:20:47.240
1What about J's vector operations? Are they disallowed? – John Dvorak – 2014-09-01T12:20:48.547
@JanDvorak, (Looking up what is
J's vector
... Can't find it easily). If it's a speedy processing of a fixed block of data (i.e. 32 floats array), it looks more or less OK. – Vi. – 2014-09-01T12:21:11.4301Operations in J, K and other APL derivatives have an implicit rank (dimension they operate on). If you apply them to a data structure with a larger dimension, they perform an implicit iteration.
+
can mean(+)
,zipWith (+)
,zipWith $ zipWith (+)
... – John Dvorak – 2014-09-01T12:22:56.763@JanDvorak, Consider that this "implicit iteration" is not a part of mathematical function, just a special zero-byte-consuming part of your program. The inner function being iterated takes const-sized input. – Vi. – 2014-09-01T12:25:17.200
zipWith
is not a mathematical function, it's list-processing function. And(+)
is OK. If there were a librarycozysum = zipWith (+)
, then this library whould be forbidden (because of bothzipWith
and(+)
is outsourced to the library). – Vi. – 2014-09-01T12:25:47.603I guess that stuff like J's
+/
(plus-over = fold with addition = sum) and*\
(times-table = table of multiplications) are fine, then?/
and\
are separate entities (adverbs = functions on functions). – John Dvorak – 2014-09-01T12:29:36.650@JanDvorak, "
/ and \
are separate entities" -> So it's OK. In general, the rule is to forbid big simplification like ready-made FFT, ready-made OCR, ready-made noise reduction, not some "elementary " language iteration features. Just a sum probably would not be that noticable (and such minor "violations" are probably easy fixable if needed). – Vi. – 2014-09-01T12:31:21.277How does a polymorphic
sum
(fold an arbitrary list with the+
operator, whatever it does) fare? – John Dvorak – 2014-09-01T12:34:58.420This polymorphic sum can be viewed as a
fold
by other name (hence a control function, not math one), unless the overridden+
can take an array of numbers or something like that. Even if+
processes arrays, if implementation of such+
is in your code then it's OK. If not, it can be not OK. You can post both "common sense" and "100% compliant" solutions if you want. A real problem is when they diverge too much. – Vi. – 2014-09-01T12:38:11.347@JanDvorak, BTW how do I run an APL program on my Linux Debian? It's not feasible to use some online runner because of there is a big blob of input data. – Vi. – 2014-09-01T12:40:15.457
I've never used APL, but I've used J. IIRC, there is a J runtime for linux. – John Dvorak – 2014-09-01T12:41:39.800
Can you train the program prior to submitting, then remove all training code? – Nic Robertson – 2014-09-01T22:26:04.390
@NicRobertson, Yes. But it's better include the training part as an addendum. – Vi. – 2014-09-02T00:20:28.157
Does all training set/examination set have the same sample rate? ie. 2 channels, 16 bits per sample , 44.1 khz sample rate. – Nic Robertson – 2014-09-02T03:51:27.823
@NicRobertson, All audio files (training and examination samples) have the same format (the one you mentioned), the one of
arecord -f cd > file.wav
. Training samples are just random subset taken off from examination samples. – Vi. – 2014-09-02T10:14:45.920