The challenge
Find the smallest feedforward neural network such that, given any 3-dimensional input vector \$(a,b,c)\$ with integer entries in \$[-10,10]\$, the network outputs the largest (i.e., "most positive") root of the polynomial \$x^3+ax^2+bx+c\$ with error strictly smaller than \$0.1\$.
Admissibility
The notion of admissibility in my previous neural net golfing challenge seemed a bit restrictive, so for this challenge, we are using a more liberal definition of feedforward neural network:
A neuron is a function \$\nu\colon\mathbf{R}^n\to\mathbf{R}\$ that is specified by a vector \$w\in\mathbf{R}^{n}\$ of weights, a bias \$b\in\mathbf{R}\$, and an activation function \$f\colon\mathbf{R}\to\mathbf{R}\$ in the following way:
$$ \nu(x) := f(w^\top x+b), \qquad x\in\mathbf{R}^n. $$
A feedforward neural network with input nodes \$\{1,\ldots,n\}\$ is a function of \$(x_1,\ldots,x_n)\in\mathbf{R}^n\$ that can be built from a sequence \$(\nu_k)_{k=n+1}^N\$ of neurons, where each \$\nu_k\colon\mathbf{R}^{k-1}\to\mathbf{R}\$ takes inputs from \$(x_1,\ldots,x_{k-1})\$ and outputs a scalar \$x_k\$. Given a specified set \$S\subseteq\{1,\ldots,N\}\$ of output nodes, the output of the neural network is the vector \$(x_k)_{k\in S}\$.
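To make this concrete, here is a minimal Python sketch of the evaluation rule (the helper name `evaluate_network` is for illustration only, not part of the spec): each neuron reads all previously computed values, applies \$f(w^\top x+b)\$, and appends the result as the next value.

```python
def evaluate_network(neurons, inputs, output_nodes):
    """Evaluate a feedforward network in the sense defined above.

    neurons:      list of (weights, bias, activation) triples; the neuron
                  producing x_k has one weight per value computed so far,
                  i.e., a weight vector of length k - 1.
    inputs:       the list (x_1, ..., x_n).
    output_nodes: the set S of 1-based node indices to return.
    """
    values = list(inputs)  # x_1, ..., x_n
    for weights, bias, activation in neurons:
        pre = sum(w * x for w, x in zip(weights, values)) + bias
        values.append(activation(pre))  # x_k = f(w^T x + b)
    return [values[k - 1] for k in sorted(output_nodes)]
```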
Since activation functions can be tuned for any given task, we need to restrict the class of activation functions to keep this challenge interesting. The following activation functions are permitted:
Identity. \$f(t)=t\$
ReLU. \$f(t)=\operatorname{max}(t,0)\$
SoftPlus. \$f(t)=\ln(e^t+1)\$
Sigmoid. \$f(t)=\frac{e^t}{e^t+1}\$
Sinusoid. \$f(t)=\sin t\$
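For reference, these transcribe directly into Python (a convenience sketch, not part of the spec):

```python
import math

identity = lambda t: t
relu     = lambda t: max(t, 0)
softplus = lambda t: math.log(math.exp(t) + 1)
sigmoid  = lambda t: math.exp(t) / (math.exp(t) + 1)  # equals 1/(1+e^(-t)), the numerically safer form
sinusoid = math.sin
```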
Overall, an admissible neural net is specified by input nodes, a sequence of neurons, and output nodes, while each neuron is specified by a vector of weights, a bias, and an activation function from the above list. For example, the following neural net is admissible, though it does not meet the performance goal of this challenge:
Input nodes: \$\{1,2\}\$
Neurons: \$\nu_k(x_1,\ldots,x_{k-1}):=x_{k-2}+x_{k-1}\$ for \$k\in\{3,\ldots,10\}\$
Output nodes: \$\{5,9,10\}\$
This network consists of 8 neurons, each with zero bias and identity activation. In words, this network computes the generalized Fibonacci sequence generated by \$x_1\$ and \$x_2\$ and then outputs the 5th, 9th, and 10th numbers from this sequence, in that order.
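In code, reusing the `evaluate_network` and `identity` sketches from above, the example reads:

```python
# Neuron k (k = 3, ..., 10) puts weight 1 on x_{k-2} and x_{k-1} and
# weight 0 elsewhere, with zero bias and identity activation.
neurons = []
for k in range(3, 11):
    weights = [0] * (k - 1)
    weights[k - 3] = 1  # coefficient of x_{k-2}
    weights[k - 2] = 1  # coefficient of x_{k-1}
    neurons.append((weights, 0, identity))

# With x_1 = x_2 = 1 the node values are 1,1,2,3,5,8,13,21,34,55,
# so the output is [5, 34, 55].
print(evaluate_network(neurons, [1, 1], {5, 9, 10}))
```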
Scoring
Given a real number \$x\$ with a terminating decimal expansion, let \$p(x)\$ be the smallest nonnegative integer \$p\$ for which \$10^{-p}\cdot |x|<1\$, and let \$q(x)\$ be the smallest nonnegative integer \$q\$ for which \$10^q \cdot x\$ is an integer. Then we say \$p(x)+q(x)\$ is the precision of \$x\$.
For example, \$x=1.001\$ has a precision of \$4\$, whereas \$x=0\$ has a precision of \$0\$.
Your score is the sum of the precisions of the weights and biases in your neural network.
(E.g., the above example has a score of 16.)
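Here is a short sketch of the precision computation, assuming each weight and bias is supplied as an exact decimal string (floating point would obscure the terminating-decimal condition):

```python
from decimal import Decimal

def precision(s):
    """p(x) + q(x) for a number x given as a terminating decimal string."""
    x = abs(Decimal(s))
    p = 0
    while x / Decimal(10) ** p >= 1:  # smallest p with 10^(-p) * |x| < 1
        p += 1
    q = 0
    while (x * Decimal(10) ** q) % 1 != 0:  # smallest q with 10^q * x an integer
        q += 1
    return p + q

assert precision("1.001") == 4
assert precision("0") == 0
# The score of a network is then sum(precision(v) for v in weights_and_biases).
```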
Verification
While the roots can be expressed in terms of the cubic formula, the largest root is perhaps most easily accessed by numerical means. Following @xnor's suggestion, I computed the largest root for every choice of integers \$a,b,c\in[-10,10]\$, and the results can be found here. Each line of this text file is of the form `a,b,c,root`. For example, the first line reports that the largest root of \$x^3-10x^2-10x-10\$ is approximately \$10.99247140445449\$.
Edit: The original file I posted had errors in cases where the polynomial exhibited a multiple root. The current version should be free of such errors.
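For reference, here is a numpy sketch that reproduces entries of this table. The imaginary-part tolerance is the delicate part near multiple roots, which is precisely what the edit above alludes to:

```python
import numpy as np

def largest_root(a, b, c):
    """Largest real root of x^3 + a x^2 + b x + c (numerical sketch)."""
    roots = np.roots([1, a, b, c])
    # A real cubic always has at least one real root, but near-multiple
    # roots can acquire small spurious imaginary parts, so the tolerance
    # below is a heuristic rather than a guarantee.
    real = roots.real[np.abs(roots.imag) < 1e-6]
    if real.size == 0:  # multiple-root edge case: fall back to all real parts
        real = roots.real
    return real.max()

print(largest_root(-10, -10, -10))  # ~10.99247140445449, matching the first line
```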
Comments

What happens if the input polynomial has no real roots, like when `a=0` and the quadratic has two complex roots? – xnor – 2019-09-29T21:27:03.247

I think the cleanest solution would be to say that the input will have `a` be nonzero, or even just 1. Also, I'd recommend putting in some test cases, giving the roots to high precision so we can check ours are within 0.1. It would also be good to have outputs for all possible inputs, probably in a link since that's a lot for the post. – xnor – 2019-09-29T21:32:29.583

@xnor - Good eye, thanks! I'll work on some test cases. – Dustin G. Mixon – 2019-09-29T21:54:21.357

I like the new admissibility rules. It looks like the new sinusoid function is extremely exploitable though. I have a sketchy proof that a function of the form `x -> a * sin(b * softplus(x) + c)` can overfit any finite number of data points with integer `x` to arbitrary precision using an extremely large and precise frequency. – xnor – 2019-09-29T22:15:08.313

@xnor - Thanks for spotting that. I expect sinusoid to be useful, but I agree that your exploit is extreme, so I updated the scoring mechanism to reflect precision. It seems more natural in hindsight. – Dustin G. Mixon – 2019-09-29T23:08:37.237

Not sure how useful it would be (for future challenges): In number theory we use height functions to measure the complexity of a number. For example, the naive height of a (reduced) fraction \$p/q\$ is given by \$h=\log\max\{|p|,|q|\}\$ (and there are a lot of generalizations). Maybe this could be used as an alternative measure. – flawr – 2019-09-30T09:03:04.777

@DustinG.Mixon I'm not sure if you're aware, but we do have a sandbox for posting drafts and discussing details of a challenge as well as a chat. – flawr – 2019-09-30T09:09:40.903

@xnor He changed the coefficient of \$x^3\$ to the constant 1 to prevent this. BTW: this would only happen if \$a\$ is zero and the discriminant of the quadratic is negative. – mondlos – 2019-09-30T16:28:11.640