The previous neural net golfing challenges (this and that) inspired me to pose a new challenge:
The challenge
Find the smallest feedforward neural network such that, given any 4-dimensional input vector \$(a,b,c,d)\$ with integer entries in \$[-10,10]\$, the network outputs \$\textrm{sort}(a,b,c,d)\$ with a coordinate-wise error strictly smaller than \$0.5\$.
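Since the input set is finite, a submission can in principle be checked by brute force over all \$21^4\$ integer inputs. The sketch below is one way to do that; the function name `satisfies_challenge` and the assumption that the candidate network is available as a Python function `net` from \$\mathbf{R}^4\$ to \$\mathbf{R}^4\$ are mine, not part of the challenge.

```python
import numpy as np
from itertools import product

# Hypothetical checker: brute-force all 21^4 integer input vectors and
# test the coordinate-wise error bound. `net` is assumed to be a Python
# function R^4 -> R^4 implementing the candidate network.
def satisfies_challenge(net):
    for abcd in product(range(-10, 11), repeat=4):
        x = np.array(abcd, dtype=float)
        if np.max(np.abs(net(x) - np.sort(x))) >= 0.5:
            return False
    return True
```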
Admissibility
For this challenge, a feedforward neural network is defined as a composition of layers. A layer is a function \$L\colon\mathbf{R}^n\to\mathbf{R}^m\$ that is specified by a matrix \$A\in\mathbf{R}^{m\times n}\$ of weights, a vector \$b\in\mathbf{R}^m\$ of biases, and an activation function \$f\colon\mathbf{R}\to\mathbf{R}\$ that is applied coordinate-wise:
$$ L(x) := f(Ax+b), \qquad x\in\mathbf{R}^n. $$
Since activation functions can be tuned for any given task, we need to restrict the class of activation functions to keep this challenge interesting. The following activation functions are permitted:
Identity. \$f(t)=t\$
ReLU. \$f(t)=\operatorname{max}(t,0)\$
Softplus. \$f(t)=\ln(e^t+1)\$
Hyperbolic tangent. \$f(t)=\tanh(t)\$
Sigmoid. \$f(t)=\frac{e^t}{e^t+1}\$
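For concreteness, the layer definition and the permitted activations can be written down directly in NumPy. This is only an illustrative sketch; the names `ACTIVATIONS` and `layer` are mine and not part of the challenge spec.

```python
import numpy as np

# The five permitted activations, applied coordinate-wise.
# (Softplus is written naively to match the formula; fine for small inputs.)
ACTIVATIONS = {
    "identity": lambda t: t,
    "relu":     lambda t: np.maximum(t, 0),
    "softplus": lambda t: np.log(np.exp(t) + 1),
    "tanh":     np.tanh,
    "sigmoid":  lambda t: np.exp(t) / (np.exp(t) + 1),
}

def layer(A, b, f):
    """Return the layer L(x) = f(Ax + b) as a Python function."""
    return lambda x: ACTIVATIONS[f](A @ x + b)
```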
Overall, an admissible neural net takes the form \$L_k\circ L_{k-1}\circ\cdots \circ L_2\circ L_1\$ for some \$k\$, where each layer \$L_i\$ is specified by weights \$A_i\$, biases \$b_i\$, and an activation function \$f_i\$ from the above list. For example, the following neural net is admissible (while it doesn't satisfy the performance goal of this challenge, it may be a useful gadget):
$$\left[\begin{array}{c}\min(a,b)\\\max(a,b)\end{array}\right]=\left[\begin{array}{rrrr}1&-1&-\frac{1}{2}&-\frac{1}{2}\\1&-1&\frac{1}{2}&\frac{1}{2}\end{array}\right]\mathrm{ReLU}\left[\begin{array}{rr}\frac{1}{2}&\frac{1}{2}\\-\frac{1}{2}&-\frac{1}{2}\\1&-1\\-1&1\end{array}\right]\left[\begin{array}{c}a\\b\end{array}\right]$$
This example exhibits two layers. Both layers have zero bias. The first layer uses ReLU activation, while the second uses identity activation.
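To see the gadget in action, here is a quick numerical check of the two-layer network above (a sketch; the array names are mine):

```python
import numpy as np

# Weights of the two-layer min/max gadget above (zero biases throughout).
A1 = np.array([[0.5, 0.5], [-0.5, -0.5], [1, -1], [-1, 1]])
A2 = np.array([[1, -1, -0.5, -0.5], [1, -1, 0.5, 0.5]])

def minmax(a, b):
    h = np.maximum(A1 @ np.array([a, b], dtype=float), 0)  # layer 1: ReLU
    return A2 @ h                                          # layer 2: identity

print(minmax(3, -7))  # [-7.  3.], i.e. [min(3,-7), max(3,-7)]
```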
Scoring
Your score is the total number of nonzero weights and biases.
(E.g., the above example has a score of 16 since the bias vectors are zero.)
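One way to compute the score, using the min/max example for illustration (a sketch, not an official scoring tool):

```python
import numpy as np

# Score = total number of nonzero entries across all weight matrices
# and bias vectors. Here: 8 + 8 nonzero weights, all biases zero -> 16.
A1 = np.array([[0.5, 0.5], [-0.5, -0.5], [1, -1], [-1, 1]])
A2 = np.array([[1, -1, -0.5, -0.5], [1, -1, 0.5, 0.5]])
b1, b2 = np.zeros(4), np.zeros(2)

score = sum(np.count_nonzero(M) for M in (A1, b1, A2, b2))
print(score)  # 16
```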
@Close-voter: What exactly is unclear? I don't think either of the previous NN challenges was so well specified. – flawr – 2019-09-27T14:44:32.247
Are skip connections (NOT feedback connections) allowed? – Joel – 2019-09-27T15:51:27.187
No - skip connections are not allowed. – Dustin G. Mixon – 2019-09-27T16:13:02.627
@flawr It's probably just because of the high level of knowledge and math that is required. I don't have a specific idea of what the question is asking, so I am not qualified to know whether it is unclear or not. – mbomb007 – 2019-09-27T21:44:06.673
@DustinG.Mixon Out of curiosity: Did you construct the max/min network yourself or did you find it somewhere else? – flawr – 2019-09-27T22:13:22.563
@flawr - I constructed it myself. It may not be minimal. – Dustin G. Mixon – 2019-09-27T22:23:17.453
@DustinG.Mixon I actually just found an approach for the max/min that only uses 15 weights instead of 16, but it is considerably less elegant :) – flawr – 2019-09-27T23:27:24.523
This is a nicely specified challenge that I think could serve as a model for future neural-network challenges. – xnor – 2019-09-27T23:30:01.277
I personally find it difficult to optimize without skip connections. This is because a sorting NN is required to output numbers close enough to the inputs, so it seems necessary to 'remember'/'reconstruct' the inputs across layers. I do not see how that could be done easily once \$e^t\$ is involved, since no inverses of those functions are allowed as activations. So we are only left with the ReLUs, for which the baseline (with minor improvements as shown in flawr's answer) is already near optimal. – Joel – 2019-09-28T00:16:07.493
@Joel I agree, more flexibility in the architecture would have made this a more interesting challenge. I guess the drawback with allowing e.g. skip connections is that the specs would be a lot more complicated. – flawr – 2019-09-28T12:43:57.260
Another alternative is to allow outputting the indices of the sorted inputs (i.e. `argsort`) rather than the actual sorted values. That would avoid the need to reconstruct the inputs entirely. – Joel – 2019-09-28T13:04:57.160