I'd like to propose a different kind of golfing challenge to this community:
(Artificial) Neural Networks are very popular machine learning models that can be designed and trained to approximate any given (usually unknown) function. They're often used to solve highly complex problems that we don't know how to solve algorithmically, such as speech recognition, certain kinds of image classification, and various tasks in autonomous driving systems. For a primer on neural networks, consider this excellent Wikipedia article.
As this is the first in what I hope to be a series of machine learning golf challenges, I'd like to keep things as simple as possible:
In the language and framework of your choice, design and train a neural network that, given \$(x_1, x_2)\$ calculates their product \$x_1 \cdot x_2\$ for all integers \$x_1, x_2\$ between (and including) \$-10\$ and \$10\$.
Performance Goal
To qualify, your model may not deviate by more than \$0.5\$ from the correct result on any of those entries.
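In other words, a submission qualifies iff its maximum absolute error over all \$21 \times 21 = 441\$ integer pairs is at most \$0.5\$. A minimal checker along these lines (the `predict` callable is a hypothetical stand-in for a trained model, not part of the challenge):

```python
import itertools

def max_deviation(predict):
    """Largest absolute error of `predict` over all integer pairs in [-10, 10]^2."""
    return max(abs(predict((x1, x2)) - x1 * x2)
               for x1, x2 in itertools.product(range(-10, 11), repeat=2))

def qualifies(predict):
    return max_deviation(predict) <= 0.5

# Exact multiplication standing in for a trained model:
# qualifies(lambda p: p[0] * p[1])  ->  True
```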
Rules
Your model
- must be a 'traditional' neural network (a node's value is calculated as a weighted linear combination of some of the nodes in a previous layer followed by an activation function),
- may only use the following standard activation functions:
- \$\textrm{linear}(x) = x\$,
- \$\textrm{softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_j e^{x_j}}\$,
- \$\textrm{selu}_{\alpha, \beta}(x) = \begin{cases} \beta \cdot x & \text{, if } x > 0 \\ \alpha \cdot \beta (e^x -1 ) & \text{, otherwise} \end{cases}\$,
- \$\textrm{softplus}(x) = \ln(e^x+1)\$,
- \$\textrm{leaky-relu}_\alpha(x) = \begin{cases} x & \text{, if } x > 0 \\ \alpha \cdot x & \text{, otherwise} \end{cases}\$,
- \$\tanh(x)\$,
- \$\textrm{sigmoid}(x) = \frac{e^x}{e^x+1}\$,
- \$\textrm{hard-sigmoid}(x) = \begin{cases} 0 & \text{, if } x < -2.5 \\ 1 & \text{, if } x > 2.5 \\ 0.2 \cdot x + 0.5 & \text{, otherwise} \end{cases}\$,
- \$e^x\$
- must take \$(x_1, x_2)\$ either as a tuple/vector/list/... of integers or floats as its only input,
- must return the answer as an integer or float (or a suitable container, e.g. a vector or list, that contains this answer).
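For reference, the listed activations can be sketched in NumPy as below. Note that the \$\alpha, \beta\$ defaults for selu and leaky-relu are the usual Keras choices and are my assumption, since the challenge text leaves them as parameters:

```python
import numpy as np

linear       = lambda x: x
softmax      = lambda x: np.exp(x) / np.exp(x).sum()
selu         = lambda x, a=1.67326, b=1.05070: np.where(x > 0, b * x, a * b * (np.exp(x) - 1))
softplus     = lambda x: np.log(np.exp(x) + 1)
leaky_relu   = lambda x, a=0.01: np.where(x > 0, x, a * x)
sigmoid      = lambda x: np.exp(x) / (np.exp(x) + 1)
hard_sigmoid = lambda x: np.clip(0.2 * x + 0.5, 0.0, 1.0)  # equivalent to the piecewise definition
# tanh and exp are available directly as np.tanh and np.exp
```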
Your answer must include (or link to) all code necessary to check your results -- including the trained weights of your model.
Scoring
The neural network with the smallest number of weights (including bias weights) wins.
Enjoy!
My personal best, which I'll submit later, uses 43 weights and I'm sure there's plenty of room for improvements. – Stefan Mesken – 2019-07-02T11:51:59.410
Welcome to the site! I think this challenge could benefit a good deal from a more robust definition of a neural network. There are a couple of things here: 1) It would be very nice for you to state it in language that does not already imply knowledge of NNs. 2) You really should list the activation functions in your post rather than link out to an external source (outside links can change or disappear). – Post Rock Garf Hunter – 2019-07-02T12:29:46.570
@SriotchilismO'Zaic I will add the activation functions but I don't think I want to include a definition of a neural network. I'm confident that my post contains enough information for people who are familiar with neural networks, and for people who aren't, this isn't the right place to learn about them. – Stefan Mesken – 2019-07-02T12:33:14.793
Can we have constant offsets? E.g., for a node with a single input `x`, compute its value as `ax+b`? And if yes, does that count as 1 or 2 weights? – Grimmy – 2019-07-02T14:30:46.270

@Grimy I'd be fine with that. However, they must be counted as two weights as otherwise this would allow for some shenanigans (e.g. in my code, it would reduce the weights to 33). – Stefan Mesken – 2019-07-02T14:40:03.957
Can we reuse weights/use convolutional layers? (I recommend removing the bonus, as it doesn't add anything to the challenge and just distracts from the main goal.) Are the weights supposed to be real or can they be complex? – flawr – 2019-07-02T14:41:42.513
Your wording implies nodes from layer 3 can't use inputs from layer 1. Does it cost a weight to have a layer 2 node simply doing `f(x) = x` to forward its input? – Grimmy – 2019-07-02T14:46:26.923

@flawr CNNs are allowed. Basically any forward propagation that is a chain of 'weighted sum of previous nodes followed by activation function' is permissible. In particular, you may also include RNN layers. The rules are meant to encourage experimentation while protecting this challenge from becoming trivialized (e.g. using a neural network that solves this problem as activation function or data preprocessing that simplifies the task). – Stefan Mesken – 2019-07-02T14:47:16.887
@Grimy You're correct. My wording doesn't correctly reflect my intentions. I'll edit that. Skipping layers is allowed and if you do so with one of the listed activation functions, it doesn't cost you any weights. – Stefan Mesken – 2019-07-02T14:48:40.040
How many weights does a mapping like $x \mapsto (x,x,x)$ count as? – flawr – 2019-07-02T15:02:46.410
@flawr That's zero weights. You may reroute/duplicate your states as you wish. So, you may also have a map like $(x,y) \mapsto (x,y,y,x,y)$ for free. (I sure hope this doesn't break the game in some way I don't immediately foresee ;-)) – Stefan Mesken – 2019-07-02T15:08:31.293
Should we count weights that are equal to 1? For example, would a node doing `output = tanh(input)` (or some other standard activation function) count as 0 or 1 weight? – Grimmy – 2019-07-02T16:10:24.920

@mbomb007 Sure, I'll add that. – Stefan Mesken – 2019-07-03T02:37:44.990
Like Grimy, I'm also unclear whether a weight that's equal to 1 counts as a weight for scoring. How much does it cost to do `z = x + y` to get the sum of the two inputs? – xnor – 2019-07-03T03:14:35.430

@xnor That should count as 2 weights as this is equivalent to a dense layer with 2 inputs, one output without bias. Admittedly, this is awkward and certainly shows there are problems with the current set of rules. However, two points: 1. This is my first attempt at coming up with a machine learning golf challenge, so these kinds of problems are to be expected. 2. Because I've deliberately chosen a problem that has a trivial algorithmic solution, these quirks of the rules, unfortunately, are more problematic than they would be in an actual machine learning problem. Sorry about that! – Stefan Mesken – 2019-07-03T03:33:15.020
If people are interested in machine learning golf (which I really hope they are), I'm confident we'll flesh out these issues in no time and end up with a semi-standard set of rules similar to those for more traditional golfing challenges. – Stefan Mesken – 2019-07-03T03:37:40.550
If we are reusing the same weights, do we have to count them each time or can we just count them once? – flawr – 2019-07-03T06:24:03.753
@flawr I've thought about that and I think reusing them is fine. It will have a significant impact on this particular challenge but it overall better captures what I intend to measure (model complexity) and it will allow for more creative solutions in this and future challenges. So, yes: Feel free to reuse your weights to improve your score! – Stefan Mesken – 2019-07-03T06:33:04.513
Is there any particular reason you don't have `y = ln(x)` in your list of activation functions? – FirefoxMetzger – 2019-07-03T13:13:57.383

@FirefoxMetzger Yes, there is. And I'm glad you asked: $a \cdot b = e^{\ln(a \cdot b)} = e^{\ln(a) + \ln(b)}$ – Stefan Mesken – 2019-07-03T14:15:00.067
@StefanMesken I am aware that this would become a solution (in this case 1 neuron would be the answer). I'm curious about the logic behind making `exp(.)` part of the 'standard activation functions', but not its inverse. – FirefoxMetzger – 2019-07-03T14:18:52.787

@FirefoxMetzger $\exp$ is a total function, which makes it more suitable as an activation function. Since I wanted to pick one and only one, $\exp$ seemed like the better choice. – Stefan Mesken – 2019-07-03T14:19:56.367
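The \$e^{\ln(a) + \ln(b)}\$ identity from this exchange is easy to sanity-check numerically; note that it only covers strictly positive inputs, so a network built on \$\exp\$ alone still has work to do for signs and zeros. A quick illustration (not a submission, just the arithmetic):

```python
import math

# a * b == exp(ln(a) + ln(b)) holds for strictly positive a, b;
# it says nothing about zero or negative factors.
def product_via_exp(a, b):
    return math.exp(math.log(a) + math.log(b))
```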
IMO this fails to reach the required level of clarity on at least two counts: firstly, it's very vague what the criteria are for an answer to be valid (contrast https://codegolf.stackexchange.com/q/183036/194, which took a lot of work to polish up to the required standard); and secondly, it's too vague about the scoring (see the same earlier question as an example of how to be clear enough). – Peter Taylor – 2019-07-03T16:08:08.890

@PeterTaylor I'm not sure that I fully agree with that criticism but I'm certainly willing to improve this post. In fact, I think these first attempts should be viewed as a trial run that guides the development of a better setup. (I don't imagine that the first challenges posted on Code Golf SE were up to the current standards either, and I'm making them up for ML golf as we speak. However, doing so without participants that bring up issues seems doomed, which is why I plead for keeping this post open.) – Stefan Mesken – 2019-07-03T16:11:25.040
There should be a link in the right column to the Sandbox, which was created expressly to fix these kinds of issues before the question is even posted to the main site. And the network philosophy is that it's better to close a question, fix it, and reopen it than to get a bunch of answers which either will make no sense after the question is fixed or will tightly constrain the changes which can be made to the question. – Peter Taylor – 2019-07-03T16:22:30.780
@PeterTaylor 'These kinds of issues' are detected by participation -- not by looking at some preliminary version in a vacuum. Before I ever considered posting this challenge, I spent a week's worth of afternoons doing trial runs and coming up with ways to cheat the system and how to defend against that. We are walking a thin line here: creating rules restrictive enough to make the challenge meaningful while allowing as much freedom as possible to make it accessible to a wide audience. Strongly disagree with that last comment. – Stefan Mesken – 2019-07-03T16:25:34.190
Not at all. These kinds of issues are detected by many years' experience of seeing other people make the same kind of mistakes. Some ambiguities slip past the sandbox, but many more are caught there. And this would definitely have been caught, because as indicated in my first comment we had exactly the same problems with a neural net question two months ago. – Peter Taylor – 2019-07-03T16:29:35.080
Typo in the hard sigmoid function: should be 1 if x > 2.5, not -2.5. – Ray – 2019-07-04T02:36:56.687
Am I allowed to use gating (like in LSTM) and self-gating? – CrabMan – 2019-07-04T21:16:12.973
@CrabMan Only if you can realize them as a combination of weighted sums followed by one of the allowed activation functions. (Doing it naively would require a lot of weights... But maybe you can do it efficiently.) – Stefan Mesken – 2019-07-05T04:33:40.437
From review queue, this question is not unclear. Just because the math and concepts are complicated and you don't understand them (I don't either) doesn't mean the question is not specific or clear. – mbomb007 – 2019-07-09T13:38:20.720
@StefanMesken: Thank you for asking this question and resisting defining "neural network", as well as disagreeing with other parts of the "network philosophy". I asked the first neural network question a couple months before you: https://codegolf.stackexchange.com/q/183036/ . It received a lot of pushback from the community and was in fact closed. It was mentioned above that it took "a lot of work to polish up to the required standard", but in fact I believe that work I did was worthless. Good luck with further questions. – A. Rex – 2019-09-24T19:17:07.060

If linear activations are legal, then you can reduce every solution (up to machine precision) to another solution that has only two distinct weights. Indeed, every neural net whose weights are integer multiples of 10^(-2k) can be simulated with a much larger neural net whose weights are plus or minus 10^(-k). To see this, observe that multiplication by any such weight w can be implemented as w.x = sgn(w).v^T.v.x, where v is the column vector consisting of 10^(2k).abs(w) entries, each entry equaling 10^(-k). – Dustin G. Mixon – 2019-09-24T23:35:13.443
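The construction in the last comment checks out numerically: for a weight \$w\$ that is an integer multiple of \$10^{-2k}\$, a vector \$v\$ of \$10^{2k} |w|\$ entries, each equal to \$10^{-k}\$, satisfies \$v^T v = |w|\$, so \$\textrm{sgn}(w) \cdot v^T v \cdot x = w x\$ up to floating-point error. The helper below is hypothetical (name and signature are mine, not from the comment):

```python
import math

def simulate_weight(w, x, k):
    """Reproduce w*x using only the magnitude 10**(-k), as in the comment."""
    n = round(10 ** (2 * k) * abs(w))  # entries of v; an integer by the assumption on w
    vTv = n * 10 ** (-k) * 10 ** (-k)  # v^T v = n * 10**(-2k) = abs(w)
    return math.copysign(1.0, w) * vTv * x

# e.g. w = -0.03, k = 1: v has 3 entries equal to 0.1, and v^T v = 0.03
```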