26
8
Background
Recognizing primality seems like a poor fit for (artificial) neural networks. However, the universal approximation theorem states that neural networks can approximate any continuous function, so in particular it should be possible to represent any finitely-supported function one desires. So let's try to recognize all the primes among the first million numbers.
More precisely, because this is a programming website, let's go up to 2^20 = 1,048,576. The number of primes below this threshold is 82,025 or roughly 8%.
Challenge
How small of a neural network can you find that correctly classifies all 20-bit integers as prime or not prime?
For the purposes of this challenge, the size of a neural network is the total number of weights and biases required to represent it.
Details
The goal is to minimize the size of a single, explicit neural network.
The input to your network will be a vector of length 20 containing the individual bits of an integer, represented either with 0s and 1s or alternatively with -1s and +1s. The ordering of these can be most-significant-bit first or least-significant-bit first.
The output of your network should be a single number, such that above some cutoff the input is recognized as prime and below the same cutoff the input is recognized as not prime. For example, positive might mean prime (and negative not prime), or alternatively greater than 0.5 might mean prime (and less than 0.5 not prime).
The network must be 100% accurate on all 2^20 = 1,048,576 possible inputs. As mentioned above, note that there are 82,025 primes in this range. (It follows that always outputting "not prime" would be 92% accurate.)
In terms of standard neural network terminology, this would likely be called overfitting. In other words, your goal is to perfectly overfit the primes. Other words one might use are that the "training set" and the "test set" are the same.
This challenge does not consider the number of "trainable" or "learnable" parameters. Indeed, your network is likely to contain hard-coded weights, and the example below is entirely hard-coded. Instead, all weights and biases are considered parameters and are counted.
The length of the code necessary to train or generate your neural network is not relevant to your score, but posting the relevant code is certainly appreciated.
Baseline
As a baseline, it is possible to "memorize" all 82,025 primes with 1,804,551 total weights and biases.
Note that this code that follows includes many things: a working example, working test code, a working definition of neural network using a known neural network library, a "hard-coded" (or at least, not "trained") neural network, and a working measurement of score.
import numpy as np
bits = 20
from keras.models import Sequential
from keras.layers import Dense
from sympy import isprime
# Hardcode some weights
weights = []
biases = []
for n in xrange(1<<bits):
if not isprime(n):
continue
bit_list = [(n / (1 << i))%2 for i in xrange(bits)]
weight = [2*bit - 1 for bit in bit_list]
bias = - (sum(bit_list) - 1)
weights.append(weight)
biases .append(bias)
nprimes = len(biases)
weights1 = np.transpose(np.array(weights))
biases1 = np.array(biases )
weights2 = np.full( (nprimes,1), 1 )
biases2 = np.array( [0] )
model = Sequential()
model.add(Dense(units=nprimes, activation='relu', input_dim=bits, weights=[weights1, biases1]))
model.add(Dense(units=1, activation='relu', weights=[weights2, biases2]))
print "Total weights and biases: {}".format( np.size(weights1) + np.size(weights2) + np.size(biases1) + np.size(biases2) )
# Evaluate performance
x = []
y = []
for n in xrange(1<<bits):
row = [(n / (1 << i))%2 for i in xrange(bits)]
x.append( row )
col = 0
if isprime(n):
col = 1
y.append( col )
x = np.array(x)
y = np.array(y)
model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
loss, accuracy = model.evaluate(x, y, batch_size=256)
if accuracy == 1.0:
print "Perfect fit."
else:
print "Made at least one mistake."
What is a neural network?
For the purposes of this challenge, we can write down a narrow but precise definition of an (artificial) neural network. For some external reading, I suggest Wikipedia on artificial neural network, feedforward neural network, multilayer perceptron, and activation function.
A feedforward neural network is a collection of layers of neurons. The number of neurons per layer varies, with 20 neurons in the input layer, some number of neurons in one or more hidden layers, and 1 neuron in the output layer. (There must be at least one hidden layer because primes and not-primes are not linearly separable according to their bit patterns.) In the baseline example above, the sizes of the layers are [20, 82025, 1].
The values of the input neurons are determined by the input. As described above, this will either be 0s and 1s corresponding to the bits of a number between 0 and 2^20, or -1s and +1s similarly.
The values of the neurons of each following layer, including the output layer, are determined from the layer beforehand. First a linear function is applied, in a fully-connected or dense fashion. One method of representing such a function is using a weights matrix. For example, the transitions between the first two layers of the baseline can be represented with a 82025 x 20 matrix. The number of weights is the number of entries in this matrix, eg 1640500. Then each entry has a (separate) bias term added. This can be represented by a vector, eg a 82025 x 1 matrix in our case. The number of biases is the number of entries, eg 82025. (Note that the weights and biases together describe an affine linear function.)
A weight or bias is counted even if it is zero. For the purposes of this narrow definition, biases count as weights even if they are all zero. Note that in the baseline example, only two distinct weights (+1 and -1) are used (and only slightly more distinct biases); nonetheless, the size is more than a million, because the repetition does not help with the score in any way.
Finally, a nonlinear function called the activation function is applied entry-wise to the result of this affine linear function. For the purposes of this narrow definition, the allowed activation functions are ReLU, tanh, and sigmoid. The entire layer must use the same activation function.
In the baseline example, the number of weights is 20 * 82025 + 82025 * 1 = 1722525 and the number of biases is 82025 + 1 = 82026, for a total score of 1722525 + 82026 = 1804551. As a symbolic example, if there were one more layer and the layer sizes were instead [20, a, b, 1], then the number of weights would be 20 * a + a * b + b * 1 and the number of biases would be a + b + 1.
This definition of neural network is well-supported by many frameworks, including Keras, scikit-learn, and Tensorflow. Keras is used in the baseline example above, with code essentially as follows:
from keras.models import Sequential
model = Sequential()
from keras.layers import Dense
model.add(Dense(units=82025, activation='relu', input_dim=20, weights=[weights1, biases1]))
model.add(Dense(units=1, activation='relu', weights=[weights2, biases2]))
score = numpy.size(weights1) + numpy.size(biases1) + numpy.size(weights2) + numpy.size(biases2)
If the weights and bias matrices are numpy arrays, then numpy.size will directly tell you the number of entries.
Are there other kinds of neural networks?
If you want a single, precise definition of neural network and score for the purposes of this challenge, then please use the definition in the previous section. If you think that "any function" looked at the right way is a neural network with no parameters, then please use the definition in the previous section.
If you are a more free spirit, then I encourage you to explore further. Perhaps your answer will not count toward the narrow challenge, but maybe you'll have more fun. Some other ideas you may try include more exotic activation functions, recurrent neural networks (reading one bit at a time), convolutional neural networks, more exotic architectures, softmax, and LSTMs (!). You may use any standard activation function and any standard architecture. A liberal definition of "standard" neural network features could include anything posted on the arxiv prior to the posting of this question.
What sort of types are these weights? Usually people use floats can we use other numeric types? e.g. types of less, more or unlimited precision. – Post Rock Garf Hunter – 2019-04-12T20:59:54.667
@SriotchilismO'Zaic: For the purposes of the narrow definition, I think it makes sense to restrict to float and double (IEEE single- and double-precision floating point real numbers) for all of the weights and intermediate values. (Although note that some implementations may use other amounts of precision -- eg 80-bit -- during evaluation.) – A. Rex – 2019-04-13T15:53:47.970
I love this question but am disappointed there isn't a much smaller neural network to be find with enough training time. – Anush – 2019-05-12T15:37:14.500