I don't claim to have the ultimate solution to the problem (or that this list is exhaustive), but I want to outline some possible approaches that come to mind and why they would or wouldn't work. I also won't address tangential issues like whether using the current timestamp as a source of randomness is sufficiently "unpredictable" and how to enforce certain properties of the probability distribution — I'll just focus on avoiding solutions that use hardcoding.
Not a solution: disallow hard-coding explicitly
This is a bad idea. It's a non-observable requirement (meaning you can't determine whether it's satisfied just by running the program), which is strongly discouraged on PPCG and may be outright impossible to enforce on platforms where submissions are verified in an automated manner. The problem with a requirement like this is that you'd have to start by finding an objective definition of "hard-coding". In general, if you try this, you're only going to make things worse.
Make hard-coding infeasible
If you can't disallow it completely, but you don't want people to use it, you can try to design the challenge so that hard-coding is simply not a competitive approach. This is possible if the objects to be generated are sufficiently large and incompressible that putting one example into the code would cost far more bytes than writing an algorithm that generates valid solutions randomly. In your specific example that's of course not the case, because identity matrices are valid and easy to generate, but for other problems it might be. If the target objects are sufficiently irregular, simply require them to be large: that will probably not affect the byte count of an actual algorithm, but it will blow up the hard-coded part.
Parameterise the output
Often, these problems come with one or more natural parameters, like the size of the matrix in your example. If so, making that parameter an input can be sufficient to make hard-coding impossible, or at least impractical. In some cases it might be easy to hard-code one specific solution for a given parameter value, found manually or via extensive search, but if there is no simple closed form for these solutions in general, it won't be easy to generate a default value for arbitrary inputs. Again, this is not the case for your example, because the identity matrix works at any size, but an optimal solution for this related problem is usually highly irregular, so there is no default value to fall back on without actively searching for valid values anyway. You can combine this with a time limit to prevent a brute-force search for a default value.
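To make the identity-matrix caveat concrete, here is a hypothetical sketch (the challenge and function name are made up for illustration): if the task were "output a random n-by-n matrix equal to its own inverse", parameterising by n alone wouldn't stop a hard-coder, because the identity matrix is valid at every size.

```python
def hardcoded(n):
    # No search needed: the n-by-n identity matrix is a valid output
    # for any n, so taking the size as input doesn't make this
    # hard-coded "solution" any harder to write.
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]
```

This is exactly the kind of degenerate default value that the irregular related problem mentioned above doesn't admit.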
Put some restriction on the probability distribution
If you're willing to give up a completely unrestricted probability distribution, you can put some constraints on it that still give answerers a lot of freedom in choosing their distribution, but which make hard-coding difficult or impossible:
- The simplest constraint that comes to mind is to require that the difference between the minimum and maximum probability of any possible output stays below a certain threshold. A hard-coded approach will likely have near-zero probabilities for almost all outputs and a probability close to 1 for the default value. If you require the maximum difference to be below, say, 0.1, there would need to be 10 (randomly chosen) default values before the approach becomes an option. Similarly, you could simply require a minimum probability for each possible output, e.g. 1/(2N), where N is the number of possible outputs.
- Alternatively, you can require that there are no (likelihood) gaps in the distribution: there must be no interval of size δ (chosen by you) such that probabilities both above and below it exist. That rules out outliers in terms of likelihood, which are exactly what a hard-coding approach tends to produce.
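The minimum-probability constraint from the first bullet can even be met constructively. A minimal sketch, with hypothetical names (`candidates`, `biased_generator` and `p_min` are placeholders, not from any specific challenge): mixing any generator with a uniform draw guarantees each output a probability of at least p_min, since P(x) ≥ N·p_min · (1/N) = p_min.

```python
import random

def constrained_sample(candidates, biased_generator, p_min):
    """Sample so every candidate has probability at least p_min.

    With probability len(candidates) * p_min, draw uniformly;
    otherwise defer to the (possibly hard-coded) generator.
    """
    n = len(candidates)
    assert n * p_min <= 1, "p_min too large for this output space"
    if random.random() < n * p_min:
        return random.choice(candidates)
    return biased_generator()
```

Even a generator that always returns one hard-coded default value then satisfies the constraint, which shows the bound has to be chosen with some care.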
The main issue with these approaches is that they're much harder to reason about: proving answers correct is difficult, and experimentally verifying correctness can be impossible for large output spaces. Still, they provide an in-principle observable requirement which can make hard-coding impossible.
These approaches may also need a time limit, because one way to increase the probability of the non-default values would be to attempt finding a random foo multiple times before falling back to the default value.
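That retry-then-fallback pattern looks roughly like this (a sketch; `candidates`, `is_valid` and `default` are hypothetical stand-ins): each extra attempt shifts probability mass away from the hard-coded default, at the cost of running time, which is exactly what a time limit is meant to bound.

```python
import random

def foo_or_default(candidates, is_valid, default, attempts=10):
    # Rejection-sample for a bounded number of tries; if no valid
    # object turns up, fall back to a hard-coded default. Raising
    # `attempts` raises the probability of a non-default output.
    for _ in range(attempts):
        x = random.choice(candidates)
        if is_valid(x):
            return x
    return default
```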
We have a standard definition for random on meta that would ban hard-coding, but not restrict it so far as perfectly uniform.
– Geobits – 2016-12-15T14:22:13.557
Great question. I've found in the past that specifying randomness is hard. Especially for the scenario you describe, there's also the issue that you can just generate random candidates and redo if they're invalid. That even gives you a uniform distribution, but non-deterministic runtime. When specifying uniform distribution there's also the issue that real solutions are never perfectly uniform. This is a very subtle issue. +1
– Martin Ender – 2016-12-15T14:29:15.557
@MartinEnder Right, I forgot that approach. I can disallow it and the other slow algorithm with time bounds, but they still allow the "one hard-coded foo" solution. – Zgarb – 2016-12-15T14:37:15.347
seems like you could specify K3/K4 CPRNG, most languages will have a library https://en.wikipedia.org/wiki/Pseudorandom_number_generator
– Ewan – 2016-12-15T15:00:24.967
@Zgarb a big problem with disallowing "Generate and redo" is that the majority of languages' RNG libraries do just that. – Nathan Merrill – 2016-12-15T15:09:19.110
A possible way to disallow "generate and redo" (rejection sampling) in certain challenges is to limit the running time to something reasonable. For many problems rejection sampling is infeasibly slow because the target region is a small subset of the whole space. – Luis Mendo – 2016-12-15T15:30:53.723
@NathanMerrill I don't want to explicitly disallow any technique. My goal is to implicitly disallow "dumb" rejection sampling that takes forever to run, and other boring solutions. Rejection sampling is sometimes fast and can be used as part of interesting algorithms. – Zgarb – 2016-12-15T18:15:54.673