Java (5 or later), 15 characters
02367?\abcdeitu
We've had a few previous answers in Java. The basic approach in all of them is to a) identify a minimal Turing-complete subset of the language, and b) find a minimal way to express the constructs of that language in Java's syntax.
Hexagraph notation
Let's look at b) first. As explained in @Poke's answer above, Java has a hexagraph syntax (analogous to C's trigraph syntax) to allow arbitrary characters that might not exist in your character set to be included in your program. For example, a newline could be written as a literal newline, but it could also be written as the hexagraph \u000a
; the hexagraph consists of \u
followed by four hexadecimal digits, specifying the character's Unicode codepoint. Unlike C's trigraphs, which can only be used for a few awkward characters, a Java hexagraph can be used for absolutely any Basic Multilingual Plane character we might happen to need (including printable ASCII characters).
The previous records, 17 by @Poke, and 16 by @Poke and me, were based on taking relatively normal-looking Java programs and simply trying to hexagraph every character in them: your character set is then based on which nybbles occur in the codepoints you're using. If a nybble occurs in two different codepoints, it typically saves character set entries to include that nybble in the character set, so you can construct the codepoint with it. One minor improvement for this entry is that if a nybble only occurs in a single codepoint, we may as well include that codepoint in our character set directly: the resulting programs will end up slightly shorter.
Of the 16 nybbles, this entry manages to omit 2 entirely: 5
, and 8
. 4
, 9
, and f
are also omitted; each is only needed to write a single character (t
= U+0074, i
= U+0069, and ?
= U+003F respectively), and including that character directly leads to shorter and "more readable" programs. One final saving is available from a
/1
: we don't need a
as a nybble to write any character, we do need to be able to produce U+0061, a
's codepoint, but we don't need 1
for any other codepoint. So a
and 1
are redundant with each other: we need at least one of them, but don't need both; and omitting a
/1
, 5
, and 8
from our base set of 18 characters \u0123456789abcdef
gives us our final character set.
Of course, this means that we have to avoid many more missing characters than in the other entries. In particular, we can no longer create the boilerplate for a main
method (which must have a parameter of type String
; S
= U+0053 contains the forbidden nybble 5
). So we're going to need a radically different way of running a Java program.
Java as an interpreter
Java is normally a compiled language; a typical workflow is to use the compiler javac
to compile your Java source files into one or more .class
files, and then the JVM java
to run those source files, and take the output of java
as the output of the program. None of your code actually runs at compile time, so the output of javac
is typically regarded as uninteresting.
Nonetheless, javac
does contain some nontrivial functionality; Java is, after all, a fairly complex language. We can take a single Boolean of output from the compiler javac
by checking to see if there are compile errors, looking at its exit code: if the program has errors, that will produce a different output from if the program doesn't have errors. Of course, Java being a compiled language, an erroring program might not seem particularly useful from a Turing-completeness point of view: if it has errors, it won't actually run, so how could it be Turing-complete? However, it turns out that type-checking Java programs is in of itself a Turing-complete operation; all we need to be able to do is to be able to compile programs from some Turing-complete language into Java's type system, in such a way that, in order to determine whether the resulting program is valid Java or not, the type-checker will have no choice but to run the program we compiled.
The Subtyping Machine
The Subtyping Machine is an esoteric programming language that was "back-derived" (by Radu Grigore in 2017) from Java's type system, looking at the algorithm that Java actually uses to determine whether an expression has the correct type or not. Here's an example I wrote of what this sort of program looks like:
interface xx {}
interface A<x> {}
interface B<x> {}
interface d<x> extends
A<s<? super X<? super B<? super s<? super X<? super d<x>>>>>>>,
B<x>, xx {}
interface X<x> extends xx {}
interface s<x> extends xx {}
class x {
d<? super X<? super d<? super X<? super d<? super X<? super s<xx>>>>>>> xc;
A<? super s<? super X<? super d<xx>>>> xd = xc;
}
The bulk of the program is basically just a lot of interfaces extending each other with contravariant generic type parameters. If you have A<x> extends B<…<? super x>>
, and you're trying to see whether an expression starting A<…>
can be stored in a variable of type B<…>
, what ends up happening is that the first type ends up getting potentially much more complicated as the generic parameter is expanded, and then the resulted B<…>
wrappers cancel, but (because the types are contravariant) the two parameters basically swap roles within the type-checking algorithm. The result is a type-checking problem that could potentially be more complex than the problem we started with; the operation effectively boils down to popping two stacks and then pushing onto one of them based on the value popped. You have to push onto the two stacks alternately, but that isn't a major issue, so we effectively end up with a two-stack machine, and two stacks are enough for Turing-completeness. Full details of how the language operates are in the link at the start of this section.
Minimizing the character set of The Subtyping Machine
Now that we have a Turing-complete operation that can be run by java
, thus avoiding the need for the public static void main(String[] a)
boilerplate that's required for the benefit of javac
(but not java
), the final step is to reduce its character set as far as possible.
There are some characters that are absolutely necessary. To use this technique, we need to be able to declare interfaces (interface … {}
) and contravariant type parameters (<? super …>
), which already ties up many of our nybbles. The main problem I encountered in this solution was in trying to avoid the nybble 8
, most notably used by (
= U+0028 and x
= U+0078. (1
/a
end up not being used for anything important, e.g. they're merged in @Poke's answer just as they are here; and 5
turns out to be used only for e
=U+0065 and u
=U+0075, but fortunately both those characters are needed for other reasons, e
as nybble and u
because it's part of the \u
hexagraph introducer, so we never need to write them as hexagraphs. Unlike in the previous record-holder, c
is unavoidable because we need it for <
=U+003c, pretty much unavoidable for any type-system-based approach.)
Avoiding parentheses is a little annoying, but not that hard; in fact, I did it in the example program above. The reason they'd be helpful is that once we declare a bunch of interfaces extending each other, we actually need to cause the type checker to type-check something; Radu Grigore's original program did this by defining a function. The approach I used above works by defining two variables and assigning one to the other, which will also force the type-checker to be involved; fortunately, neither =
=U+003d nor ;
=U+003b uses a forbidden nybble.
Avoiding x
is harder; despite being pretty rare as letters go, it's needed to spell extends
, the keyword Java normally uses to introduce a subtyping relationship. That might at first seem impossible to avoid, but we do have an alternative; when a class extends an interface, Java uses the keyword implements
instead, which despite being longer doesn't contain any problematic characters. So as long as we can divide our program into classes and interfaces, with the only hardcoded subtyping relationships being between a class and an interface, we can make the program work. (We also have to use the class
keyword, but that contains only letters we already have: interface implements
.) There are several possible approaches that work, but one simple approach is to ensure that classes and interfaces always alternate within the two types being compared; that means that at any point in time, we're always comparing a class with an interface (and because we unwrap both stacks one level at a time and the direction of a comparison is reversed with every step, we're always checking to see whether a class implements an interface rather than the other way round, which is impossible in Java).
My compiler from The Subtyping Machine to compile-time Java proves the Turing-completeness of this 15-character subset of Java by being capable of compiling to it; use the -jj
option and it'll output in this subset, rather than in more readable Java (by doing things like choosing a class
/interface
split that avoids the use of the extends
keyword – in a marginally more sophisticated way than is described above – and changing variable names to only use the letters that exist in the character set, and of course hexagraphing any character that needs to be).
I was hoping to produce a more complex example, but I've spent enough time on this as it is, so I thought I may as well post it. After all, it shaves one character off the best known character set via ridiculous means, and isn't that what this question is all about?
92Unary, 1 character. sighs – Dennis – 2017-02-20T15:24:06.187
4@Dennis It's not that different from Jelly or 05AB1E having a built-in for an interesting number theory problem. This challenge still seems like an interesting and non-trivial optimisation problem in any language that wasn't designed to be a tarpit. – Martin Ender – 2017-02-20T15:35:32.950
7@MartinEnder I'd be especially interested to see answers in languages like Java or C. – Julian Lachniet – 2017-02-20T15:41:59.193
@MartinEnder Point taken. Might as well post the answer before someone else does then. – Dennis – 2017-02-20T15:42:21.073
I am pretty sure checking that an answer is valid in most languages will be difficult (excluding turing tarpits) – Fatalize – 2017-02-20T15:44:28.303
@Fatalize One way to check if it's Turing complete is to emulate Brainfuck, which is very simple. – Julian Lachniet – 2017-02-20T15:50:38.760
3@Fatalize I would expect answers to include at least a sketched proof which could be either a reduction from a known-to-be-TC language or implementation of a known-to-be-TC language with the reduced character set. – Martin Ender – 2017-02-20T15:58:03.857
1Related. (The challenges are very similar, but that one requires write an actual translator from an arbitrary program of the given language into a program using only the reduced character set, which can be a lot more difficult than simply proving that the reduced character set is Turing-complete. In fact it's possible to find Turing-complete subsets that cannot represent every program of the host language.) – Martin Ender – 2017-02-20T16:16:00.077
9Please don't post solutions in esolangs where the solution is every valid character in the language. It's not intresting or clever. – Pavel – 2017-02-20T17:20:58.863
I'm confused as to what turing complete means, read a couple posts but it's very vague and the examples are quite vague as well. Anyone care on giving a very simple explanation? thanks (: – kemicofa ghost – 2017-02-20T21:15:18.347
22@Pavel Not interesting or clever may mean that it shouldn't get upvoted, but certainly not that it shouldn't get posted. – Dennis – 2017-02-20T21:36:28.167
@rugdealer the idea is that you can implement any computable function from integers to integers (or strings to strings) in this subset of characters. – Paŭlo Ebermann – 2017-02-20T21:48:54.500
@JulianLachniet: C probably isn't Turing-complete (although it depends on the implementation; it's possible to implement the file API in such a way that C becomes Turing-complete, but most implementations don't actually implement it like that). Thus, it isn't qualified for this challenge. – None – 2017-02-20T23:19:45.263
To me, the top comment of "Unary" was actually by far the most interesting thing to come of this question. There's a discussion somewhere about implementing a quine in brainfuck, converting the symbols to binary, then encoding the resulting number in unary. The number was something like 5*10^10000. – Darren Ringer – 2017-02-21T14:17:39.193
@ais523 C can be Turing-complete without file systems, but you have to use some pathological definitions of a "byte" to do so. You can play the power-set game to declare the byte to have a finite length, but the number of possible values of that byte is countably infinite. It does get a bit hard to define
std::numeric_limits<int>::max()
that way however. You have to play some games there too (like having max() never complete). – Cort Ammon – 2017-02-23T21:38:32.820@CortAmmon: I think you're confusing C with C++. In C, there's a potential issue with
CHAR_BIT
, which is a compile-time constant, but I guess it could be an infinitely large integer (which fits into one infinitely large byte)? – None – 2017-02-24T01:18:49.537@ais523 Yeah, it's a bit awkward, but I think the only real question would be whether you need a constructive proof that the language works or not. I'm not sure if you could find a way to construct
CHAR_BIT
, though you could prove that it was a natural number and thus fits into a single char compile-time constant. – Cort Ammon – 2017-02-24T02:30:00.393Is it enough to specify a Turing-complete function, or does it have to be a complete program? – Nathaniel – 2017-02-25T01:03:38.173
1@Nathaniel Whichever makes most sense in context. – Julian Lachniet – 2017-02-25T03:52:33.750
@JulianLachniet The accepted answer is the answer that wins the challenge. If you must, you can simply not accept any answer at all, but you shouldn't accept an arbitrary one.
– Dennis – 2017-06-06T03:52:41.320@Dennis Since their is no "winner," I accepted the one with the most votes. – Julian Lachniet – 2017-06-06T10:42:19.700
Unless I misunderstood your challenge, the score of a submission is the amount of unique characters, lower being better. That makes Unary the formal winner, as boring as the answer is. You can either accept the answer or accept none of them. What you cannot do is accept an arbitrary answer, as it violates the meta consensus and basically means your challenge no longer has an objective winning criterion (thus making it off topic). – Dennis – 2017-06-06T13:19:14.547
1Note: The Brainfuck answer does not actually require wrapping cells (look up Wang B machines) – CalculatorFeline – 2017-06-18T17:48:29.707
@CalculatorFeline See also Brainfuck minus - on the esolang wiki.
– Ørjan Johansen – 2019-02-08T01:20:38.207the day of combinatory logic – Mega Man – 2019-03-25T15:49:22.143