The challenge

Simply write a program that outputs its own source code.

Nothing more than a regular quine.

The problem

We don't have a computer so we have to run the program on a programmable logic device (such as an FPGA, a CPLD, a gate array ...).

The rules

Any device available on the market (such as a printer connected via Centronics port, an LED display, an RS232 terminal ...) connected to the logic device can be used to output the program.
If you use any kind of programmable device as output device you are not allowed to put any program logic there!

Example: If you use RS232 to send data to a computer, the computer must do nothing but display the data received over RS232. However, you are allowed to switch on RS232 options such as echoing data back to the logic device, provided an existing terminal program has this feature.
Any (recent or historic) "standard" coding (ASCII, UNICODE, EBCDIC, Morse code ...) can be used.
The program only needs to output its own source code. The file containing only the mapping between VHDL/Verilog/... "wires" and the actual I/O pins, the file containing the compiler settings, and similar files are not considered "source code" and need not be output.
You have one clock input pin with the frequency of your choice (if you need that).
On-chip units (like on-chip SRAM or on-chip multipliers) must not be used.
For testing the code you can simulate the output device using additional code; of course you may simulate the logic device, too (if you don't have a real one).
Standard loopholes apply.

The winner

For calculating the size of the program it is assumed that the real output device (e.g. the printer) is connected to some I/O pins of the logic device.
The code that requires the fewest "LE" cells on my FPGA (Altera EP2C20F484C7) wins.
If my FPGA is too small (= not large enough for the smallest solution) we compile for the largest one having "LE"-type cells (EP4CGX150DF31I7).
If that one is still not enough we try the largest one supported by the free-of-charge compiler (EP2AGX260FF35I5).
If that device is still too small the size of the source code counts.

Note

Searching for "quine VHDL" in Google I found at least three quines written in VHDL on the first page!

Unfortunately none of them will work on a real logic device but only in the emulator because the standard output (of the emulator) is used.

Good luck!

Martin Rosenau

Posted 2016-11-13T20:30:50.743

Reputation: 1 921

Answers

Verilog, optimized for area, 130 LE gates

The quine itself (the actual file is encoded in DEC SIXBIT):

module q(s,d);inout s,d;wire[0:1023]v=1024'b0110111100101001011001111110110110010110100100001111011010010111010101110101010110010110111100101001011001111110110100100110100100001111011010100111010101110110111000011100111100111010011001111011100000001001000111011010100111000101100111111010100111010111010100100110101101101110111010011111010110111000011011001101111000011110011100111000000010001100001011111100111001011001001001111001010000001100110010011010011001100010001010100111100101010010011000101001011001111010011011100000001010010111011010010010110100010110111010100111011010100001100100011111100000011010010111110101110110100100010110111001011011101001000000001001011011001100111001010000001010100111011010100010110100010110111001011011101001001011011011111001001101011011001001010000000000000000101101101111100100110101101100100101000000110001001000110011001100100100001001011011101001101110101111110101110100000000110011001100100100011011110111101001110010100101111011010000011010010001010000010010010011111101110110011101010001010000010010010100000111100010;reg[9:0]i=759;reg[2:0]j=7;assign d=j<6?j==2:v[i];always@(posedge s)if(j>5)begin i=i+1;j=j&1^!i?7:1;end else j=j+1;endmodule

Readable version with comments and a testbench:

module q(s,d);
   // Declare the ports. Making them both "inout" is shortest.
   inout s,d;
   // Data storage for the program.
   wire[0:1023]v=1024'b{DATA GOES HERE};
   // i is the current bit number within the program.
   // This is relative to the /end of the data storage/ (not to the start
   // of the program), so it starts at a nonzero value so that the output
   // starts at the start of the program.
   reg[9:0]i=759;
   // When expanding bits to (6-bit) bytes, j is the bit number within
   // the expansion, from 1 for the first bit up to 6 for the last.
   // When not expanding, j is always 7.
   // DEC SIXBIT encoding for 0 is (from MSB to LSB) 010 000.
   // DEC SIXBIT encoding for 1 is (from MSB to LSB) 010 001.
   // We use SSI encoding for the output, so the MSB is sent first.
   reg[2:0]j=7;
   assign d=j<6?j==2:v[i];
   // When we get a strobe:
   always@(posedge s)
     // If we just output a bit, move onto the next bit.
     // We may also need to reset j.
     if(j>5)
       begin 
          i=i+1;
          j=j&1^!i?7:1;
       end 
     else 
       // If we're inside a bit, continue to output that bit.
       j=j+1;
endmodule
// {TESTBENCH BELOW HERE}

`timescale 10ns / 1ns
module testbench();
   reg clock = 0;
   wire data, strobe;

   always
     #1 clock <= !clock;
   initial
     #14304 $finish;

   assign strobe = clock;
   q testquine(.s(strobe),.d(data));

   always @(negedge strobe)
      $display("%d", data);

endmodule // testbench

The use of Verilog gives me considerably more control over the low-level details than I'd have with Verity. In particular, it lets me control the clock and reset rules myself. This program's intended for a synchronous serial connection with strobe input s and data output d. Although each is only used in one direction, I declared them both as bidirectional to save a few bytes; I had to golf the non-data parts of the program down to 1024 bits to be able to use 10-bit logic gates internally (extra bits would be more expensive in area), and it only just scrapes under at 1008, so savings like this are important. In order to save a substantial amount of code, I rely on the FPGA's hardware reset circuitry rather than adding my own, and I merge the strobe and clock inputs (which is an old trick that's kind-of frowned upon nowadays because it makes it hard to keep the clock tree balanced at high clock speeds, but it's useful for golfing.) I hope that's synthesizable; I don't know how well Verilog synthesizers cope with using a bidirectional port as a clock.

The source is encoded in DEC SIXBIT (I'm assuming here that we interpret its single alphabet of letters as lowercase; a Verilog synthesizer would have no reason to use an uppercase interpretation). I used a six-bit character set internally in my other solution, then wasted bytes converting it; it's better to use a character set that's "naturally" six bits wide so that no conversion is necessary. I picked this particular six-bit character set because 0 and 1 differ only in their least significant bit, and only have one other bit set, meaning that the circuitry for converting a binary digit to DEC SIXBIT (i.e. "escaping" a string) can be very simple. Interestingly, the character set in question is missing a newline character; the original program's all on one line not just to make it easier to generate, but to make it possible to encode! It's a good thing that Verilog mostly doesn't care about whitespace.
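To make the "escaping" argument concrete, here is a minimal Python sketch (not part of the original answer). It assumes the usual DEC SIXBIT definition, ASCII code minus 32 for the 64 characters from space to underscore, with letters folded to one case; the helper names `sixbit` and `escape_bit` are made up for illustration.

```python
def sixbit(ch: str) -> int:
    """DEC SIXBIT code of one character: ASCII code minus 32 (letters folded to uppercase)."""
    code = ord(ch.upper()) - 32
    if not 0 <= code < 64:
        raise ValueError(f"{ch!r} has no DEC SIXBIT encoding")
    return code


def escape_bit(bit: int) -> str:
    """Six output bits (MSB first) for one stored bit: the fixed prefix 01000, then the bit."""
    return "01000" + str(bit)


# '0' and '1' differ only in the least significant bit, and share one other set bit.
print(format(sixbit("0"), "06b"), format(sixbit("1"), "06b"))
print(escape_bit(0), escape_bit(1))
```

The fixed prefix is what the expression `j<6?j==2:v[i]` produces: only the second of the six bits is set, and the sixth comes from the data storage.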

The protocol for sending data to the host is based on the Synchronous Serial Interface (SSI). I picked it because it's clocked (allowing me to use the clock/strobe trick, and also allowing me to write a portable program that doesn't rely on on-chip timing devices), and because it's very simple (thus I don't have to waste much code implementing it). This protocol doesn't define a way to mark where the message ends (the host is supposed to know); in this particular case, I padded the output up to a multiple of 1024 bits with zero bits (a total of 16 padding bits), after which (as required by SSI) the message restarts. (I don't implement an idle mode timer; its purpose is to determine whether to send a new message or whether to repeat the previous message, and as this program always sends its own source code as the message, the distinction isn't visible. You can consider it to be length 0, or infinitely long, depending on your point of view.)
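The bit counts behind the padding can be checked with a few lines of Python. This is only a sketch of the arithmetic; the constant names are invented here, and the figures (1024-bit storage, 1008 bits of non-data source, six output bits per escaped data bit) are taken from the answer and its code.

```python
STORAGE_BITS = 1024   # width of the data storage v
CODE_BITS = 1008      # non-data part of the source, in bits
CHAR_BITS = 6         # one DEC SIXBIT character

padding_bits = STORAGE_BITS - CODE_BITS          # zero bits appended to the message
escaped_bits = STORAGE_BITS * CHAR_BITS          # each stored bit is sent as a '0' or '1' character
source_bits = CODE_BITS + escaped_bits           # the quine's source text as sent on the wire
message_bits = source_bits + padding_bits        # one full SSI message before it restarts

print(padding_bits, source_bits, message_bits, message_bits // STORAGE_BITS)
```

The 7152 source bits also appear to match the testbench, which stops after 14304 half-periods of the clock, i.e. 7152 strobes.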

In terms of the actual logic, the most interesting thing is the way that I split up the variables to reduce the amount of area needed on the chip. i, the larger register, holds the current "address" within the program's data, and is only ever changed via incrementing it; this means that its logic can be synthesized using the half-adder construction (which, as the name suggests, uses only half the resources that an adder does; this mostly only matters on the smallest FPGAs, larger ones will use 3-input or even 4-input LUTs which are powerful enough that they'll have lots of wasted capacity synthesizing a half-adder). The smaller register, j, is basically a state machine state and thus handles most of the program's complex logic. It's small enough that it can be handled entirely via lookup table on a larger FPGA (making the logic basically disappear); in case the program is synthesized for a small FPGA, I chose its encoding in such a way that few parts of the code care about all three of its bits at once.
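As an illustration of the half-adder construction mentioned above, the following Python sketch models a 10-bit incrementer as a chain of half-adders. It is a software model under stated assumptions, not what any particular synthesizer emits; `half_adder` and `increment` are hypothetical helper names.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two one-bit inputs: one XOR and one AND."""
    return a ^ b, a & b


def increment(bits: list[int]) -> list[int]:
    """Add 1 to a little-endian bit list using one half-adder per bit."""
    carry = 1                      # the constant 1 being added enters as the first carry
    result = []
    for bit in bits:
        total, carry = half_adder(bit, carry)
        result.append(total)
    return result                  # the final carry is dropped, so the counter wraps around


def to_bits(value: int, width: int = 10) -> list[int]:
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits: list[int]) -> int:
    return sum(bit << k for k, bit in enumerate(bits))


print(from_bits(increment(to_bits(759))))    # the register i after its first strobe
print(from_bits(increment(to_bits(1023))))   # wrap-around at the end of the storage
```

Because one operand is the constant 1, each bit position needs only the two-input half-adder rather than a three-input full adder.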

It's also worth noting that I cyclically permuted the data storage. We can start i pointing anywhere inside it, not necessarily at the start. With the arrangement seen here, we can print from the initial value of i to the end directly, then print the entire array escaped, then print from the start to the initial value of i, in order to print all the parts of the data in the right places without needing to save and restore the value of i. (This trick might be useful for quines in other languages too.)
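The output order can be reproduced with a small Python model of the two registers. This is a sketch that assumes the data line is sampled after each rising strobe edge, as the testbench does on the falling edge; the function names and the 16-bit demo storage are illustrative and not taken from the answer.

```python
def run(v: list[int], start: int, cycles: int) -> list[int]:
    """Model registers i and j of the quine for a storage v of any length."""
    n = len(v)
    i, j = start, 7
    out = []
    for _ in range(cycles):
        # Rising strobe edge: same update rule as the always block.
        if j > 5:
            i = (i + 1) % n
            j = 7 if (j & 1) ^ int(i == 0) else 1
        else:
            j += 1
        # Data line: the fixed pattern 01000 while expanding, else the stored bit.
        out.append(int(j == 2) if j < 6 else v[i])
    return out


def escaped(v: list[int]) -> list[int]:
    """Every stored bit expanded to the six bits 0,1,0,0,0,bit."""
    return [b for bit in v for b in (0, 1, 0, 0, 0, bit)]


demo = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]   # stand-in for the 1024-bit storage
stream = run(demo, start=5, cycles=7 * len(demo))
print(stream == demo[6:] + escaped(demo) + demo[:6])
```

In this model one message is the tail of the storage sent directly, the whole storage sent escaped, and then the head sent directly, without i ever being saved or reloaded. With the real parameters (1024 bits, initial i of 759) the tail starts at bit 760, because i is incremented before the first bit is sampled.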

The source is 1192 6-bit bytes long, the equivalent of 894 8-bit bytes. It's kind-of embarrassing that this contains fewer source bytes than my Verity submission, despite being optimized for something entirely different; this is mostly because Verilog has strings and Verity doesn't, meaning that even though I've encoded the program in binary rather than octal (which is substantially less efficient in terms of source code size), I can encode each byte of the program using six six-bit characters (one for each bit) rather than eight eight-bit characters (four for each octal digit). A Verilog submission that encoded the program in octal would probably be smaller in terms of source code size, but would almost certainly be larger in area.
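The size comparison can be spelled out as arithmetic. The Python sketch below only restates figures from the two answers (1192 six-bit characters, one character per stored bit in Verilog, a four-character `a d-` call per octal digit in Verity); the variable names are invented for illustration.

```python
verilog_chars = 1192                       # source length in six-bit characters
verilog_bits = verilog_chars * 6
verilog_bytes = verilog_bits / 8           # equivalent size in eight-bit bytes

# Source cost of storing one six-bit character of program text:
verilog_cost_bits = 6 * 6                  # six binary digits, each a six-bit character
verity_cost_bits = 2 * 4 * 8               # two octal digits, each a four-character "a d-" call in 8-bit text

print(verilog_bits, verilog_bytes, verilog_cost_bits, verity_cost_bits)
```

So although binary is a less dense notation than octal, the literal costs 36 source bits per stored character against 64 for the chain of function calls.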

I don't know how much area this program will end up using; it depends a lot on how powerful the optimizer is in your Verilog synthesizer (because the minimization problem of converting the stored data into a set of logic gates is something that's done in the synthesizer itself; throwing the work onto the synthesizer makes the source code itself much shorter, and thus reduces the area needed to store it). It should have a complexity of O(n log n), though, and thus be much smaller than the O(n²) of the other program. I'd be interested to see the OP try to run it on their FPGA. (It may take quite some time to synthesize, though; there are various steps you can take to optimize a program for compile time but I didn't take any here as it'd cause a larger program = larger area.)

user62131

Posted 2016-11-13T20:30:50.743

The first compiler run says that only 130 LE gates are used. Because an LE gate can only store 2 bits of data and you use a ~750 bit data stream this may mean that the compiler compressed the data or that something went wrong while compiling. Tomorrow in the morning I'll verify if the compiler result is correct. – Martin Rosenau – 2016-11-17T04:24:26.457

With this construction, much of the data is stored in the pattern of connections between the gates rather than the gates themselves, so I can believe that 130 LE gates is correct. (The synthesizer is basically brute-forcing a formula that maps a 10-bit index to a 1-bit value; I specified the formula in the Verilog program using a lookup table with 1024 entries, but the synthesizer is very likely to use a more efficient representation based on something like a K-map minimization.) – None – 2016-11-17T04:55:12.970

Oh, you should probably also check to make sure that the compiler hasn't optimized part of the code into a block RAM or block ROM (disallowed by the question). I didn't request one, and haven't written the code in a form that would imply one (I was careful to make the lookup table combinatorial), but sometimes compiler optimizations do weird things. If there is an optimization interfering with that, you'd have to turn the optimization off. – None – 2016-11-17T06:34:23.127

I tried to get that running on my FPGA for half a day (OK, 3 hours were wasted searching for a CD-ROM I have obviously lost): the data stream returned by the FPGA does not seem to be correct. Unfortunately I haven't found out why... – Martin Rosenau – 2016-11-17T12:48:42.633

OK. I have sorted out the problems with the pins now. The code seems to work well. Outstanding. Only 130 LE-type cells! RAM, ROM etc. are not used. I think the compiler does some optimization similar to KV diagrams to "compress" the data. – Martin Rosenau – 2016-11-17T13:49:16.837

You win! Congratulations. – Martin Rosenau – 2016-11-22T09:49:13.533

Verity 0.10, optimized for source code size (1944 bytes)

I originally misread the question and interpreted it as a code-golf. That was probably for the best, as it's much easier to write a quine with short source code than short object code under the restrictions in the question; that made the question easy enough that I felt I could reasonably produce an answer, and might work as a stepping stone on the way to a better answer. It also prompted me to use a higher-level language for the input, meaning that I'd need to express less in the program itself. I didn't create Verity as a golfing language for hardware (I was actually hired to create it a while ago in an entirely different context), but there's quite a reminiscence there (e.g. it's substantially higher level than a typical HDL is, and it has much less boilerplate; it's also much more portable than the typical HDL).

I'm pretty sure that the correct solution for short object code involves storing the data in some kind of tree structure, given that the question disallows the use of block ROM, which is where you'd normally store it in a practical program; I might have a go at writing a program that uses this principle (not sure what language, maybe Verity, maybe Verilog; VHDL has too much boilerplate to likely be optimal for this sort of problem) at some point. That would mean that you wouldn't need to pass every bit of the source code to every bit of your "manually created ROM". However, the Verity compiler currently synthesizes the structure of the output based on the precedence and associativity of the input, meaning that it's effectively representing the instruction pointer (thus the index to the lookup table) in unary, and a unary index multiplied by the length of the lookup table gives this O(n²) space performance.

The program itself:

import <print>new x:=0$1296in(\p.\z.\a.new y:=(-a 5-a 1-a 1-a 2-a 4-a 2-a 3-a 2-a 6-a 2-a 0-a 3-a 0-a 4-a 4-a 7-a 4-a 2-a 6-a 2-a 5-a 1-a 2-a 2-a 0-a 3-a 6-a 7-a 2-a 2-a 1-a 1-a 3-a 3-a 0-a 4-a 4-a 3-a 2-a 7-a 5-a 7-a 0-a 6-a 4-a 4-a 1-a 6-a 2-a 6-a 1-a 7-a 6-a 6-a 5-a 1-a 2-a 2-a 0-a 5-a 0-a 0-a 4-a 2-a 6-a 5-a 0-a 0-a 6-a 3-a 6-a 5-a 0-a 0-a 5-a 0-a 6-a 5-a 2-a 2-a 1-a 1-a 3-a 3-a 0-a 4-a 5-a 3-a 2-a 7-a 5-a 7-a 0-a 5-a 5-a 5-a 1-a 4-a 4-a 3-a 1-a 5-a 5-a 1-a 2-a 2-a 0-a 4-a 3-a 3-a 4-a 1-a 5-a 1-a 0-a 2-a 1-a 1-a 1-a 4-a 4-a 3-a 6-a 7-a 0-a 6-a 0-a 1-a 3-a 2-a 0-a 5-a 4-a 2-a 0-a 5-a 5-a 1-a 2-a 1-a 0-a 4-a 6-a 3-a 4-a 7-a 3-a 6-a 2-a 6-a 0-a 3-a 4-a 1-a 1-a 1-a 2-a 2-a 0-a 4-a 6-a 3-a 3-a 5-a 1-a 7-a 2-a 6-a 1-a 1-a 0-a 2-a 7-a 2-a 1-a 1-a 0-a 4-a 6-a 3-a 1-a 5-a 3-a 7-a 5-a 1-a 2-a 1-a 0-a 4-a 6-a 3-a 5-a 7-a 5-a 7-a 4-a 6-a 5-a 6-a 0-a 3-a 4-a 1-a 1-a 1-a 2-a 2-a 0-a 4-a 3-a 3-a 4-a 1-a 5-a 1-a 0-a 2-a 1-a 1-a 1-a 4-a 5-a 3-a 6-a 7-a 0-a 6-a 0-a 1-a 3-a 2-a 0-a 5-a 4-a 2-a 0-a 4-a 1-a 7-a 7-a 6-a 3-a 7-a 4-a 2-a 0-a 4-a 3-a 6-a 2-a 6-a 3-a 7-a 4-a 2-a 0-a 5-a 4-a 6-a 0-a 7-a 2-a 0-a 1-a 4-a 5-a 3-a 4-a 4-a 4-a 4-a 3-a 6-a 4-a 4-a 4-a 4-a 3-a 6-a 2-a 6-a 1-a 5-a 3-a 7-a 4-a 2-a 0-a 4-a 4-a 6-a 5-a 6-a 3-a 7-a 5-a 3-a 2-a 7-a 5-a 7-a 1-a 4-a 5-a 3-a 6-a 7-a 6-a 7-a 3-a 6-a 1-a 5-a 1-a 1-a 0-a 2-a 7-a 2-a 1-a 1-a 0-a 4-a 7-a 2-a 7-a 1-a 5-a 1-a 4-a 2-a 3-a 7-a 4-a 3-a 2-a 7-a 5-a 7-a 1-a 4-a 4-a 3-a 6-a 7-a 6-a 7-a 6-a 6-a 1-a 5-a 1-a 5-a 4-a 2-a 6-a 2-a 5-a 1-a 2-a 2-a 0-a 3-a 0-a 5-a 1-a 4-a 4-a 3-a 4-a 4-a 4-a 4-a 6-a 6-a 4-a 4-a 4-a 4-a 3-a 6-a 2-a 6-a 1-a 5-a 0-a 5-a 0-a 0-a 0-a 1-a 6-a 5-a 4-a 3-a 2-a 7-a 5-a 7-a 1-a 4-a 4-a 3-a 6-a 7-a 6-a 7-a 3-a 6-a 2-a 0-a 0-a 1-a 4-a 7-a 4-a 7-a 1-a 6-a 2-a 6-a 1-a 7-a 3-a 6-a 3-a 7-a 0-a 6-a 1-a 5-!x)in while!x>0do(p(if z<32then z+92else z);if z==45then while!y>0do(p 97;p 32;p(48^!y$$3$$32);p 45;y:=!y>>3)else skip;x:=!x>>6))print(!x$$6$$32)(\d.x:=!x>>3^d<<1293;0)

More readable:

import <print>
new x := 0$1296 in
(\p.\z.\a.
  new y := (-a 5-a 1-
            # a ton of calls to a() omitted...
            -a 1-a 5-!x) in
  while !x>0 do (
    p(if z<32 then z+92 else z);
    if z==45
    then while !y>0 do (
      p 97;
      p 32;
      p(48^!y$$3$$32);
      p 45;
      y:=!y>>3 )
    else skip;
    x:=!x>>6
  )
)(print)(!x$$6$$32)(\d.x:=!x>>3^d<<1293;0)

The basic idea is that we store the entire data in the variable x. (As usual for a quine, we have a code section and a data section; the data encodes the text of the code, and can also be used to regenerate the text of the data.) Unfortunately, Verity doesn't currently allow very large constants to be written in the source code (it uses OCaml integers during compilation to represent integers in the source, which clearly isn't correct in a language that supports arbitrarily wide integer types) – and besides, it doesn't allow constants to be specified in octal – so we generate the value of x at runtime via repeated calls to a function a. We could create a void function and call it repeatedly as separate statements, but that would make it hard to identify where to start outputting the text of the data section. So instead, I made a return an integer, and use arithmetic to store the data (Verity guarantees that arithmetic evaluates left to right). The data section is encoded in x using a single - sign; when this is encountered at run time, it's expanded to the full -a 5-a 1-, etc., via the use of y.
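A scaled-down Python model may help to see how the calls to a build up x and how y regenerates the data section. It is a sketch under assumptions: the update `x:=!x>>3^d<<1293` is read as (x >> 3) XOR (d << 1293), the 1296-bit width is shrunk to fit eight digits, and `build_x` and `data_text` are hypothetical names.

```python
def build_x(digits: list[int], width: int) -> int:
    """Apply the update performed by a(d) once per digit: shift right by 3, XOR d into the top 3 bits."""
    x = 0
    for d in digits:
        x = (x >> 3) ^ (d << (width - 3))
    return x


def data_text(y: int) -> str:
    """Regenerate the data section from y, three bits at a time, like the inner while loop."""
    pieces = []
    while y > 0:                       # as in the original loop, output stops once y reaches 0
        pieces.append("a " + chr(48 ^ (y & 7)) + "-")
        y >>= 3
    return "".join(pieces)


digits = [5, 1, 1, 2, 4, 2, 3, 2]      # the first eight digits of the real program
x = build_x(digits, width=3 * len(digits))
print(data_text(x))
```

After the last call the first digit has been shifted down to the bottom three bits, so reading from the low end reproduces the digits in source order.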

Initializing y as a copy of x is fairly subtle here. Because a returns zero specifically, most of the sum is just zero minus zero minus… and cancels itself out. We end with !x (i.e. "the value of x"; in Verity, as in OCaml, a variable's name works more like a pointer than anything else, and you have to dereference it explicitly to get at the variable's value). Verity's rules for unary minus are a little complex – the unary minus of v is written as (-v) – thus (-0-0-0-!x) parses as (-(0-0-0-!x)), which is equal to !x, and we end up initializing y as a copy of x. (It's also worth noting that Verity is not call-by-value, but rather allows functions and operators to choose the order they evaluate things; - will evaluate the left argument before the right argument, and in particular, if the left argument has side effects, those will be visible when the right argument is evaluated.)

Each character of the source code is represented using two octal digits. This means that the source code is limited to 64 different characters, so I had to create my own codepage for internal use. The output is in ASCII, so I needed to convert internally; this is what the (if z<32 then z+92 else z) is for. Here's the character set I used in the internal representation, in numerical order (i.e. \ has codepoint 0, ? has codepoint 63):

\]^_`abcdefghijklmnopqrstuvwxyz{ !"#$%&'()*+,-./0123456789:;<=>?
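The codepage and its conversion to ASCII can be sketched in Python as follows. The digit order (low octal digit first) is inferred from the first calls in the program rather than stated in the answer, and the names `CODEPAGE`, `to_ascii`, and `to_octal_digits` are illustrative.

```python
# Codepoints 0..31 map to ASCII 92..123, codepoints 32..63 map to ASCII 32..63.
CODEPAGE = "".join(chr(c) for c in range(92, 124)) + "".join(chr(c) for c in range(32, 64))


def to_ascii(code: int) -> str:
    """The conversion done by (if z<32 then z+92 else z)."""
    return chr(code + 92 if code < 32 else code)


def to_octal_digits(text: str) -> list[int]:
    """Encode text as two octal digits per character, low digit first (an inferred convention)."""
    digits = []
    for ch in text:
        code = CODEPAGE.index(ch)      # raises ValueError for characters outside the codepage
        digits += [code & 7, code >> 3]
    return digits


print(CODEPAGE)
print(to_octal_digits("import <"))
```

The digits produced for `import <` agree with the first sixteen calls to a in the program (5, 1, 1, 2, 4, 2, 3, 2, 6, 2, 0, 3, 0, 4, 4, 7).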

This character set gives us most of the characters important for Verity. Notable characters missing are } (meaning that we can't create a block using {}, but luckily all statements are expressions so we can use () instead); and | (this is why I had to use an exclusive rather than inclusive OR when creating the value of x, meaning I need to initialize it to 0; however, I needed to specify how large it was anyway). Some of the critical characters that I wanted to ensure were in the character set were <> (for imports, also shifts), () (very hard to write a program that can be parsed without these), $ (for everything to do with bitwidth), and \ (for lambdas; theoretically we could work around this with let…in but it would be much more verbose).

In order to make the program a bit shorter, I constructed abbreviations for print and for !x$$6$$32 (i.e. "the bottom 6 bits of !x, cast to be usable by the print library") by temporarily binding them to lambda arguments.

Finally, there's the issue of output. Verity provides a print library that's intended for debug output. On a simulator, it prints the ASCII codes to standard output, which is perfectly usable for testing the program. On a physical circuit board, it depends on a print library having been written for the particular chip and board surrounding it; there's a print library in the Verity distribution for an evaluation board I had access to that prints the output on seven-segment displays. Given that the library will end up taking space on the resulting circuit board, it may be worth using a different language for an optimized solution to this problem so that we can output the bits of the output directly on wires.

By the way, this program is O(n²) on hardware, meaning that it's much worse on a simulator (I suspect O(n⁴); not sure, though, but it was hard enough to simulate that it seems unlikely to be even cubic, and based on how the time reacted to my changes as I was writing the program, the function seems to grow very quickly indeed). The Verity compiler needed 436 optimization passes (which is much, much more than it'd typically use) to optimize the program, and even after that, simulating it was very hard for my laptop. The complete compile-and-simulate run took the following time:

real  112m6.096s
user  105m25.136s
sys   0m14.080s

and peaked at 2740232 kibibytes of memory. The program takes a total of 213646 clock cycles to run. It does work, though!

Anyway, this answer doesn't really fulfil the question as I was optimizing for the wrong thing, but seeing as there are no other answers yet, this is the best by default (and it's nice to see what a golfed quine would look like in a hardware language). I'm not currently sure whether or not I'll work on a program that aims to produce more optimized output on the chip. (It would likely be a lot larger in terms of source, as an O(n) data encoding would be rather more complex than the one seen here.)

user62131

Posted 2016-11-13T20:30:50.743

What's the score for the primary criterion (LE gates used on the specified FPGA)? – Mego – 2016-11-16T19:13:47.837

No, I mean, what score does this solution achieve? – Mego – 2016-11-16T19:20:25.727

@Mego I'm just trying to get the compiler from veritygos.org to check... – Martin Rosenau – 2016-11-16T19:22:10.930

@Mego: No idea; Verity itself is a portable language, but the Verity implementation probably doesn't have a toplevel already implemented for the specific chip specified, and anyway I don't have an FPGA synthesizer to hand at the moment. My simulator says there are 6328084 driven signals; that should have an approximately linear relationship to the number of LE gates needed, but I don't know what the constant factor is. (In general, having a criterion specified in terms of something that could be objectively checked on a simulator would make things easier here.) – None – 2016-11-16T19:23:22.563

@MartinRosenau: You might want to try seeing if the de2 or de3 backends work (-m de2, -m de3). Those were both written for Altera chips; they might or might not be the same as yours, and they might or might not happen to work anyway even if they aren't. (If you don't happen to be using the same evaluation boards I was, you can probably compile as far as the VHDL but will have to do the rest of the compilation by hand.) – None – 2016-11-16T19:26:17.550

The synthesis runs out of memory, which does not necessarily mean that it wouldn't work. However, the intermediate VHDL file generated by the Verity compiler is larger than 1 MB while your Verilog solution is only 2 KB, so I doubt the Verity solution requires fewer logic cells than the Verilog solution. – Martin Rosenau – 2016-11-17T04:15:31.373

The logical quine
