0"D34çýÇbεDg•Xó•18в@ƶà©i7j0ìëR6ôRíć7®-jšTìJ1®<×ì]ð0:J"D34çýÇbεDg•Xó•18в@ƶà©i7j0ìëR6ôRíć7®-jšTìJ1®<×ì]ð0:J
05AB1E has no UTF-8 conversion builtins, so I have to do everything manually.
Try it online or verify that it's a quine.
Explanation:
Quine part:
The shortest quine for 05AB1E is this one: 0"D34çý"D34çý
(14 bytes), provided by @OliverNi. My answer uses a modified version of that quine, with the challenge code added at the ... here: 0"D34çý..."D34çý...
A short explanation of this quine:
0 # Push a 0 to the stack (can be any digit)
"D34çý" # Push the string "D34çý" to the stack
D # Duplicate this string
34ç # Push 34 converted to an ASCII character to the stack: '"'
ý # Join everything on the stack (the 0 and both strings) by '"'
# (output the result implicitly)
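For comparison, the same trick can be sketched in Python (just to illustrate the structure; it is not related to the 05AB1E code, and the variable name s is my own): keep the code once as a string, then print it once quoted (via repr) and once unquoted (as code).

    s = 'print("s = " + repr(s) + "; " + s)'; print("s = " + repr(s) + "; " + s)

Here s plays the role of "D34çý": the body of the program is stored once as data and emitted twice, which mirrors the duplicate-and-join-by-'"' steps above.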
Challenge part:
Now for the challenge part of the code. As I mentioned at the top, 05AB1E has no UTF-8 conversion builtins, so I have to do these things manually. I've used this source as a reference on how to do that: Manually converting unicode codepoints into UTF-8 and UTF-16. Here is a short summary of it regarding the conversion of Unicode characters to UTF-8:
- Convert the Unicode characters to their Unicode values (e.g. "dЖ丽" becomes [100,1046,20029])
- Convert these Unicode values to binary (e.g. [100,1046,20029] becomes ["1100100","10000010110","100111000111101"])
- Check in which of the following ranges the characters are:
  0x00000000 - 0x0000007F (0-127): 0xxxxxxx
  0x00000080 - 0x000007FF (128-2047): 110xxxxx 10xxxxxx
  0x00000800 - 0x0000FFFF (2048-65535): 1110xxxx 10xxxxxx 10xxxxxx
  0x00010000 - 0x001FFFFF (65536-2097151): 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
There are also ranges for 5 or 6 bytes, but let's leave them out for now.
The character d will be in the first range, so 1 byte in UTF-8; character Ж is in the second range, so 2 bytes in UTF-8; and character 丽 is in the third range, so 3 bytes in UTF-8.
The x in the pattern behind each range are filled with the binary of these characters, from right to left. So the d (1100100) with pattern 0xxxxxxx becomes 01100100; the Ж (10000010110) with pattern 110xxxxx 10xxxxxx becomes 11010000 10010110; and the 丽 (100111000111101) with pattern 1110xxxx 10xxxxxx 10xxxxxx becomes 1110x100 10111000 10111101, after which the remaining x are replaced with 0: 11100100 10111000 10111101.
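To make this concrete, here is a rough Python sketch of the manual conversion (just as an illustration; it is not the 05AB1E implementation, and the names PATTERNS / to_utf8_bits are my own), using the ranges and patterns listed above:

    # Illustrative sketch of the manual codepoint -> UTF-8 conversion.
    PATTERNS = [
        (0x7F,     "0xxxxxxx"),
        (0x7FF,    "110xxxxx 10xxxxxx"),
        (0xFFFF,   "1110xxxx 10xxxxxx 10xxxxxx"),
        (0x1FFFFF, "11110xxx 10xxxxxx 10xxxxxx 10xxxxxx"),
    ]

    def to_utf8_bits(ch):
        cp = ord(ch)                                  # Unicode value of the character
        pattern = next(p for limit, p in PATTERNS if cp <= limit)
        # Fill the x from right to left; remaining x become 0 (done here by
        # left-padding the binary string with zeros up to the number of x).
        bits = bin(cp)[2:].rjust(pattern.count("x"), "0")
        it = iter(bits)
        return "".join(next(it) if c == "x" else c for c in pattern)

    for ch in "dЖ丽":
        print(ch, to_utf8_bits(ch))
    # d 01100100
    # Ж 11010000 10010110
    # 丽 11100100 10111000 10111101

The printed patterns match the worked example above and, with the spaces removed, Python's own "dЖ丽".encode("utf-8").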
That is also the approach I used in my code. However, instead of checking the actual ranges, I just look at the length of the binary string and compare it to the amount of x in the patterns, since that saves a few bytes.
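As a small illustration (again just a Python sketch, not part of the golfed code; utf8_byte_count is my own name), that length-versus-threshold check could look like this; the commented 05AB1E code below does the equivalent with •Xó•18в, @, ƶ and à:

    # Byte count per character, derived from the length of its binary string.
    # The thresholds [1, 8, 12, 17] are the ones the 05AB1E code decompresses.
    def utf8_byte_count(cp):
        length = len(bin(cp)) - 2        # length of the binary string
        return max(i for i, t in enumerate([1, 8, 12, 17], 1) if length >= t)

    print([utf8_byte_count(ord(c)) for c in "dЖ丽"])   # [1, 2, 3]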
Ç # Convert each character in the string to its unicode value
b # Convert each value to binary
ε # Map over these binary strings:
Dg # Duplicate the string, and get its length
•Xó• # Push compressed integer 8657
18в # Converted to Base-18 as list: [1,8,12,17]
@ # Check for each whether the length is >= this value
# (1 if truthy; 0 if falsey)
ƶ # Multiply each by its 1-based index
à # Pop and get its maximum
© # Store it in the register (without popping)
i # If it is exactly 1 (first range):
7j # Add leading spaces to the binary to make it of length 7
0ì # And prepend a "0"
ë # Else (any of the other ranges):
R # Reverse the binary
6ô # Split it into parts of size 6
Rí # Reverse it (and each individual part) back
ć # Pop, and push the remainder-list and the head separately to the stack
7®- # Calculate 7 minus the value from the register
j # Add leading spaces to the head binary to make it of that length
š # Add it at the start of the remainder-list again
Tì # Prepend "10" before each part
J # Join the list together
1®<× # Repeat "1" (the value from the register minus 1) times
ì # Prepend that at the front
] # Close both the if-else statement and map
ð0: # Replace all spaces with "0"
J # And join all modified binary strings together
# (which is output implicitly - with trailing newline)
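For readers who do not speak 05AB1E, here is a loose Python rendering of the per-character steps above (an illustration based on the comments, not a byte-for-byte translation; the name encode_char_bits is my own, b is the binary string and r the range index from the thresholds):

    # Illustrative sketch of one map iteration, following the comments above.
    def encode_char_bits(b, r):
        if r == 1:                                    # i: first range, one byte
            return ("0" + b.rjust(7)).replace(" ", "0")       # 7j 0ì ... ð0:
        rb = b[::-1]                                  # R: reverse the binary
        chunks = [rb[i:i+6] for i in range(0, len(rb), 6)]    # 6ô: parts of size 6
        parts = [c[::-1] for c in chunks][::-1]       # Rí: undo both reversals
        head, rest = parts[0], parts[1:]              # ć: split off the head
        head = head.rjust(7 - r)                      # 7®-j: pad head with spaces
        parts = ["10" + p for p in [head] + rest]     # š Tì: prepend "10" to each
        out = "1" * (r - 1) + "".join(parts)          # J 1®<× ì: join, prepend 1s
        return out.replace(" ", "0")                  # ð0:: spaces become 0s

    for ch, r in [("d", 1), ("Ж", 2), ("丽", 3)]:
        print(ch, encode_char_bits(bin(ord(ch))[2:], r))
    # d 01100100
    # Ж 1101000010010110
    # 丽 111001001011100010111101

(The ð0: step actually happens once after the map in the real code, but since the spaces only ever occur inside the individual binary strings, doing it per character gives the same joined result.)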
See this 05AB1E answer of mine (sections How to compress large integers? and How to compress integer lists?) to understand why •Xó•18в is [1,8,12,17].
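As a quick sanity check of that compression (a Python one-off, not part of the answer), taking the base-18 digits of 8657 indeed gives the threshold list:

    n, digits = 8657, []
    while n:                      # repeatedly take base-18 digits
        n, d = divmod(n, 18)
        digits.append(d)
    print(digits[::-1])           # [1, 8, 12, 17]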
By "binary", do you mean a string representation of the binary values, i.e. a string consisting of only 1's and 0's? – None – 2019-02-04T18:38:00.987
@mdahmoune Now that's already much better. The question remains how to represent something as UTF-8. Notice that Unicode representation is mainly based on the looks of a character (only occasionally on semantic meaning). What if no assigned Unicode glyph looks like a character in the source code? Unicode also has many look-alikes (homoglyphs). How does one decide which one to use? E.g. Dyalog APL has an AND function which may be encoded as 01011110 or 0010011100100010 in UTF-8 (they look pretty alike: ^ vs ∧) – Adám – 2019-02-04T19:44:15.290
Better example: 01111100 and 0010001100100010 encode | and ∣. – Adám – 2019-02-04T19:51:13.917
@Adám I think it would be fair to output any binary sequence that corresponds to a symbol that will compile/run in a certain implementation of a language. – qwr – 2019-02-04T20:09:22.837
How about machine code? (Commodore C64 takes 28 bytes assuming the machine code itself is the "source") – Martin Rosenau – 2019-02-04T21:18:22.377
Does it have to be a program, not a function that takes a pointer to an output buffer? Like @MartinRosenau, I'm wondering about machine code. If a whole program is required, I think we could probably still follow the usual code-golf rules of only counting the actual executable bytes as "the source", not any executable file metadata. (i.e. the contents of a .text section for an x86-64 Linux executable that base2-dumps itself to stdout, perhaps using RIP-relative addressing to get its own code bytes. Or not, because if we have to be an executable, we can be position-dependent and shorter.) – Peter Cordes – 2019-02-05T02:38:50.723
Does the "binary" dump of UTF-8 have to be in any particular character set or encoding? Can we choose to dump it in a format that packs 8 binary bits per octet (i.e. actual binary UTF-8, a serialization format for Unicode codepoints)? Or do you require a text representation of base 2 digits, using the ASCII subset of UTF-8? Or any choice of pre-existing encoding used for text, like EBCDIC or UTF-16? Basically it annoys me when people use "binary" to mean a serialization format with 1 bit per character, similar to hex. UTF-8 itself is a binary format composed of 0s and 1s. – Peter Cordes – 2019-02-05T02:52:50.097
@PeterCordes If you can dump it in a format that packs 8 binary bits per octet then any standard quine would count as a binary quine. – user253751 – 2019-02-05T03:02:32.413
@immibis: that's exactly my point. In computing, the word "binary" is not sufficient to describe what this question is trying to ask for. It's fairly clear from context what's intended, but I think I could rules-lawyer my way into a standard quine in a UTF-8 source (i.e. not x86 machine code, because that's not always valid UTF-8) with phrases like "the binary UTF-8 representation", because UTF-8 is a binary format for serializing Unicode codepoints. – Peter Cordes – 2019-02-05T03:04:26.670
Oh, that's probably what @MartinRosenau was getting at. What about languages that aren't textual in the first place, like machine code? Many instruction sequences don't form valid UTF-8 sequences, where the upper bits signal how many later bytes are part of the same character. Can we just have a function base-2 dump itself in ASCII/UTF-8? – Peter Cordes – 2019-02-05T03:10:56.530