Introspective Programming: Code that analyzes its source and its output

13

Write a program that outputs the total number of characters and the frequency of each character in its source and its output. You must follow the format illustrated in the example.

Example

If your code was

abb1

Its output would have to be

My source has 4 characters.
1 is "a"
2 are "b"
1 is "1"
Besides unquoted numbers, my output has 383 characters.
34 are "
"
79 are " "
63 are """
2 are "'"
2 are ","
4 are "."
2 are "1"
2 are "B"
2 are "I"
2 are "M"
39 are "a"
4 are "b"
6 are "c"
4 are "d"
38 are "e"
3 are "g"
5 are "h"
4 are "i"
4 are "m"
3 are "n"
8 are "o"
3 are "p"
2 are "q"
38 are "r"
12 are "s"
8 are "t"
7 are "u"
3 are "y"
It's good to be a program.

(Output must go to stdout.)

Notice, for example, that the output contains two capitalized m's: one for My and one for 2 are "M". This must hold true for all characters so the output does not contradict itself in any way.

Unquoted numbers are ignored in the output to avoid unsatisfiable frequency sets. For example, 1 is "1" is incorrect if both 1's are counted. It should read 2 are "1", but then there is only one 1 again.
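To make the rule concrete, here is a minimal Python sketch that strips unquoted numbers with the same substitution the format checker below uses, then counts what is left. The helper name `strip_unquoted_numbers` is made up for illustration and is not part of the challenge.

```python
import re

def strip_unquoted_numbers(text):
    # Same substitution as in the format checker: remove any run of digits
    # that has a non-quote character on both sides.
    return re.sub(r'([^"])\d+([^"])', r'\1\2', text)

# The leading newline matters: the pattern needs a character before the number.
line = '\n1 is "1"\n'
print(line.count('1'))                          # 2, counting both 1's
print(strip_unquoted_numbers(line).count('1'))  # 1, only the quoted 1 remains
```

With the unquoted number ignored, the line is consistent with itself no matter which number is printed in front of it.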

Format Clarifications

  • "is" must be used for single character occurrences.

  • "are" must be used for multiple character occurrences.

  • "is" should never appear in the list of output characters because it would be superfluous. 1 is 'Z' refers to the Z in itself, so the entire line can be removed.

  • The three full-sentence phrases must appear in order with the character frequency lists in between (as the example shows). So your output will start with My source... and end with ...be a program.. Note that there is no newline at the end of the output.

  • The character frequency lists themselves may be in any order.

  • Newlines count as one character (in case they are \r\n).
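The is/are and newline rules above can be sketched in a few lines of Python. This is only an illustration for the source-code list; the function name `frequency_lines` is an assumption of this sketch, and the order of the lines is arbitrary, as the rules allow.

```python
from collections import Counter

def frequency_lines(text):
    # A Windows line ending counts as a single newline character.
    text = text.replace('\r\n', '\n')
    lines = []
    for char, n in Counter(text).items():
        verb = 'is' if n == 1 else 'are'   # "is" for one, "are" for several
        lines.append('%d %s "%s"' % (n, verb, char))
    return lines

print('\n'.join(frequency_lines('abb1')))
```

For the output list the same formatting applies, except that a count of 1 can never be reported there, since the line reporting it would add a second occurrence.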

Format Checker

The following Python script takes your code and its output as strings and asserts that the output has no contradictions. It provides a useful error message if something is wrong. You can run it online at http://ideone.com/6H0ldu by forking it, replacing the CODE and OUTPUT strings, then running it. It will never give false positives or negatives (assuming it's error-free).

#Change the CODE and OUTPUT strings to test your program

CODE = r'''abb1'''

OUTPUT = r'''My source has 4 characters.
1 is "a"
2 are "b"
1 is "1"
Besides unquoted numbers, my output has 383 characters.
34 are "
"
79 are " "
63 are """
2 are "'"
2 are ","
4 are "."
2 are "1"
2 are "B"
2 are "I"
2 are "M"
39 are "a"
4 are "b"
6 are "c"
4 are "d"
38 are "e"
3 are "g"
5 are "h"
4 are "i"
4 are "m"
3 are "n"
8 are "o"
3 are "p"
2 are "q"
38 are "r"
12 are "s"
8 are "t"
7 are "u"
3 are "y"
It's good to be a program.'''

#######################################################

import re

amountPattern = r'(\d+) (is|are) "(.)"\n'

class IntrospectionException(Exception):
    pass

def getClaimedAmounts(string, errorOnIs):
    groups = re.findall(amountPattern, string, re.DOTALL)

    for amount, verb, char in groups:
        if verb == 'is':
            if errorOnIs:
                raise IntrospectionException('\'1 is "%s"\' is unnecessary' % char)
            elif amount != '1':
                raise IntrospectionException('At "%s", %s must use "are"' % (char, amount))
        elif verb == 'are' and amount == '1':
            raise IntrospectionException('At "%s", 1 must use "is"' % char)

    amounts = {}
    for amount, verb, char in groups:
        if char in amounts:
            raise IntrospectionException('Duplicate "%s" found' % char)
        amounts[char] = int(amount)
    return amounts

def getActualAmounts(string):
    amounts = {}
    for char in string:
        if char in amounts:
            amounts[char] += 1
        else:
            amounts[char] = 1
    return amounts

def compareAmounts(claimed, actual):
    for char in actual:
        if char not in claimed:
            raise IntrospectionException('The amounts list is missing "%s"' % char)
    for char in actual: #loop separately so missing character errors are all found first
        if claimed[char] != actual[char]:
            raise IntrospectionException('The amount of "%s" characters is %d, not %d' % (char, actual[char], claimed[char]))
    if claimed != actual:
        raise IntrospectionException('The amounts are somehow incorrect')

def isCorrect(code, output):
    p1 = r'^My source has (\d+) characters\.\n'
    p2 = r'Besides unquoted numbers, my output has (\d+) characters\.\n'
    p3 = r"It's good to be a program\.$"
    p4 = '%s(%s)*%s(%s)*%s' % (p1, amountPattern, p2, amountPattern, p3)

    for p in [p1, p2, p3, p4]:
        if re.search(p, output, re.DOTALL) is None:
            raise IntrospectionException('Did not match the regex "%s"' % p)

    claimedCodeSize = int(re.search(p1, output).groups()[0])
    actualCodeSize = len(code)
    if claimedCodeSize != actualCodeSize:
        raise IntrospectionException('The code length is %d, not %d' % (actualCodeSize, claimedCodeSize))

    filteredOutput = re.sub(r'([^"])\d+([^"])', r'\1\2', output)

    claimedOutputSize = int(re.search(p2, output).groups()[0])
    actualOutputSize = len(filteredOutput)
    if claimedOutputSize != actualOutputSize:
        raise IntrospectionException('The output length (excluding unquoted numbers) is %d, not %d' % (actualOutputSize, claimedOutputSize))

    splitIndex = re.search(p2, output).start()

    claimedCodeAmounts = getClaimedAmounts(output[:splitIndex], False)
    actualCodeAmounts = getActualAmounts(code)
    compareAmounts(claimedCodeAmounts, actualCodeAmounts)

    claimedOutputAmounts = getClaimedAmounts(output[splitIndex:], True)
    actualOutputAmounts = getActualAmounts(filteredOutput)
    compareAmounts(claimedOutputAmounts, actualOutputAmounts)

def checkCorrectness():
    try:
        isCorrect(CODE, OUTPUT)
        print('Everything is correct!')
    except IntrospectionException as e:
        print('Failed: %s.' % e)

checkCorrectness()

Scoring

This is code-golf. The submission with the fewest characters wins. Submissions must pass the format checker to be valid. Standard loopholes apply, though you may read your own source code and/or hardcode your output.

Calvin's Hobbies

Posted 2014-07-11T23:17:18.390

Reputation: 84 000

Is reading your own source file allowed? – Ventero – 2014-07-12T02:52:29.417

@MrLore There may be other errors but I just realized that the triple quotes (''') still escapes things with backslash. This may be related to your problem. I'm fixing it now. – Calvin's Hobbies – 2014-07-12T02:52:44.127

@Ventero Definitely! – Calvin's Hobbies – 2014-07-12T02:53:02.577

@MrLore The regexps allow some false positives, yes. To fix the problem with backslashes inside triple quotes, use raw strings (r'''CODE'''). – Ventero – 2014-07-12T02:53:32.990

@Calvin'sHobbies Meh. Could've saved a lot of effort then. Since you have tagged this as [tag:quine], I figured the somewhat-standard quine loophole wouldn't be allowed. – Ventero – 2014-07-12T02:54:34.447

Sorry about that. I imagined that even with reading your own source the task is still fairly tricky. – Calvin's Hobbies – 2014-07-12T02:57:34.700

@MrLore Fixed unescaped dots. Thanks for noticing! – Calvin's Hobbies – 2014-07-12T03:06:01.520

"Besides unquoted numbers..." So we're playing on easy mode, huh? – algorithmshark – 2014-07-12T05:19:36.170

if the program outputs something related to its output, that's recursive. How can we avoid infinite output? – xem – 2014-07-12T06:39:08.867

@xem Well for one thing Ventero already proved it was possible. Also, for every unique character in the output (or code) the output can only increase by 10 characters (11 counting the output size going from 95 to 105, for example). There aren't infinitely many unique characters. – Calvin's Hobbies – 2014-07-12T06:59:06.937
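The termination argument in the comment above can be sketched in Python. Because unquoted numbers are ignored, changing the printed counts does not change the text being counted, so recomputing the counts a few times settles on a fixed point. This is only an illustrative sketch, not any answerer's method; `build_output` and `strip_unquoted_numbers` are hypothetical names, and the filter is copied from the format checker.

```python
import re
from collections import Counter

def strip_unquoted_numbers(text):
    # Same substitution the format checker uses to ignore unquoted numbers.
    return re.sub(r'([^"])\d+([^"])', r'\1\2', text)

def build_output(source):
    header = 'My source has %d characters.\n' % len(source)
    source_list = ''.join(
        '%d %s "%s"\n' % (n, 'is' if n == 1 else 'are', c)
        for c, n in sorted(Counter(source).items()))
    footer = "It's good to be a program."

    counts, total = Counter(), 0
    for _ in range(20):  # assumption: a handful of rounds is plenty
        output_list = ''.join(
            '%d are "%s"\n' % (n, c) for c, n in sorted(counts.items()))
        output = (header + source_list +
                  'Besides unquoted numbers, my output has %d characters.\n' % total +
                  output_list + footer)
        filtered = strip_unquoted_numbers(output)
        new_counts, new_total = Counter(filtered), len(filtered)
        if new_counts == counts and new_total == total:
            return output  # every claim now matches the text that states it
        counts, total = new_counts, new_total
    raise RuntimeError('no fixed point found')

print(build_output('abb1'))
```

Every character that appears in the filtered output gets its own line, which adds one more occurrence of it, so all counts in the second list end up at 2 or more and "are" is always the right verb there.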

Does the program have to analyze its own source and output, or can it just output the right stuff (with some hardcoding)? – aditsu quit because SE is EVIL – 2014-07-15T16:49:30.347

@aditsu Hardcoding things is fine. – Calvin's Hobbies – 2014-07-15T22:06:24.273

Answers

2

CJam - 189

{`"_~"+:T;"Besides unquoted numbers, my output has &It's good to be a program.&My source has & characters.
"'&/~_]:X2=T,X3=3i({T_&:B{TI/,(" are ":AM`I*N}fIXK=]o
XBA`N+f+2*+s:T,X3=}fK'q];}_~

Try it at http://cjam.aditsu.net/

Output:

My source has 189 characters.
3 are "{"
3 are "`"
6 are """
4 are "_"
3 are "~"
4 are "+"
5 are ":"
5 are "T"
2 are ";"
3 are "B"
8 are "e"
9 are "s"
2 are "i"
3 are "d"
17 are " "
6 are "u"
2 are "n"
2 are "q"
8 are "o"
6 are "t"
3 are "m"
2 are "b"
7 are "r"
4 are ","
2 are "y"
2 are "p"
3 are "h"
7 are "a"
5 are "&"
4 are "I"
3 are "'"
2 are "g"
2 are "."
2 are "M"
3 are "c"
2 are "
"
2 are "/"
3 are "]"
5 are "X"
2 are "2"
4 are "="
3 are "3"
2 are "("
2 are "A"
2 are "*"
2 are "N"
3 are "}"
3 are "f"
2 are "K"
Besides unquoted numbers, my output has 988 characters.
3 are "B"
108 are "e"
11 are "s"
3 are "i"
5 are "d"
214 are " "
8 are "u"
4 are "n"
3 are "q"
9 are "o"
9 are "t"
5 are "m"
4 are "b"
108 are "r"
3 are ","
4 are "y"
4 are "p"
6 are "h"
108 are "a"
3 are "I"
3 are "'"
4 are "g"
5 are "."
3 are "M"
7 are "c"
102 are "
"
2 are "{"
198 are """
2 are "`"
2 are "_"
2 are "~"
2 are "+"
2 are ":"
2 are "T"
2 are ";"
2 are "&"
2 are "/"
2 are "]"
2 are "X"
2 are "2"
2 are "="
2 are "3"
2 are "("
2 are "A"
2 are "*"
2 are "N"
2 are "}"
2 are "f"
2 are "K"
It's good to be a program.

aditsu quit because SE is EVIL

Posted 2014-07-11T23:17:18.390

Reputation: 22 326

11

Ruby, 269 (311, 367) characters

I have three different solutions for this challenge. Each of them uses a different set of tricks:

"Proper" solution, 367 characters:

The longest solution is more or less just a proof of concept that it's possible to solve this challenge without any tricks - and is not nearly fully golfed. It's a true quine (i.e. it generates its own source code instead of reading it from a file) and actually calculates all the numbers it prints (code length, output length, character occurrences). Due to the way the quine works, all the code has to be on a single line and inside a string literal.

eval r="S='eval r=%p'%r;O=-~$.;q=\"My source has \#{S.size}\"+(X=' characters.\n')+S.chars.uniq.map{|c|[k=S.count(c),k>O ? :are: :is,?\"+c+?\"]*' '}*$/+'\nBesides unquoted numbers, my output has ';r=(w=q+X+s=\"It's good to be a program.\").scan(D=/\\D/).uniq;$><<q<<(w+v=r.map{|c|j=' are \"\n\"';(-~(w+j*r.size).count(c)).to_s+(j[~O]=c;j)}*$/+$/).scan(D).size<<X+v+s"

Partially hardcoded output, 311 characters:

The next shortest solution uses two tricks, but is still a true quine:

  • No character occurs exactly once in the source code. That way, I don't need to decide whether I should print is or are in the first half of the output. It also makes it a bit easier to calculate the total output size (though I don't actually need to do that).

  • The total output size is hardcoded. Since this only depends on the number of distinct characters in the source code (and in the general case, how many of those characters occur only once), it's easy to calculate it in advance.
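The first trick is easy to check mechanically. A minimal Python sketch (the helper name `singletons` is made up for illustration and is not part of any solution here) that lists the characters occurring exactly once in a source string:

```python
from collections import Counter

def singletons(source):
    # Characters that occur exactly once; each one would force an "is" line.
    return sorted(c for c, n in Counter(source).items() if n == 1)

print(singletons('abb1'))   # ['1', 'a']
print(singletons('abab'))   # []
```

An empty result means every line in the first half of the output can use "are" unconditionally.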

Note that the code is preceded by two very significant newlines, which StackExchange wouldn't show in the code block. For that reason, I have added an additional line in front of those newlines, which is not part of the code.

#


eval R="I=$/+$/+'eval R=%p'%R;?\\4>w='%d are \"%s\"';B=\"My source has \#{I.size}\#{X=\" characters.\n\"}\#{z=(m=I.chars.uniq).map{|x|w%[I.count(x),x]}*$/}\nBesides unquoted numbers, my output has 1114\"+X;$><<B+m.map{|c|w%[(B+z+$M=\"\nIt's good to be a program.\").gsub!(/\\d++(?!\")/,'').count(c),c]}*$/+$M"

Shortest solution, 269 characters:

The shortest solution additionally hardcodes its own source length. By using variable names that are/aren't already part of the source code, it's possible to find a "fixpoint" where all characters in the source code (including the digits from the hardcoded lengths!) occur at least twice.

This solution also saves a few more characters by simply reading its own source code from the code file, instead of generating it. As a nice side effect, this makes the code much more "readable" (but who cares about readable code in a ...), as now the code doesn't have to be inside a string literal anymore.

U='%d are "%s"'
O=IO.read$0
?\126>B="My source has 269#{X=" characters.
"}#{z=(m=O.chars.uniq).map{|c|U%[O.count(c),c]}*$/}
Besides unquoted numbers, my output has 1096"+X
$><<B+m.map{|c|U%[(B+z+$M="
It's good to be a program.").gsub!(/\d++(?!")/,"").count(c),c]}*$/+$M

I also modified the test script a little bit to reduce the copy-pasting necessary to check the code. By replacing the definitions of CODE and OUTPUT with

import subprocess

CODE = open("packed.rb").read()
OUTPUT = subprocess.check_output(["ruby", "packed.rb"])

print CODE
print len(CODE)

the script now automatically runs my code, reads its output, and grabs the source code from the code file.


Here's the output generated by the shortest code:

My source has 269 characters.
3 are "U"
7 are "="
3 are "'"
4 are "%"
6 are "d"
17 are " "
11 are "a"
9 are "r"
9 are "e"
11 are """
11 are "s"
6 are "
"
4 are "O"
2 are "I"
10 are "."
6 are "$"
2 are "0"
2 are "?"
2 are "\"
2 are "1"
2 are "2"
3 are "6"
2 are ">"
4 are "B"
3 are "M"
2 are "y"
9 are "o"
10 are "u"
12 are "c"
4 are "h"
2 are "9"
2 are "#"
4 are "{"
2 are "X"
8 are "t"
4 are "}"
2 are "z"
6 are "("
7 are "m"
5 are "n"
2 are "i"
2 are "q"
6 are ")"
4 are "p"
4 are "|"
2 are "["
4 are ","
2 are "]"
2 are "*"
4 are "/"
3 are "b"
7 are "+"
2 are "<"
3 are "g"
2 are "!"
Besides unquoted numbers, my output has 1096 characters.
2 are "U"
2 are "="
3 are "'"
2 are "%"
5 are "d"
238 are " "
120 are "a"
120 are "r"
120 are "e"
222 are """
11 are "s"
114 are "
"
2 are "O"
3 are "I"
5 are "."
2 are "$"
2 are "0"
2 are "?"
2 are "\"
2 are "1"
2 are "2"
2 are "6"
2 are ">"
3 are "B"
3 are "M"
4 are "y"
9 are "o"
8 are "u"
7 are "c"
6 are "h"
2 are "9"
2 are "#"
2 are "{"
2 are "X"
9 are "t"
2 are "}"
2 are "z"
2 are "("
5 are "m"
4 are "n"
3 are "i"
3 are "q"
2 are ")"
4 are "p"
2 are "|"
2 are "["
3 are ","
2 are "]"
2 are "*"
2 are "/"
4 are "b"
2 are "+"
2 are "<"
4 are "g"
2 are "!"
It's good to be a program.

Ventero

Posted 2014-07-11T23:17:18.390

Reputation: 9 842

Could you post a definitive copy of your code and output so I can easily test it? The code should not output itself and the output should end in a period not a newline. – Calvin's Hobbies – 2014-07-12T03:29:18.030

@Calvin'sHobbies The first code block is my actual code. It does print the output with a final newline though, so give me a few minutes to fix that (this is something that you should definitely mention in the spec). – Ventero – 2014-07-12T03:31:32.097

Sure thing, I just updated the spec. – Calvin's Hobbies – 2014-07-12T03:38:54.960

@Calvin'sHobbies Done. First code block is the actual code which is generated by the second code block (so that I don't have to take care of string escaping and everything while writing the code). – Ventero – 2014-07-12T03:39:40.847