Count the characters - bit by bit!

19

3

The simple part: Given an input string containing only printable ASCII-characters (space - tilde), count the number of occurrences of each character and return the result on any convenient format. The result for a string a%hda7a should be something like: a:3, %:1, h:1, 7:1, d:1. Sorting is unnecessary, the delimiters and formats are optional but it must be easily understood which number corresponds to which character. You shall not include characters that are not in the input string (a:3, b:0, c:0, d:1, ... is not OK).

The real challenge:

Convert every character in your code to an 8-bit binary number (or 16-bit if you're using UTF-16 or similar), and enumerate every character starting at 0.

For every character (i is the enumerator), the i%7-bit1 must be 1. The bits are numbered from the right. All other bits can be whatever you want.

Let's use the following code as an example:

[f]-xif)#f

Converting this to binary we get the array below. The first number (representing [ has a 1 in the 0'th position, so that one is OK. The second number (representing f has a 1 in the 1'st position, so that one is OK too. Continue like this, and you'll see that the code above is valid.

C  76543210  Bit number
-  --------  ----------
[  01011011  0   - OK
f  01100110  1   - OK
]  01011101  2   - OK
-  00101101  3   - OK
x  01111000  4   - OK
i  01101001  5   - OK
f  01100110  6   - OK
)  00101001  0   - OK
#  00100011  1   - OK
f  01100110  2   - OK

If we change the code into: ]f[-xif)#f we'll get the following start of the sequence:

C  76543210  Bit number
-  --------  ----------
]  01011101  0   <- OK
f  01100110  1   <- OK
[  01011011  2   <- Not OK
-  00101101  3   <- OK

As we see, the third character [ doesn't have a 1 in the 2nd position (zero-indexed), and this code is therefore not valid.

Test cases:

Input:
This is a string containing some symbols: ".#!".#&/#

Output:
   !  "  #  &  /  :  T  a  b  c  e  g  h  i  l  m  n  o  r  s  t  y  .
7  1  2  3  1  1  1  1  2  1  1  1  2  1  5  1  2  4  3  1  6  2  1  2

Any reasonable output format is OK (whatever is most convenient to you). You could for instance have: :7, !:1, ":2, #:3, &:1, /:1, T:1, a:2 ... or [ ,7][!,1][",2][#,3][&,1].... The output is on any standard way (return from function, printed to STDOUT etc.)

1i modulus 7.


This is , so the shortest code in bytes will winref.

Stewie Griffin

Posted 2016-12-23T11:22:41.597

Reputation: 43 471

6

To be of a little assistance, here are the characters you're allowed to use in the n%7th spot > http://pastie.org/pastes/10985263/text

– TidB – 2016-12-23T12:26:32.633

@TidB the website is offline?? – Rod – 2016-12-23T13:07:32.023

1

@Rod Yeah, pastie seems to have some problems. Try pastebin instead

– TidB – 2016-12-23T13:10:43.963

1Remember that newline is 00001010. It can be useful too! :) – Stewie Griffin – 2016-12-23T13:12:33.413

Yeah, I only included the printable ASCII characters that you can identify quickly. Tabs, newlines etc. aren't included, not speaking about the extensive charset of esoteric languages :D – TidB – 2016-12-23T13:13:50.067

1

To be of some further assistance, here's a validation script you can use for UTF-8 encodings. Just encapsulate the input in a string like the example.

– AdmBorkBork – 2016-12-23T14:19:11.987

How far can we take the reasonable output format? E.g., all counts newline separated followed by a "key" of all the characters in order of their counts? – Sanchises – 2016-12-23T14:54:22.613

If you have 90 different input characters then it will be hard to tell which key belongs to which character, so I think I must say no to that one @Sanchises. – Stewie Griffin – 2016-12-23T15:01:08.563

OK thanks. I think my current submission should be fine then. – Sanchises – 2016-12-23T16:23:27.233

Apologies for the lacking replies, I'm celebrating Christmas with the family and code golf is not the main priority right now. Will be back :-) – Stewie Griffin – 2016-12-24T09:13:08.843

Answers

6

Pyke, 1 6 bytes

1cn;1c

Try it here!

1c     - chunk(size=1, input)
  n;1  - noop. 
     c - count(^)

Half of this code is just no-ops...

00110001 - 1
01100011 - c
01101110 - n
00111011 - ;
00110001 - 1
01100011 - c

Blue

Posted 2016-12-23T11:22:41.597

Reputation: 26 661

@EriktheOutgolfer has a valid point. I don't think this input format is valid, unless this is actually a regular string in Pyke. It would be a valid input string in MATLAB/Octave since 'abc'==['a','b','c'], so it might be in Pyke too...? – Stewie Griffin – 2016-12-23T13:20:00.997

@StewieGriffin This is not how Pyke normally handles strings. If that's not OK, I can see about switching the input format but as a character list is under the accepted list of defaults though this may count as cheating under that – Blue – 2016-12-23T13:59:19.103

@EriktheOutgolfer http://meta.codegolf.stackexchange.com/a/2216/32686 or for a direct but less voted answer http://meta.codegolf.stackexchange.com/a/8963/32686

– Blue – 2016-12-23T14:36:16.910

@muddyfish Oh, so that's where they were hiding. OK, I approve of your solution. Can I clarify that into the post? (It has confused at least 2 people) – Erik the Outgolfer – 2016-12-23T14:40:40.870

5Sorry for breaking your challenge with a 1-byte built-in I don't think you're actually sorry, and the challenge is not broken by this :-) – Luis Mendo – 2016-12-23T15:14:42.580

@LuisMendo and that matters why? :) – Blue – 2016-12-23T15:27:45.720

2This isn't a character list; it's a list of strings. While characters lists are at +17/-0, string lists are at +2/-2, so it's hardly an accepted default. @StewieGriffin should decide whether it's valid or not. – Dennis – 2016-12-23T16:13:31.407

@Dennis I did ping them earlier to ask if they thought it was valid – Blue – 2016-12-23T16:47:12.430

1@StewieGriffin better? – Blue – 2016-12-24T11:36:12.243

@muddyfish, better :) – Stewie Griffin – 2016-12-29T14:31:39.417

6

Pyth, 12 8 7 bytes

-1 byte thanks to @Loovjo

m+d/Qd{
      { # remove all duplicated elements from the (implicit) input
m       # map each element (d) of the parameter (the set from previous operation)
   /Qd  # count the occurrences of d in Q
 +d     # concatenate with d

binary representation

01101101 m
00101011 +
01100100 d
00101111 /
01010001 Q
01100100 d
01111011 {

Try here

Rod

Posted 2016-12-23T11:22:41.597

Reputation: 17 588

Nice! :) The output 13 for 111 looks strange, but it can't be misunderstood (there can't be any single character 13 that's used 1 time), so this is perfectly valid! – Stewie Griffin – 2016-12-23T12:33:07.320

4

Befunge-93, 150 bytes

={<{p+}3/}*77\%*7{7:\+{}{1g}+3/*77\%*7{7:}=:_{}{}={}{}{v#{}{}`x1:~
}-}=*}{2*}97}:<$}={$_v#}!:-*84g+3/*77\%*7{7:}=:}:}+}1{}<_{@#
}{}{}={}{}{}={^.},\={<

Try it online!

I started by writing this as a regular Befunge program, which I golfed as much as possible. I then added padding to make sure the various characters in the program only appeared in permitted positions. This padding relied on the fact that unsupported commands are ignored in Befunge-93, so I just needed a sequence of unused characters whose bits aligned with the positions required (the sequence I used was ={}{}{}).

The complicated bit was that the various branches between lines needed to line up correctly (e.g. the v arrow in one line, would need to line up with the < arrow below it). This was further complicated by the fact that the bridge command (#) could not be separated from its adjacent branching arrow. Initially I tried generating the padding programatically, but in the end it was mostly a manual process.

Because of the size of the program I'm not going to list the full character analysis, but this is a sample from the beginning and the end:

= 00111101 0
{ 01111011 1
< 00111100 2
{ 01111011 3
p 01110000 4
+ 00101011 5
} 01111101 6
3 00110011 0
/ 00101111 1
...
{ 01111011 1
^ 01011110 2
. 00101110 3
} 01111101 4
, 00101100 5
\ 01011100 6
= 00111101 0
{ 01111011 1
< 00111100 2

The line breaks are treated as a newline characters, so will either be in position 1 or 3.

James Holderness

Posted 2016-12-23T11:22:41.597

Reputation: 8 298

3

MATL, 17 bytes

u"G91x@=zD91x@uRD

Displays the count, then the corresponding character, all newline-separated. Biggest difficulty is the @ which is 0b01000000; I hope I can find a way to make do without it.

Try it online!

Explanation:

u"  % Implicit input. Take (u)nique characters and loop (") over them.
G   % Take the input a(G)ain
91x % Filler: push 91, delete immediately.
@   % Push current character of loop
=   % Check for equality with earlier G
z   % Count number of equal characters
D   % Display
91x % More filler!
@   % Get loop character again
uR  % Filler: two NOPs for the single-character @
D   % Display. Implicitly end loop.

MATL, 15 bytes (questionable output)

If just leaving two row vectors on the stack is allowed (function-like behaviour as per this Meta post), we can get down to

u"G91x@=zv]v!Gu

But here, the output is not quite as neatly ordered.

Sanchises

Posted 2016-12-23T11:22:41.597

Reputation: 8 530

The stack is implicitly printed at the end of the program and the output format is flexible as per the challenge, so I don't see any problem with the second approach – Luis Mendo – 2016-12-23T17:08:00.620

@LuisMendo I'm not sure. If you have 90 different input characters then it will be hard to tell which key belongs to which character, so I think I must say no to that one Sanchises. – Stewie Griffin 2 hours ago was the reply to a proposed hybrid (counts individually D'd, Gu at the end of the program), and I'm not sure if the 15-byte version is sufficiently different. – Sanchises – 2016-12-23T17:18:14.803

@StewieGriffin Could you perhaps see if the 15-byte version (Try it online!) is OK or not?

– Sanchises – 2016-12-23T17:20:08.770

Not sure it Stewie will get the ping on this post, better use the challenge post – Luis Mendo – 2016-12-23T17:30:46.353

Don't know about you, but I don't think it's easily understood here :) I prefer the 17 byte solution, but feel free to keep the 15 byte one in the answer too! Nice answer by the way :)

– Stewie Griffin – 2016-12-29T14:37:23.643

@StewieGriffin Thanks! I'll just keep it like this. I agree that the 15 byte version is difficult to read, but it does nicely leave both vectors on the stack, making this much like a 'function' - but the challenge wording does seem to prefer stdout-like output. Thanks for a nice puzzle though! – Sanchises – 2016-12-29T15:14:13.767

1

CJam, 14 bytes

q__|_ @sfe=]zp

Try it here.

The space before the @ and the s after it are filler characters inserted to make the ASCII codes fit the required pattern: the space does nothing, and the s just converts a string into a string. Other than that, this is a pretty simple and straightforward implementation of the challenge task:

q_               "read the input and make a copy of it";
  _|             "collapse repeated characters in the copy"; 
    _            "save a copy of the collapsed string";
      @          "pull the original input string to the top of the stack";
       s         "(does nothing here)";
        fe=      "for each character in the collapsed string, count the ...";
                 "... number of times it occurs in the original string";
           ]z    "pair up the counts with the saved copy of the collapsed string";
             p   "print the string representation of the result";

For the input foobar123, this code outputs [['f 1] ['o 2] ['b 1] ['a 1] ['r 1] ['1 2] ['2 2] ['3 1]]. If simply printing the counts on one line and the corresponding characters on another, as in:

[1 2 1 1 1 2 2 1]
fobar123

is considered an acceptable output format, then the ]z may be omitted to save two bytes, for a total of 12 bytes. Yes, the shortened code will still pass the bit pattern requirement.

Ps. I also wrote a simple source code checker for this challenge. Given a line of code as input, it will first echo that line and then print the same line with each character replaced by its (n % 7)-th ASCII bit. If the second line is all ones, the input is valid.

Ilmari Karonen

Posted 2016-12-23T11:22:41.597

Reputation: 19 513

1

Jelly, 6 bytes in Jelly's codepage

ṢZṢṀŒr

Try it online!

This is a function that returns a list of (character, count) pairs. (Jelly represents such lists as text, e.g. if they're sent to standard output, by concatenating the elements, which is why you have to treat this as a function rather than a complete program. (Here's the same program with some code appended to call the function and then print the internal structure to standard output, proving that the output's in an unambiguous format.)

Binary representation and explanation:

  76543210 

Ṣ 10110111   Sort the characters of the input
Z 01011010   Transpose the list (it's 1D, so this effectively wraps it in a list)
Ṣ 10110111   Sort the list (a no-op, as it has only one element)
Ṁ 11001000   Take the largest (i.e. only) element
Π00010011   First byte of a two-byte command
r 01110010   Run-length encode

It can be seen that the second, third, and fourth characters cancel each other out and are only there to maintain the bit pattern we need. Œr is just too convenient, though, and padding the program so that we can use it probably gives us a shorter program than trying to solve the problem without the builtin would.

user62131

Posted 2016-12-23T11:22:41.597

Reputation: