Shell + coreutils, 6

Question

Note 2: I accepted @DigitalTrauma's 6-byte answer. If anyone can beat that, I will change the accepted answer. Thanks for playing!

Note: I will be accepting an answer at 6:00pm MST on 10/14/15. Thanks to all that participated!

I am very surprised that this has not been asked yet (or I didn't search hard enough). Either way, this challenge is very simple:

Input: A program in the form of a string. Additionally, the input may or may not contain:

Leading and trailing spaces
Trailing newlines
Non-ASCII characters

Output: Two integers, one representing the UTF-8 character count and one representing the byte count, in whichever order you choose. Trailing newlines are allowed. Output can be to STDOUT or returned from a function. It can be in any format as long as the two numbers are distinguishable from each other (2327 is not valid output).

Notes:

You may consider newline as \n or \r\n.
Here is a nice byte & character counter for your tests. Also, here is a meta post with the same thing (Thanks to @Zereges).

Sample I/O: (All outputs are in the form {characters} {bytes})

Input: void p(int n){System.out.print(n+5);}

Output: 37 37

Input: (~R∊R∘.×R)/R←1↓ιR

Output: 17 27

Input:


friends = ['john', 'pat', 'gary', 'michael']
for i, name in enumerate(friends):
    print "iteration {iteration} is {name}".format(iteration=i, name=name)

Output: 156 156
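For reference, here is a minimal, ungolfed sketch of the task in Python 3. It assumes the program arrives as an already-decoded `str`; the function name `count_chars_and_bytes` is made up for illustration and is not taken from any answer.

```python
def count_chars_and_bytes(program: str) -> tuple:
    """Return (characters, bytes) for the UTF-8 encoding of `program`."""
    characters = len(program)                  # a Python 3 str is a sequence of code points
    byte_count = len(program.encode("utf-8"))  # size of the UTF-8 encoded form
    return characters, byte_count


# The first two samples from the question, in the form {characters} {bytes}.
for sample in ("void p(int n){System.out.print(n+5);}", "(~R∊R∘.×R)/R←1↓ιR"):
    print(*count_chars_and_bytes(sample))
# 37 37
# 17 27
```

The two counts differ only when the input contains non-ASCII characters, since each of those takes two to four bytes in UTF-8.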

This is code golf - shortest code in bytes wins!

Leaderboards

Here is a Stack Snippet to generate both a regular leaderboard and an overview of winners by language.

To make sure that your answer shows up, please start your answer with a headline, using the following Markdown template:

# Language Name, N bytes

where N is the size of your submission. If you improve your score, you can keep old scores in the headline, by striking them through. For instance:

# Ruby, <s>104</s> <s>101</s> 96 bytes

If you want to include multiple numbers in your header (e.g. because your score is the sum of two files or you want to list interpreter flag penalties separately), make sure that the actual score is the last number in the header:

# Perl, 43 + 2 (-p flag) = 45 bytes

You can also make the language name a link which will then show up in the leaderboard snippet:

# [><>](http://esolangs.org/wiki/Fish), 121 bytes
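To illustrate why the score has to be the last number, here is a simplified Python sketch of a header parser. It is not the snippet's actual regex; `parse_header` is a hypothetical helper that splits at the first comma and takes the last integer that follows.

```python
import re


def parse_header(header: str):
    """Return (language, score) from an answer headline, or None if no score is found."""
    title = header.lstrip("#").strip()
    language, _, rest = title.partition(",")  # language name comes before the first comma
    numbers = re.findall(r"\d+", rest)        # every integer after the comma
    if not numbers:
        return None
    return language.strip(), int(numbers[-1])  # the last number is treated as the score


print(parse_header("# Ruby, <s>104</s> <s>101</s> 96 bytes"))  # ('Ruby', 96)
print(parse_header("# Perl, 43 + 2 (-p flag) = 45 bytes"))     # ('Perl', 45)
```

Under this rule, struck-through old scores and flag penalties are ignored automatically as long as the final score comes last.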

var QUESTION_ID=60733,OVERRIDE_USER=36670;function answersUrl(e){return"http://api.stackexchange.com/2.2/questions/"+QUESTION_ID+"/answers?page="+e+"&pagesize=100&order=desc&sort=creation&site=codegolf&filter="+ANSWER_FILTER}function commentUrl(e,s){return"http://api.stackexchange.com/2.2/answers/"+s.join(";")+"/comments?page="+e+"&pagesize=100&order=desc&sort=creation&site=codegolf&filter="+COMMENT_FILTER}function getAnswers(){jQuery.ajax({url:answersUrl(answer_page++),method:"get",dataType:"jsonp",crossDomain:!0,success:function(e){answers.push.apply(answers,e.items),answers_hash=[],answer_ids=[],e.items.forEach(function(e){e.comments=[];var s=+e.share_link.match(/\d+/);answer_ids.push(s),answers_hash[s]=e}),e.has_more||(more_answers=!1),comment_page=1,getComments()}})}function getComments(){jQuery.ajax({url:commentUrl(comment_page++,answer_ids),method:"get",dataType:"jsonp",crossDomain:!0,success:function(e){e.items.forEach(function(e){e.owner.user_id===OVERRIDE_USER&&answers_hash[e.post_id].comments.push(e)}),e.has_more?getComments():more_answers?getAnswers():process()}})}function getAuthorName(e){return e.owner.display_name}function process(){var e=[];answers.forEach(function(s){var r=s.body;s.comments.forEach(function(e){OVERRIDE_REG.test(e.body)&&(r="<h1>"+e.body.replace(OVERRIDE_REG,"")+"</h1>")});var a=r.match(SCORE_REG);a&&e.push({user:getAuthorName(s),size:+a[2],language:a[1],link:s.share_link})}),e.sort(function(e,s){var r=e.size,a=s.size;return r-a});var s={},r=1,a=null,n=1;e.forEach(function(e){e.size!=a&&(n=r),a=e.size,++r;var t=jQuery("#answer-template").html();t=t.replace("{{PLACE}}",n+".").replace("{{NAME}}",e.user).replace("{{LANGUAGE}}",e.language).replace("{{SIZE}}",e.size).replace("{{LINK}}",e.link),t=jQuery(t),jQuery("#answers").append(t);var o=e.language;/<a/.test(o)&&(o=jQuery(o).text()),s[o]=s[o]||{lang:e.language,user:e.user,size:e.size,link:e.link}});var t=[];for(var o in s)s.hasOwnProperty(o)&&t.push(s[o]);t.sort(function(e,s){return 
e.lang>s.lang?1:e.lang<s.lang?-1:0});for(var c=0;c<t.length;++c){var i=jQuery("#language-template").html(),o=t[c];i=i.replace("{{LANGUAGE}}",o.lang).replace("{{NAME}}",o.user).replace("{{SIZE}}",o.size).replace("{{LINK}}",o.link),i=jQuery(i),jQuery("#languages").append(i)}}var ANSWER_FILTER="!t)IWYnsLAZle2tQ3KqrVveCRJfxcRLe",COMMENT_FILTER="!)Q2B_A2kjfAiU78X(md6BoYk",answers=[],answers_hash,answer_ids,answer_page=1,more_answers=!0,comment_page;getAnswers();var SCORE_REG=/<h\d>\s*([^\n,]*[^\s,]),.*?(\d+)(?=[^\n\d<>]*(?:<(?:s>[^\n<>]*<\/s>|[^\n<>]+>)[^\n\d<>]*)*<\/h\d>)/,OVERRIDE_REG=/^Override\s*header:\s*/i;

body{text-align:left!important}#answer-list,#language-list{padding:10px;width:290px;float:left}table thead{font-weight:700}table td{padding:5px}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script> <link rel="stylesheet" type="text/css" href="//cdn.sstatic.net/codegolf/all.css?v=83c949450c8b"> <div id="answer-list"> <h2>Leaderboard</h2> <table class="answer-list"> <thead> <tr><td></td><td>Author</td><td>Language</td><td>Size</td></tr></thead> <tbody id="answers"> </tbody> </table> </div><div id="language-list"> <h2>Winners by Language</h2> <table class="language-list"> <thead> <tr><td>Language</td><td>User</td><td>Score</td></tr></thead> <tbody id="languages"> </tbody> </table> </div><table style="display: none"> <tbody id="answer-template"> <tr><td>{{PLACE}}</td><td>{{NAME}}</td><td>{{LANGUAGE}}</td><td>{{SIZE}}</td><td><a href="{{LINK}}">Link</a></td></tr></tbody> </table> <table style="display: none"> <tbody id="language-template"> <tr><td>{{LANGUAGE}}</td><td>{{NAME}}</td><td>{{SIZE}}</td><td><a href="{{LINK}}">Link</a></td></tr></tbody> </table>

GamrCorps

Posted 2015-10-14T01:54:16.973

Reputation: 7 058

does the output have to be space-separated? – Maltysen – 2015-10-14T03:24:15.803

no, it can be in any format as long as the numbers are distinguishable from each other (2327 is not valid output) – GamrCorps – 2015-10-14T03:25:10.017

Aren't there some UTF-8 characters that depending on the interpretation can be split into two other characters that generate the same byte values? How do we count those then? – Patrick Roberts – 2015-10-14T03:51:33.740

Honestly, I do not know what you mean. Therefore, count as you wish. – GamrCorps – 2015-10-14T03:52:15.280

@GamrCorps UTF-8 characters include non-ASCII characters, which are basically characters that cannot be represented by one byte but must be represented by two to four bytes. Depending on how the characters are read in, it is up to the program to choose how to interpret the stream of bytes. For example, a 2-byte UTF-8 character can be interpreted as 2 sequential ASCII characters, each of which is represented by one of the two bytes making up the originally intended character. – Patrick Roberts – 2015-10-14T03:56:04.350

@PatrickRoberts I would say to use the higher value. But my final judgement would have to go to whatever https://mothereff.in/byte-counter says. Just put a questionable character in there and see what it reads as, and use that as the foundation.

– GamrCorps – 2015-10-14T04:02:50.083
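The ambiguity discussed in these comments can be reproduced with a short Python 3 sketch. Bytes above 0x7F are not valid ASCII, so Latin-1 stands in here as the single-byte interpretation; that choice is an assumption made for illustration.

```python
text = "é"                  # U+00E9, a single code point
raw = text.encode("utf-8")  # b'\xc3\xa9', two bytes

as_utf8 = raw.decode("utf-8")      # 'é'  : the two bytes form one character
as_latin1 = raw.decode("latin-1")  # 'Ã©' : each byte becomes its own character

print(len(raw), len(as_utf8), len(as_latin1))  # 2 1 2
```

The byte count is the same either way; only the character count depends on how the bytes are decoded, which is why the challenge pins the encoding to UTF-8.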

http://meta.codegolf.stackexchange.com/questions/4944/byte-counter-snippet – Zereges – 2015-10-14T07:56:40.690

Some of the answers count the character `` as two characters due (presumably) to the use of UTF-16 and its surrogate pairs rather than UTF-8. (Note that the byte count will be the same either way.) To confirm, since you've specified UTF-8 specifically, that makes such answers invalid, correct? – Alex A. – 2016-02-05T07:18:02.580

@AlexA. Yes. If answers count characters based on a non-UTF-8 encoding, the answer would be invalid. – GamrCorps – 2016-02-05T13:08:22.553
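A small Python 3 sketch of the surrogate-pair issue, using U+1F600 as an arbitrary example of a character outside the Basic Multilingual Plane (it is not necessarily the character the comment referred to):

```python
ch = "\U0001F600"  # one code point outside the Basic Multilingual Plane

code_points = len(ch)                           # 1: the count the challenge asks for
utf8_bytes = len(ch.encode("utf-8"))            # 4: the UTF-8 byte count
utf16_units = len(ch.encode("utf-16-le")) // 2  # 2: a surrogate pair in UTF-16

print(code_points, utf8_bytes, utf16_units)  # 1 4 2
```

Languages whose string length is measured in UTF-16 code units would report 2 characters here, which is the kind of answer the comment above describes as invalid.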

Nitpick: There's no such thing as a UTF-8 character. UTF-8 is an encoding that permits us to store Unicode characters as byte sequences. You are asking for the character and byte count of a UTF-8 data stream. – Dennis – 2016-02-05T18:50:53.247
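Seen as a byte stream, the character count can be obtained without decoding at all: in UTF-8 every continuation byte has the bit pattern `10xxxxxx`, so counting the bytes that do not match it counts the code points. A minimal Python 3 sketch, assuming the input is valid UTF-8 (`count_stream` is a made-up name):

```python
def count_stream(data: bytes) -> tuple:
    """Return (characters, bytes) for a valid UTF-8 byte stream."""
    # Each code point contributes exactly one byte that is not a
    # continuation byte (continuation bytes look like 0b10xxxxxx).
    characters = sum(1 for b in data if (b & 0xC0) != 0x80)
    return characters, len(data)


print(count_stream("(~R∊R∘.×R)/R←1↓ιR".encode("utf-8")))  # (17, 27)
```

This only gives meaningful results for well-formed input; malformed sequences would need a real decoder to detect.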

Count the bytes of a program

Leaderboards

Answers

Shell + coreutils, 6

Test output:

Shell + coreutils, 12

GolfScript, <s>14</s> 12 bytes

Idea

Code

Python, <s>42</s> 40 bytes

Julia, 24 bytes

Rust, 42 bytes

Pyth - <s>12</s> 9 bytes

Java, <s>241</s> <s>90</s> 89 bytes

PowerShell, 57 bytes

R, 47 bytes

R, 52 bytes

C, <s>68</s> 67 bytes

Milky Way 1.6.2, 7 bytes (non-competing)

Explanation

Usage

Perl 6, 33 bytes

Brainfuck, 163 bytes

beeswax, <s>99</s> 87 bytes

Swift 3, 37

Usage