Count the Bytes!

6

2

Challenge

Write a program which takes the URL of a PPCG answer as input and outputs the length of the answer's first code block in bytes.

Code Blocks

There are three different ways of formatting code blocks:

Four spaces before the code (- represents a space, here)

----Code goes here

Surrounding the code in <pre> tags:

<pre>Code goes here</pre>

or:

<pre>
Code goes here
</pre>

Surrounding the code in <pre><code> tags (in that order):

<pre><code>Code goes here</code></pre>
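
As a rough illustration of the formats above, here is a minimal sketch of picking out the first code block from an answer's rendered HTML. It assumes that four-space-indented Markdown is rendered as <pre><code>…</code></pre>, so looking for the first <pre> element covers all three cases. The function name firstCodeBlock and the regular expressions are hypothetical, not part of the challenge, and the sketch neither decodes HTML entities nor adjusts leading or trailing newlines.

```javascript
// Hypothetical helper: returns the raw inner text of the first <pre> block,
// unwrapping a <code> element if it spans the whole block.
// Returns null when the HTML contains no <pre> element.
function firstCodeBlock(html) {
  // Lazy match so we stop at the first closing </pre>.
  const pre = /<pre(?:\s[^>]*)?>([\s\S]*?)<\/pre>/i.exec(html);
  if (!pre) return null;
  const inner = pre[1];
  // Unwrap <pre><code>...</code></pre>; plain <pre>...</pre> is returned as-is.
  const code = /^<code(?:\s[^>]*)?>([\s\S]*)<\/code>$/i.exec(inner);
  return code ? code[1] : inner;
}

console.log(firstCodeBlock("<p>text</p><pre><code>Code goes here</code></pre>"));
console.log(firstCodeBlock("<pre>Code goes here</pre>"));
```

Inline <code> spans outside a <pre> are deliberately ignored, since they are not code blocks.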

Rules

You should only use the first code block in the answer. Assume that the supplied answer will always have a code block.

You must find the length of the code block in bytes, not any other metric such as characters.

Solutions using the Stack Exchange Data Explorer or the Stack Exchange API are allowed. URL shorteners are disallowed.

Assume all programs use UTF-8 encoding.
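
To make the bytes-versus-characters rule concrete, here is a small sketch that measures a string's UTF-8 length with the standard TextEncoder API. The helper name utf8ByteLength and the sample string are made up for illustration.

```javascript
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  // TextEncoder always encodes to UTF-8.
  return new TextEncoder().encode(text).length;
}

const sample = "\u23735"; // the APL iota character (U+2373) followed by "5"
console.log(sample.length);          // UTF-16 code units: 2
console.log(utf8ByteLength(sample)); // UTF-8 bytes: 4 (3 for U+2373, 1 for "5")
```

For pure ASCII code the two numbers agree, which is why the difference only shows up for answers in languages such as APL.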

Examples

Input:  https://codegolf.stackexchange.com/a/91904/30525
Output: 34


Input:  https://codegolf.stackexchange.com/a/91693/30525
Output: 195


Input:  https://codegolf.stackexchange.com/a/80691/30525
Output: 61

Note: Despite what the answer may say, 33 bytes isn't its UTF-8 byte count, because APL uses a different encoding.

Winning

The shortest code in bytes wins.

Beta Decay

Posted 2016-09-01T22:03:36.183

Reputation: 21 478

So no scraping the header? – Blue – 2016-09-01T22:04:35.573

@muddyfish Nope (note the third example) – Beta Decay – 2016-09-01T22:05:46.330

Some languages use a custom character encoding, different than UTF-8 (I'm pretty sure that's what happens in the APL answer). So the byte count depends on each language's default encoding. I think it may be better for the challenge to ask for character count rather than byte count – Luis Mendo – 2016-09-01T22:19:44.813

@LuisMendo I see, I'll specify UTF-8 then – Beta Decay – 2016-09-01T22:20:57.493

For example 2 I can only find 195 bytes. It looks like the 199 to 196 edit actually removed 4 bytes, not 3. Am I missing something? – milk – 2016-09-02T00:06:58.647

Indentation is meaningful in <pre> tags, and so are newlines in <code> tags, so your "Code goes here" examples have three different lengths. If that's intentional, it should be mentioned. – Dennis – 2016-09-02T00:09:04.660

@Dennis I've now removed the indentation – Beta Decay – 2016-09-02T05:26:59.580

But not the newlines. Both newlines in <pre><code>\nCode goes here\n</code></pre> count. However, they don't count in <pre>\nCode goes here\n</pre> – Dennis – 2016-09-02T05:49:20.147

@Dennis Oh, I didn't know that :P – Beta Decay – 2016-09-02T05:51:36.967

Answers

3

jQuery JavaScript, 97 Bytes

console.log((new Blob([$("#answer-"+location.hash.substr(1)+" code").eq(0).text().trim()])).size)

old version 151 Bytes

s=$('#answer-'+location.hash.substr(1)+' code').eq(0).text().trim();c=0;for(i=s.length;i--;)d=s.charCodeAt(i),c=d<128?c+1:d<2048?c+2:c+3;console.log(c)

old version 163 Bytes

s=$('#answer-'+location.hash.substr(1)+' code').first().text().trim();c=0;for(i=0;i<s.length;i++){d=s.charCodeAt(i);c=(d<128)?c+1:(d<2048)?c+2:c+3;}console.log(c);

Input is the page location (the answer's URL); this assumes jQuery is active on the site.
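
One caveat about the older loop-based versions, sketched below: charCodeAt walks UTF-16 code units, so a character outside the Basic Multilingual Plane is seen as two surrogates and counted as 3 + 3 = 6 bytes, whereas UTF-8 encodes it in 4. Iterating by code point avoids this. Both function names are hypothetical; the first mirrors the logic of the old versions, the second is one possible correction rather than the author's method.

```javascript
// Mirrors the old versions: counts per UTF-16 code unit.
function utf8LengthByCodeUnit(text) {
  let bytes = 0;
  for (let i = text.length; i--;) {
    const d = text.charCodeAt(i);
    bytes += d < 128 ? 1 : d < 2048 ? 2 : 3;
  }
  return bytes;
}

// Possible correction: for...of iterates by code point, so surrogate pairs
// arrive as one character and can be counted as 4 bytes.
function utf8LengthByCodePoint(text) {
  let bytes = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    bytes += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return bytes;
}

const astral = "\u{1F600}"; // a character outside the Basic Multilingual Plane
console.log(utf8LengthByCodeUnit(astral));  // 6
console.log(utf8LengthByCodePoint(astral)); // 4
```

For text limited to the Basic Multilingual Plane, which covers the three example answers' typical characters, the two functions give the same result.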

Jörg Hülsermann

Posted 2016-09-01T22:03:36.183

Reputation: 13 026

Move the c=0 into the for, to save a byte. for(c=i=0;i<s.length;i++) – Paul Schmitz – 2016-09-02T07:42:02.890

And remove the ; before and after the console.log(c). – Paul Schmitz – 2016-09-02T08:19:27.703

@PaulSchmitz Thank You. I prefer to use a while loop instead of your proposal. – Jörg Hülsermann – 2016-09-02T08:52:34.397

s=$("#answer-"+location.hash.substr(1)+" code").eq(0).text().trim();c=0;for(i=s.length;i--;)d=s.charCodeAt(i),c=128>d?c+1:2048>d?c+2:c+3;console.log(c) is 5 bytes shorter. 4 for removing () around the conditions, and 1 by replacing while with for. – Paul Schmitz – 2016-09-02T08:58:08.470

4

C#, 249 270 bytes

u=>{var c=new System.Net.Http.HttpClient().GetStringAsync(u).Result;var i=c.IndexOf("de>",c.IndexOf('"'+u.Split('/')[4]))+3;return System.Text.Encoding.UTF8.GetByteCount(c.Substring(i,c.IndexOf("<",i)-i-1).Replace("&gt;",">").Replace("&lt;","<").Replace("&amp;","&"));};

I've made some assumptions based on observations of the HTML of pages. Hopefully they hold true for all answer posts.

+21 bytes to decode &amp; to &. My answer failed to count its own bytes. ;P

Ungolfed:

/*Func<string, int> Lambda =*/ u => 
{
    // Download HTML of the page.
    // Although new System.Net.WebClient().DownloadString(u) looks shorter, it doesn't 
    // preserve the UTF8 encoding.
    var c = new System.Net.Http.HttpClient().GetStringAsync(u).Result;

    // Using the answer id from the URL, find the index of the start of the code block.
    // Empirical observation has found that all code blocks are in <code> tags,
    // there are no other HTML tags that end with "de>",
    // and that '>' and '<' are encoded to '&gt;'/'&lt;' so no jerk can put "de>" 
    // before the first code block.
    var i = c.IndexOf("de>", c.IndexOf('"' + u.Split('/')[4])) + 3;

    // Get the substring of the code block text, unencode '>'/'<'/'&' and get the byte count.
    // Again, empirical observation shows the closing </code> tag is always on a new
    // line, so always remove 1 character when getting the substring.
    return System.Text.Encoding.UTF8.GetByteCount(c.Substring(i, c.IndexOf("<", i) - i - 1).Replace("&gt;", ">").Replace("&lt;", "<").Replace("&amp;", "&"));
};
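
The order of the three Replace calls matters: &amp; has to be decoded last, otherwise text such as &amp;lt; (a literal "&lt;" in the code) would be decoded twice and become "<". A minimal sketch of the same step follows; the name decodeBasicEntities is hypothetical, and it handles only the three entities the answer deals with, so any others would pass through unchanged.

```javascript
// Hypothetical helper: decodes only &gt;, &lt; and &amp;, in that order.
// Decoding &amp; last prevents "&amp;lt;" from being turned into "<".
function decodeBasicEntities(html) {
  return html
    .replace(/&gt;/g, ">")
    .replace(/&lt;/g, "<")
    .replace(/&amp;/g, "&");
}

console.log(decodeBasicEntities("a &lt; b &amp;&amp; c &gt; d")); // a < b && c > d
console.log(decodeBasicEntities("&amp;lt;"));                     // &lt;
```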

Results:

Input:  http://codegolf.stackexchange.com/a/91904/30525
Output: 34

Input:  http://codegolf.stackexchange.com/a/91693/30525
Output: 195 (at the time of this answer the first post has 196, but I can't find the 196th 
byte, even counting by hand, so assuming 195 is correct)

Input:  http://codegolf.stackexchange.com/a/80691/30525
Output: 61

Input:  http://codegolf.stackexchange.com/a/91995/58106 (this answer)
Output: 270
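
The u.Split('/')[4] step relies on the share-link shape https://<site>/a/<answer id>/<user id>, where the fifth slash-separated piece is the answer id. Here is a sketch of a slightly more defensive variant using the standard URL class. The function name answerIdFromShareUrl is hypothetical, and the assumed URL shape is taken only from the examples in the question.

```javascript
// Hypothetical helper: extracts the answer id from a share link of the form
// https://<site>/a/<answerId>/<userId>. Returns null for any other shape.
function answerIdFromShareUrl(url) {
  // pathname is "/a/<answerId>/<userId>", so splitting yields ["", "a", id, user].
  const parts = new URL(url).pathname.split("/");
  if (parts[1] !== "a" || !/^\d+$/.test(parts[2] || "")) return null;
  return parts[2];
}

console.log(answerIdFromShareUrl("https://codegolf.stackexchange.com/a/91904/30525")); // 91904
```

Splitting the path rather than the whole URL keeps the index independent of whether the link uses http or https.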

milk

Posted 2016-09-01T22:03:36.183

Reputation: 3 043

I find 195 Bytes too – Jörg Hülsermann – 2016-09-02T01:17:30.947

Thanks, I've edited the question and my answer with 195 bytes – Beta Decay – 2016-09-02T05:23:31.237

2

Java 7, 314 313 bytes

int c(String u)throws Exception{String x="utf8",s=new java.util.Scanner(new java.net.URL(u).openStream(),x).useDelimiter("\\A").next();int j=s.indexOf("de>",s.indexOf('"'+u.split("/")[4]))+3;return s.substring(j,s.indexOf("<",j)-1).replace("&gt;",">").replace("&lt;","<").replace("&amp;","&").getBytes(x).length;}

Shamelessly stolen from @milk's C# answer and ported to Java 7.
NOTE: This assumes all code is in <code> blocks. Currently it won't work with just <pre> tags (but who uses those anyway?.. ;p).

Ungolfed & test cases:

class M{
  static int c(String u) throws Exception{
    String x = "utf8",
           s = new java.util.Scanner(new java.net.URL(u).openStream(), x).useDelimiter("\\A").next();
    int j = s.indexOf("de>", s.indexOf('"'+u.split("/")[4])) + 3;
    return s.substring(j, s.indexOf("<", j) - 1).replace("&gt;", ">").replace("&lt;", "<").replace("&amp;", "&")
        .getBytes(x).length;
  }

  public static void main(String[] a) throws Exception{
    System.out.println(c("https://codegolf.stackexchange.com/a/91904/30525"));
    System.out.println(c("https://codegolf.stackexchange.com/a/91693/30525"));
    System.out.println(c("https://codegolf.stackexchange.com/a/80691/30525"));
    System.out.println(c("https://codegolf.stackexchange.com/a/91995/58106"));
  }
}

Output:

34
195
61
270

Kevin Cruijssen

Posted 2016-09-01T22:03:36.183

Reputation: 67 575