Implement an encoder for RFC 1924 IPv6 Addresses

6

1

Introduction

Everyone should know IPv6 by now. IPv6 has very long, cumbersome addresses. In 1996 some very intelligent people created a scheme to encode them much better in RFC 1924. For reasons I cannot understand, this hasn't been widely adopted. To help adoption, it is your task to implement a converter in the language of your choice. Because storage is very expensive, especially in source code repositories, you obviously have to use as little of it as possible.

Challenge

Write an encoder for the address format specified in RFC 1924. The input is an IPv6 address in the boring standard format.

Stolen from Wikipedia this is:


An IPv6 address is represented as eight groups of four hexadecimal digits, each group representing 16 bits (two octets). The groups are separated by colons (:). An example of an IPv6 address is:

2001:0db8:85a3:0000:0000:8a2e:0370:7334

The hexadecimal digits are case-insensitive, but IETF recommendations suggest the use of lower case letters. The full representation of eight 4-digit groups may be simplified by several techniques, eliminating parts of the representation.

Leading zeroes

Leading zeroes in a group may be omitted. Thus, the example address may be written as:

2001:db8:85a3:0:0:8a2e:370:7334

Groups of zeroes

One or more consecutive groups of zero value may be replaced with a single empty group using two consecutive colons (::). Thus, the example address can be further simplified:

2001:db8:85a3::8a2e:370:7334

The localhost (loopback) address, 0:0:0:0:0:0:0:1, and the IPv6 unspecified address, 0:0:0:0:0:0:0:0, are reduced to ::1 and ::, respectively. This two-colon replacement may only be applied once in an address, because multiple occurrences would create an ambiguous representation.


You must handle omitted zeroes in the representation correctly!
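As a rough illustration of what handling the omitted zeroes involves, here is a minimal Python sketch that expands any of the textual forms above into eight 16-bit groups. The function name `expand_ipv6` is made up for this example, and it assumes plain hexadecimal groups only (no embedded IPv4 tail, no zone ID).

```python
def expand_ipv6(address):
    """Return the eight 16-bit groups of an IPv6 address as integers.

    Hypothetical helper for illustration; assumes hex groups only.
    """
    if "::" in address:
        # Split around the single allowed "::" and fill the gap with zeroes.
        head, _, tail = address.partition("::")
        left = head.split(":") if head else []
        right = tail.split(":") if tail else []
        missing = 8 - len(left) - len(right)
        groups = left + ["0"] * missing + right
    else:
        groups = address.split(":")
    if len(groups) != 8:
        raise ValueError("not a valid IPv6 address: " + address)
    # int(..., 16) also takes care of omitted leading zeroes in a group.
    return [int(group, 16) for group in groups]
```

For example, `expand_ipv6("::1")` gives seven zero groups followed by `1`.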

The output is an IPv6 address in the superior RFC 1924 format.

As in the RFC this is:


The new standard way of writing IPv6 addresses is to treat them as a 128 bit integer, encode that in base 85 notation, then encode that using 85 ASCII characters. The character set to encode the 85 base85 digits, is defined to be, in ascending order:

         '0'..'9', 'A'..'Z', 'a'..'z', '!', '#', '$', '%', '&', '(',
         ')', '*', '+', '-', ';', '<', '=', '>', '?', '@', '^', '_',
         '`', '{', '|', '}', and '~'.

The conversion process is a simple one of division, taking the remainders at each step, and dividing the quotient again, then reading up the page, as is done for any other base conversion.


You can use the suggested conversion algorithm in the RFC or any other you can think of. It just has to result in the correct output!
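The division-and-remainder process quoted above could look roughly like this in Python. This is only a sketch: the names `ALPHABET` and `to_base85` are invented here, and returning "0" for the value zero (with no padding to a fixed width otherwise) is a choice of this example, not something the challenge spells out.

```python
# The 85 digits in ascending order, as listed in the RFC excerpt.
ALPHABET = (
    "0123456789"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "!#$%&()*+-;<=>?@^_`{|}~"
)

def to_base85(value):
    """Encode a non-negative integer with repeated division by 85."""
    digits = []
    while value:
        value, remainder = divmod(value, 85)
        digits.append(ALPHABET[remainder])
    # Remainders come out least significant first, so read them "up the page".
    return "".join(reversed(digits)) or "0"
```

Feeding it the 128-bit integer value of the address yields the RFC 1924 form.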

Example Input and Output

Here is the one from the RFC:

Input:

1080:0:0:0:8:800:200C:417A or 1080::8:800:200C:417A

Output:

4)+k&C#VzJ4br>0wv%Yp

Your program can take input and return the output in any standard way. So it is ok to read from stdin and output to stdout or you can implement your code as a function that takes the input as argument and returns the output.

Winning criteria

As always, standard loopholes are disallowed.

This is code-golf, so the shortest correct code in bytes wins!

This is my first challenge, so please do notify me about any problems/ambiguities in the description.

Josef says Reinstate Monica

Posted 2016-01-27T09:40:09.047

Reputation: 161

Is inet_pton allowed? – kennytm – 2016-01-27T11:52:57.077

Functions from the standard library of your language are allowed. I guess that qualifies, so yes it is allowed. – Josef says Reinstate Monica – 2016-01-27T14:56:26.197

Related – Digital Trauma – 2016-01-27T16:11:37.443

Out of curiosity, is "For reasons I cannot understand" intended to be sarcastic or genuine? – Not that Charles – 2016-01-27T20:53:30.413

@NotthatCharles it is intended to be sarcastic. Most people will understand why that's not widely used after seeing one address in that format and I am one of them. I also understand why IPoAC is not widely used, but I would like it to be. In general, I just think the first of April RFCs are great.

– Josef says Reinstate Monica – 2016-01-28T07:32:04.440

Answers

2

Pyth, 93 91 bytes

Ks[jkUTrG1G"!#$%&()*+-;<=>?@^_`{|}~")$from ipaddress import*;z=int(ip_address(z))$sm@Kdjz85

Second submission in Pyth!

Can't try online because $ is unsafe.

When tested against the example, produces the expected result:

$ pyth -c 'Ks[jkUTrG1G"!#$%&()*+-;<=>?@^_`{|}~")$from ipaddress import *$$z=int(ip_address(z))$sm@Kdjz85'
1080::8:800:200C:417A
4)+k&C#VzJ4br>0wv%Yp

Edits:

  • Saved two bytes by replacing $$ with ; and removing space before *

Explanation

First part creates the mapping string and assigns it to K:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!#$%&()*+-;<=>?@^_`{|}~

This is done as follows:

Ks[jkUTrG1G"!#$%&()*+-;<=>?@^_`{|}~")
K                                       implicitly assign to K
 s[                                 )   the string joined from the following list
     UT                                 range [0..9]
   jk                                   joined with the empty string: 0123456789
       rG1                              uppercase version of the string G (the lowercase alphabet)
          G                             G itself
           "!#$%&()*+-;<=>?@^_`{|}~"    all other symbols (shorter to declare than generate)

Then, the second part is two pieces of normal Python, amounting to 47 bytes in the form below, that convert the input IPv6 address to a number. Because z is accessed later in the code, it is implicitly assigned the input value before the Python code runs.

$from ipaddress import *$$z=int(ip_address(z))$

Then, the last part converts the resulting number to base 85 as a list and maps every value to the mapping string K:

sm@Kdjz85
     jz85  convert z to base 85
 m         map over the converted number with lambda d:
  @Kd      K[d]
s          join as a string and implicitly print
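For readers who don't speak Pyth, the three parts translate to plain Python more or less as follows. This is an approximation, not the golfed program itself: `base_digits` and `encode` are names made up for this sketch, and emitting a single 0 digit for the all-zero address is this sketch's own choice.

```python
import string
from ipaddress import IPv6Address

# Part 1: the mapping string K (digits, uppercase, lowercase, symbols).
K = (string.digits + string.ascii_uppercase + string.ascii_lowercase
     + "!#$%&()*+-;<=>?@^_`{|}~")

def base_digits(n, base):
    """Digits of n in the given base, most significant first."""
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits[::-1] or [0]  # [0] for n == 0 is a choice of this sketch

def encode(address):
    # Part 2: let the standard library parse the address into an integer.
    z = int(IPv6Address(address))
    # Part 3: convert to base 85 and map every digit through K.
    return "".join(K[d] for d in base_digits(z, 85))
```

Calling `encode("1080::8:800:200C:417A")` reproduces the example output from the question.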

Pure Pyth, 100 bytes

Ks[jkUTrG1G"!#$%&()*+-;<=>?@^_`{|}~")IgJx=zcz\:kZ=z-z]k=z.n++<zJ]*-8lz]\0>zJ)=zimid16z65536sm@Kdjz85

You can try it here

In order to support leading/middle/trailing ::, I must insert a list of missing zeroes in the correct position in the ip once converted to a list of hex numbers, which takes up more space than the inline Python version. There's room for improvement but I doubt it can beat the other version!

63 bytes version

Doesn't support the :: notation to skip zeroes:

Ks[jkUTrG1G"!#$%&()*+-;<=>?@^_`{|}~")=zimid16cz\:65536sm@Kdjz85

You can try it here

The different part here is converting the ipv6 to a number using base conversions:

=zimid16cz\:65536
         z          read string from input and assign it to z
        c \:        split it at colons
   m                map over the split list with lambda d:
    id16            convert d from base 16 to 10
  i         65536   convert list of ints from base 65536 to 10
=z                  assign result back to z
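The same idea in Python, as a small sketch: each group is one digit of a base-65536 number. The name `groups_to_int` is invented here, and like the 63-byte version it assumes all eight groups are written out (no ::).

```python
def groups_to_int(address):
    """Combine eight colon-separated hex groups into one 128-bit integer."""
    value = 0
    for group in address.split(":"):
        # Shift the accumulated value by one base-65536 digit, then add the
        # next group, parsed from hexadecimal.
        value = value * 65536 + int(group, 16)
    return value
```

Leading zeroes inside a group may still be omitted, since each group is parsed on its own.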

Tenchi2xh

Posted 2016-01-27T09:40:09.047

Reputation: 49

1

GCC C (functions) on x86, 240 bytes

This uses a couple of GCC specials:

#define U unsigned __int128
#define b(n) y[n&1]=__builtin_bswap64(y[n]);
g(U x){x?g(x/85),x%=85,putchar(x>9?x<36?x+55:x<62?x+61:"!#$%&()*+-;<=>?@^_`{|}~"[x-62]:x+48):0;}
f(char *a){long long y[3];inet_pton(10,a,&y[1]);b(2);b(1);g(*(U *)y);}

Explanation

  • f() is the entry point. It takes a single char *.
  • inet_pton() converts the full or compressed string representation of the input address into elements 1 and 2 of a 64-bit integer array.
  • Output from inet_pton() is in network order (big endian), but the integer is needed in host order (little endian on x86) in order to perform the arithmetic required for base conversion. Unfortunately gcc does not provide a __builtin_bswap128(). Instead, element 2 of the 64-bit integer array is bswap64()ed into element 0 and element 1 is bswap64()ed into itself. Elements 0 and 1 of the array now contain the address correctly as a host-order 128-bit integer.
  • Standard successive div/mod by 85 is used to present the integer in base 85. Because the first div/mod yields the last base 85 digit (and so on), the function g() performs this conversion recursively. The digits are effectively pushed to the stack and then output in the correct order.
  • Conversion of digit value to ASCII character is a little messy. This is probably golfable.
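The byte-order point can be illustrated outside C as well. Below is a hedged Python sketch, not a port of the golfed code: it uses the standard `socket.inet_pton` and `struct` modules, the function name `address_to_int` is made up, and it assumes the platform's socket module supports AF_INET6.

```python
import socket
import struct

def address_to_int(address):
    """Parse an IPv6 address and return it as a 128-bit integer."""
    # inet_pton returns 16 bytes in network order (most significant first).
    packed = socket.inet_pton(socket.AF_INET6, address)
    # Read two big-endian 64-bit halves, which is the effect the two
    # bswap64 calls achieve on a little-endian host.
    high, low = struct.unpack("!QQ", packed)
    return (high << 64) | low
```

Interpreting the same 16 bytes in little-endian order would give a different integer, which is why the C version has to swap.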

Test driver

Mildly ungolfed for readability.

Compile with make ipv6enc or gcc ipv6enc.c -o ipv6enc:

#include <stdio.h>      /* putchar */
#include <arpa/inet.h>  /* inet_pton */

#define U unsigned __int128
#define b(n) y[n&1]=__builtin_bswap64(y[n]);
void g(U x){
  if(x){
    g(x/85);
    x%=85;
    putchar(x>9?x<36?x+55:x<62?x+61:"!#$%&()*+-;<=>?@^_`{|}~"[x-62]:x+48);
  }
}
void f(char *a){
  long long y[3];
  inet_pton(10,a,&y[1]);  /* 10 is AF_INET6 on Linux */
  b(2);
  b(1);
  g(*(U *)y);
}

int main (int argc, char **argv) {

    f("1080::8:800:200C:417A");

    return 0;
}

To do - Big endian version, 165 bytes

Find a big-endian platform where I can use the gcc 128-bit integers, so I can get rid of all the byte swapping stuff. The code should look like this, but right now this is untested on a big-endian platform:

#define U unsigned __int128
g(U x){x?g(x/85),x%=85,putchar(x>9?x<36?x+55:x<62?x+61:"!#$%&()*+-;<=>?@^_`{|}~"[x-62]:x+48):0;}
f(char *a){U y;inet_pton(10,a,&y);g(y);}

Digital Trauma

Posted 2016-01-27T09:40:09.047

Reputation: 64 644

1

For the big-endian version, my first attempt was to run raspbian under qemu. Unfortunately compilation yields error: ‘__int128’ is not supported for this target. Perhaps GCC will only do int128 on 64-bit targets? Perhaps this next?

– Digital Trauma – 2016-01-28T00:45:19.117