IPV4 subnet list cleaner (in CIDR notation)

7

We have a lot of IPV4 address, to group into subnets, so we need a tool for doing this:

Our list is very ugly, not ordered and sometime in subnet notation, sometime in simple address.

Sample:

192.168.5.16/30
192.168.5.18
192.168.2.231/27
192.173.17.0/24
192.168.1.5/32
192.168.5.18/32
192.168.3.0/24
192.168.2.231/27
192.168.2.245/29
192.168.4.0/29
192.173.16.0/24
192.173.18.0/24
192.173.19.0/24
10.100.12.24

some explanation about the input (our old list):

  • each line hold only 1 IPV4 subnet
  • if a line don't contain any subnet info like on line 2, it's a single ip, so the 32 bit mask is to be considered.
  • subnet info is in the CIDR form: IPV4/integer where integer is mask length
  • IPV4 is not necessarely the base (first in range) of subnet
  • list is not sorted
  • list is not short... could make more the 60k entries...
  • there is repetition, overlap of subnets... lot of useless lines

( The last is the main reason we need to build this tool! ;-)

From this sample list we want to obtain shortened, sorted and cleaned list like:

10.100.12.24/32
192.168.1.5/32
192.168.2.224/27
192.168.3.0/24
192.168.4.0/29
192.168.5.16/30
192.173.16.0/22

Where all lines is subnet and all IPV4 part is base IP (begin of range), sorted, and without overlap or repetitions.

And where all IP present somewhere in input is present and no one more!

CIDR

For informations about:

Code golf

  • Input values could be given as you will, (arguments, read from file or STDIN)
  • Output to STDOUT, one subnet per line.
  • Output has to be sorted by the unsigned integer value of IPV4 (big-endian)
  • Output has to contain as few lines as possible
  • Of course, sum of output subnet must match sum of input subnet! no more, nor less! So 0.0.0.0/0 which override everything is not accepted! (unless it's the only valid answer, see last test)
  • Use of libraries forbiden! Standard loophole applies
  • Shortest code wins!

Test cases

For testing this, (under bash) I use something like:

for i in {167772416..167774464};do
     echo $[i>>24].$[i>>16&255].$[i>>8&255].$[i&255]
  done

as input, this must render:

10.0.1.0/24
10.0.2.0/23
10.0.4.0/22
10.0.8.0/24
10.0.9.0/32

or

for i in {167772416..167774464};do
     [ $i -gt 167773316 ] && [ $i -lt 167773556 ] ||
         echo $[i>>24].$[i>>16&255].$[i>>8&255].$[i&255]/30
  done

Note, I've added /30 after each ip. this will do a lot of overlap!

10.0.1.0/24
10.0.2.0/23
10.0.4.0/25
10.0.4.128/29
10.0.5.116/30
10.0.5.120/29
10.0.5.128/25
10.0.6.0/23
10.0.8.0/24
10.0.9.0/30

Extreme test rule

63.168.3.85/2
64.168.3.85/2
192.168.3.85/2
191.168.3.85/2

This special case must render:

0.0.0.0/0

Accepted answers

As this question seem not so easy, I will add time from question to the list

  1. CJam, 101 - user23013 - 38 hours
  2. Perl, 223 - nutki - 1 week
  3. Pure bash, 1040 - F.Hauri - 4 days

F. Hauri

Posted 2015-03-25T15:29:22.357

Reputation: 2 654

1What is the sort order? – Peter Taylor – 2015-03-25T15:47:09.990

@PeterTaylor It seems to be ordered byte-by-byte, in big-endian format. The subnets number can be ignored since if two subnets have the same IPv4 part, the larger contains the smaller, which can be ignored. – PurkkaKoodari – 2015-03-25T16:01:58.797

1I'm confused about what the number after the / is doing, because at first sight it seems it can be safely ignored in the example. Is it possible for the first four numbers to be the same and the one after the /? Also only one out of 192.168.<16 through 19>.0/24 is shown. I assume that's because it's been masked off to 22 bits and simplified to 192.168.16.0/22 (there's no real explanation in the question about this.) But by that logic I can mask everything and 0.0.0.0/0 should be a complete and valid output for all cases. – Level River St – 2015-03-25T18:41:31.160

1@steveverrill it's mask length: 32bits => 255.255.255.255, or 24bits => 255.255.255.0, 30bits => 255.255.255.252. And you're right: 192.168.1[6-9].0/24 => 192.168.16.0/22 – F. Hauri – 2015-03-25T20:19:29.227

@steveverrill You're right I missed this rule: no more!: 0.0.0.0/0 could not be accepted! – F. Hauri – 2015-03-25T20:25:47.860

No answer? Not so easy? – F. Hauri – 2015-03-25T22:01:57.857

I don't really understand your explanation of course, sum of output subnet must match sum of input subnet! no more, nor less! Maybe it's my lack of knowledge of IPV4, but is it valid (for example) to condense all the addresses starting 192.168 down to 192.168.0.0/16 ? And if not, why not? It's not clear to me from the rules why this hasn't been done in your example, yet it has been done for the addresses that you condensed into 192.173.16.0/22. Is there a minimum value for the mask length or something? – Level River St – 2015-03-26T00:26:03.227

@steveverrill for sample: if 192.168.1.4 is not covered by the input list, theys must not match the output list! No more! For all IP betweeen 0.0.0.0 to 255.255.255.255 ( -> 4294967296 possibilities) no more must be found in ouptut than in input – F. Hauri – 2015-03-26T06:30:57.413

1Ok i get it now. 192.173.16.0/22 expands exactly to 192.173.[16-19].0/24 and as the output must be as brief as possible this is not only acceptable but also required. – Level River St – 2015-03-26T14:09:11.990

When you say "Use of libraries forbidden", does that forbid core functions included in platform that may operate on IP addresses or external libraries? – 640KB – 2019-05-01T19:24:48.820

Hem yes, I think, but if you post, comments from community will give better answer... – F. Hauri – 2019-05-01T20:35:10.683

Answers

3

CJam, 101 95 94

qN/{'//('./1\+256b2b\{i)<}/}%_|{{):T;_aM+:M(a&}gT+}%${_2$#!{;}*}*]{_(!a32*+32<8/2fb'.*'/@,(N}%

The {~}2j construct didn't make it shorter...

Try it here. (And link for Firefox.)

Explanation

qN/               " Read each line. ";
{                 " For each line: ";
    '//           " Split by / ";
    ('./          " Split first item by . ";
    1\+256b2b     " The address + 2^32 in base 2, which always has length 33 ";
    \{i)<}/       " For each other items N (if any), get the first N+1 bits. ";
}%
_|                " Remove duplicates. ";
{                 " For each item (each array of bits): ";
    {             " Do: ";
        ):T;      " Remove last bit and save it in T. ";
        _aM+:M    " Add the array to M. ";
        (a#)      " Find the array in the original M. ";
    }g            " while it is found. ";
    T+            " Append the last T back. ";
}%
$                 " Sort. ";
{                 " Reduce and wrap in array: ";
    _2$#!         " Test if the previous string is a prefix of the current. ";
     {;}*         " If so, discard it. ";
}*]
{                 " For each item (each array of bits): ";
    _(!a32*+32<   " Remove first bit and pad to 32 bits to the right. ";
    8/2fb'.*      " Convert to IP address. ";
    '/@,(         " Append / and original length - 1. ";
    N             " Append a newline. ";
}%

jimmy23013

Posted 2015-03-25T15:29:22.357

Reputation: 34 042

Bravo, this seem work fine! (I've not asready test all special cases) – F. Hauri – 2015-03-27T06:59:56.927

@F.Hauri It doesn't work with addresses like 0177.0x1, of course. – jimmy23013 – 2015-03-27T07:07:23.513

:->> This is not a test case! CIDR notation are four integer between 0 and 255, dot separated, terminated by a slash and another integer between 0 and 32. – F. Hauri – 2015-03-27T07:10:07.640

test case: for i in {3232236343..3232237343};do echo $[i>>24].$[i>>16&255].$[i>>8&255].$[i&255];done – F. Hauri – 2015-03-27T07:27:56.757

Give: 192.168.3.55/32 192.168.3.56/29 192.168.3.64/26 192.168.3.128/25 192.168.4.0/23 192.168.6.0/24 192.168.7.0/27 That's correct!! – F. Hauri – 2015-03-27T07:28:40.843

3

Perl, 172 223 245

Using hashes was not necessary. The removal allowed for a significantly shorter code, though the idea is the same.

#!perl -lp0a
$_=join$\,sort map{1x(s/\d*./unpack B8,chr$&/ge>4?$&:32)&$_}@F;1while s/^(.*)
\1.*/$1/m||s/^(.*)0
\1.$/$1/m;s!^.*!(join'.',map{ord}split'',pack B32,$&).'/'.length$&!gme

Test me.

Older version with explanation:

#!perl -lp0a
s/(^|\.)(\d+)/sprintf"%08b",$2/ge,/\D/&&($_=$`|2x$',y/0-3/##01/)for@F;$_=join$\,sort@F;1while s/^((\d*)#*)
\2.*/$1/mg||s/^(\d*)0(#*)
\1[1]\b#*/$1#$2/mg;s!.+!$&.'/'.$&=~y/01//!ge;y/#/0/;s!\d{8}\B!$&.!g;s!\d{8}!"0b$&"!gee

Test me.

First convert input to textual format (s/(^|\.)(\d+)/sprintf"%08b",$2/ge,/\D/&&($_=$p|2x$',y/0-3/##01/)):

00##############################
01##############################
11##############################
10##############################

Sort it ($_=join$\,sort@F):

00##############################
01##############################
10##############################
11##############################

Clean using two regexps: one to remove duplicates and more specific addresses and the second to join two adjacent networks into one with a longer mask (s/^((\d*)#*)\n\2.*/$1/mg and s/^(\d*)0(#*)\n\1[1]\b#*/$1#$2/mg).

################################

And convert back to the original form (s!.+!$&.'/'.$&=~y/01//!ge;y/#/0/;s!\d{8}\B!$&.!g;s!\d{8}!"0b$&"!gee):

0.0.0.0/0

nutki

Posted 2015-03-25T15:29:22.357

Reputation: 3 634

Yes! This work! I can't run this as a script. I have to run perl -M5.0.0 script.pl <ip-list.txt but this work! – F. Hauri – 2015-04-02T18:45:22.203

1

Ruby - 425

(broken - see comment below)

a=(0..32).map{[]}
l=->m{2**m-1<<32-m}
$<.map{|i|/\//=~i=i[?/]?i:i.chop+'/32'
s=$'.to_i
n=$`.split(?.).inject(0){|v,d|v*256+d.to_i}
next if s.times.any?{|t|a[t].any?{|e|n&l[t]==e[1]}}
s.upto(32){|t|a[t].reject!{|e|e[1]&l[s]==n}}
a[s]<<[i,n]
(b=k=n&l[s]
break if(y=a[s+1].select{|e|e[1]&l[s]==b}).size<2
a[s+1]-=y
a[s]<<[(0..3).map{k,i=k.divmod(256);i}.reverse*?.+?/+s.to_s,b])until(s-=1)<0}
puts a.flatten(1).sort_by{|v|v.pop}

Not that Charles

Posted 2015-03-25T15:29:22.357

Reputation: 1 905

Hmm, This seem bug at 1st test: ouput 192.168.2.192/26 instead of 192.168.2.224/27 – F. Hauri – 2015-04-01T18:33:07.620

@F.Hauri Hrm. I'll look into it. Not committed to fixing it yet though – Not that Charles – 2015-04-01T21:38:17.920

0

Pure bash 1100 1079 1040

Edited: A little better more golfed!

#!/bin/bash -i
alias d=done e=unset f=printf\ -v g=local h=else i=then j=for k=while C=do E=fi
n(){ g s e i;j i in ${!u[@]};C [ $1 -ge $[i+1] ]&&[ $1 -le $[u[i]+1] ]&&s=$i
[ $2 -ge $[i-1] ]&&[ $2 -le $[u[i]-1] ]&&e=$i;d;if [ "$s" ];i if [ "$e" ];i \
[ $s -ne $e ]&&{ u[s]=${u[e]};e u[e];};h [ $1 -lt $s ]&&{ u[$1]=${u[s]};e u[s
];s=$1;};[ $2 -gt ${u[s]} ]&&u[s]=$2;E;h if [ "$e" ];i [ $1 -lt $e ]&&{ u[$1
]=${u[e]};e u[e];e=$1;};[ $2 -gt ${u[e]} ]&&u[e]=$2;h u[$1]=$2;E;E;};q(){ 
g s=${1#*/} m=${1%/*} z;[ -z "${s//*.*}" ]&&s=32;y $s z;set -- ${m//./ }
m=$[$1<<24|$2<<16|$3<<8|$4];n $[m&z] $[m|((2**32-1)^z)];};y(){ f $2 "%u" $[ (
2**32-1)^(~0^~0<<(32-$1))];};r(){ f $2 "%s" $[$1>>24].$[$1>>16&255].$[$1>>8&
255].$[$1&255];};t(){ g s z;k [ "$1" ];C s=32;k y $[s-1] z&&[ $1 = $[($1>>(33
-s))<<(33-s)] ]&&[ $2 -ge $[$1|((2**32-1)^z)] ];C ((s--));d;r $1 x;echo $x/$s
y $s z;if [ $2 -gt $[$1|((2**32-1)^z)] ];i set -- $[1+($1|((2**32-1)^z))] $2
h shift 2;E;d;};k read w;C q $w;d;p=-1;j o in ${!u[@]};C [ $o -gt $p ]&&p=${u[
o]}&&t $o ${u[o]};d

which work quickly, even on wide ranges!

./myscript.sh <<<"63.168.3.85/2\n64.168.3.85/2\n192.168.3.85/2\n191.168.3.85/2"
0.0.0.0/0

./myscript.sh < <(for i in {167772416..167774464};do
     [ $i -gt 167773316 ] && [ $i -lt 167773556 ] ||
         echo $[i>>24].$[i>>16&255].$[i>>8&255].$[i&255]/30
   done)
10.0.1.0/24
10.0.2.0/23
10.0.4.0/25
10.0.4.128/29
10.0.5.116/30
10.0.5.120/29
10.0.5.128/25
10.0.6.0/23
10.0.8.0/24
10.0.9.0/30

F. Hauri

Posted 2015-03-25T15:29:22.357

Reputation: 2 654

0

Javascript ~1335

While total js is 2009, the essential part is contained in two function and main routine. html, css and ~674 chars of javascript in only cosmetic.

var g={};function B(){document.getElementById('z').addEventListener('click',go);
window.addEventListener('resize',C);C();}function C(){document.getElementById('y'
).setAttribute('style','height:'+(window.innerHeight-46-document.getElementById(
'y').offsetTop).toFixed(0)+"px");document.getElementById('x').setAttribute(
'style','height:'+(window.innerHeight-46-document.getElementById('x').offsetTop
).toFixed(0)+"px");}function go(){document.getElementById('y').innerHTML='';
document.getElementById('A').innerHTML='...running...';w()
}function u(q,j){var o=Object.keys(g).sort(function(a,b){return a-b}),s,e;for(var
i=0;i<o.length;i++){if((q*1>=o[i]*1+1)&(q<=g[o[i]]*1+1))s=1*o[i];if((j*1>=o[i]-1
)&(j<=g[o[i]]-1))e=1*o[i]};if(typeof(s)!=='undefined'){if(typeof(e)!=='undefined'
){if(1*s<1*e){g[s]=g[e];delete g[e]}}else{if(1*q<1*s){g[q]=g[s];delete g[s];s=q};
if(1*j>1*g[s])g[s]=j}}else{if(typeof(e)!=='undefined'){if(1*q<1*e){g[q]=g[e];
delete g[e];e=q};if(1*j>1*g[e])g[e]=j}else{g[q]=j}}}function v(f,p){var e,ie,c=''
;while(typeof(f)!=='undefined'){e=32;while((ie=(~0<<(33-e))>>>0)&&(f==((f>>(33-e)
)<<(33-e))>>>0)&&(p>=(((~0<<32)^ie)|f)>>>0)&&(e>0))e--;c+=(f>>24&255)+"."+(f>>16&
255)+"."+(f>>8&255)+"."+(f&255);c+='/'+e+'<br/>';ie=e>0?(~0<<(32-e))>>>0:0;if(p>(
(f|(~0<<32)^ie))>>>0){var d=f;f=1*1+((f|((~0<<32)^ie))>>>0)}else{f=undefined}}
return c}function w(){var l=[];g={};document.getElementById('A').innerHTML+=
'<br/>reading...';var n=document.getElementById('x').value.split("\n");for(var
i=0;i<n.length;i++){if(!n[i].match("/"))n[i]+="/32";var s=n[i].match(
/(\d+)\.(\d+)\.(\d+)\.(\d+)\/(\d+)/);if((s)&&(s.length==6)){var k=(s[1]<<24|s[2]
<<16|s[3]<<8|s[4])>>>0;var e=(~0<<(32-s[5]))>>>0;var t=(~0&k&e)>>>0;var r=(((~0
<<32)^e)|t)>>>0;u(t,r);}};var o=Object.keys(g).sort(function(a,b){return a-b});
var h=-1;for(i=0;i<o.length;i++){if(1*o[i]>1*h){h=g[o[i]];document.getElementById
('y').innerHTML+=v(o[i],g[o[i]]);}}document.getElementById('A').innerHTML+=
'<br/>printing...'};window.onload=B;
body{font-family:Sans;background:#9ac;}#z{position:absolute;top:3em;right:2%;}
span button, span input{font-size:.63em;}#A{width:50%;min-height:3.4em;background:
#679;border:1px solid black;border-color:white black black white;border-radius:
.5em;padding:.2em 1em .2em 1em;margin:1em 0px .1em 0px;margin-left:24%;color:
#cdf;box-shadow: 2px 2px 4px black;}#y{width:72%;overflow:auto;margin-left:24%;
background:#bce;padding:1em;border-radius:.8em;border:1px solid black;box-shadow:
2px 2px 4px;border-color:white black black white;}#x{position:fixed;top:1.5em;
left:1%;padding:1em;width:18%;anchor:left;height:33em;border-radius:.3em;border:
1px solid black;border-color:white black black white;box-shadow:2px 2px 4px;}
<button id="z"> GO </button><textarea id="x"></textarea><div id="A">---</div><pre id="y"></pre>

F. Hauri

Posted 2015-03-25T15:29:22.357

Reputation: 2 654