Generate CSV from NS Zone File

I have a large file in which the records for each domain are spread across multiple lines, e.g.:

domain1 NS ns1
domain1 NS ns2
domain1 NS ns3
domain2 NS dnsx

What might be the fastest way to generate a CSV of the form

domain1,ns1,ns2,ns3
domain2,dnsx

I have tried PHP and Groovy scripts, but the processing time is too high when reading a 1 GB file (and subsequently writing the result to a CSV file).

I think there should be a better programmatic approach than what I am doing. Basically, I build a list/array of records and, for each line, check the last element to see whether it has the same domain as the current line.

P.S. I mentioned Groovy/PHP, but answers need not be tied to these specific scripting languages.

Armand

Posted 2015-10-07T20:24:06.087

Reputation: 404

Is the file already sorted by domain? Or at least, are all the records with the same domain grouped together? – glenn jackman – 2015-10-07T20:56:54.517

They are sorted. – Armand – 2015-10-08T07:28:22.240

Answers

Assuming all the records with the same domain are grouped together, this awk program will have a very small memory footprint, since it only remembers the current domain between lines (I can't say what the CPU load will be):

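awk '
    $1 != domain {                # first line of a new domain group
        if (domain) print ""      # end the previous CSV row
        printf "%s", $1           # start a new row with the domain
        domain = $1
    }
    {printf ",%s", $3}            # append this line's name server
    END {print ""}                # end the last row
' file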
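
As a quick check, the same program can be wrapped in a shell function and fed the sample from the question. This is only a sketch: the function name zone2csv is made up for illustration, and "$@" is used so that it accepts one or more file names, or reads standard input when none are given.

```shell
# zone2csv is a hypothetical wrapper name, not part of the original answer.
# "$@" forwards any file arguments to awk; with no arguments awk reads stdin.
zone2csv() {
    awk '
        $1 != domain {
            if (domain) print ""
            printf "%s", $1
            domain = $1
        }
        {printf ",%s", $3}
        END {print ""}
    ' "$@"
}

# Sample records from the question, piped in on stdin.
printf '%s\n' \
    'domain1 NS ns1' \
    'domain1 NS ns2' \
    'domain1 NS ns3' \
    'domain2 NS dnsx' | zone2csv
# prints:
# domain1,ns1,ns2,ns3
# domain2,dnsx
```

With real data it would be called as zone2csv file1 file2 > out.csv. Because the current domain carries over from one file to the next, this still assumes that records for a domain are adjacent across the concatenated input.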

glenn jackman

Posted 2015-10-07T20:24:06.087

Reputation: 18 546

Although CPU load hits 100%, this is far more efficient than what I had! Thank you very much. I need to tweak the script to read file arguments and then it is perfect!

P.S. It seems the %s is not needed on line 7. – Armand – 2015-10-08T08:32:48.057

I always use a format string with printf. What if the $1 string itself contains a %s? – glenn jackman – 2015-10-08T10:41:44.563

It is just that it ends in a fatal error: (FILENAME=net.zone2 FNR=1) fatal: not enough arguments to satisfy format string `,%sNS1.' ^ ran out for this one – Armand – 2015-10-08T11:18:56.630

Oh! I forgot the comma after the format string. – glenn jackman – 2015-10-08T12:27:39.880
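
The missing-comma problem from these comments can be reproduced with a short sketch. The sample record is made up; NS1. simply mirrors the value visible in the error message. The behaviour of the second command is an assumption that depends on the awk implementation: gawk reports a fatal error, others may differ.

```shell
# Made-up sample record; NS1. mirrors the value seen in the error message.
line='example.com. NS NS1.'

# With the comma, ",%s" is the format string and $3 is its argument.
good=$(printf '%s\n' "$line" | awk '{ printf ",%s", $3 }')
echo "$good"

# Without the comma, awk concatenates ",%s" and $3 into the single format
# string ",%sNS1.", which leaves no argument for %s. gawk treats this as
# fatal; other awk implementations may behave differently.
printf '%s\n' "$line" | awk '{ printf ",%s" $3 }' 2>/dev/null \
    || echo "awk rejected the format string"
```

Keeping the data out of the format string also covers the case raised above, where a field itself happens to contain a % sequence.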