Find duplicate column values and separate them into files or variables (Bash)

1

My output looks like this (the columns are separated by a tab, \t):

name1   something1
name1   something2
name1   something3
name2   something4
name2   something5

From this I need two outputs (if there were a name3, I would need three), like this:

name1   something1
name1   something2
name1   something3

and

name2   something4
name2   something5

I think this can be done with AWK, but I couldn't come up with the right command.

What is the best way to do this?

I need a condition that reads $1 (the first column) and prints every line with that value (without deleting duplicates), together with the other columns ($2, $3, ...), until the value changes.

I think a loop could print the first output, then the next, and so on.

makgun

Posted 2015-08-11T21:32:58.240

Reputation: 305

Answers

2

Try this:

awk -F'\t' '{print>$1;}' file

When the above command is complete, there will be two more files in the directory:

$ cat name1
name1   something1
name1   something2
name1   something3
$ cat name2
name2   something4
name2   something5

How it works

  • -F'\t'

    This tells awk to use a tab as the field separator.

  • print>$1

    This tells awk to print each line to a file named after the first field.
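If the input has many distinct names, some awk implementations may run out of open file handles. A minimal sketch of one way around that, assuming a scratch directory and a sample file created only for the demo: close the previous output file whenever the first field changes, and append with >> so a name that shows up again later is not truncated.

```shell
# Hypothetical demo setup: a scratch directory and a tab-separated sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'name1\tsomething1\nname1\tsomething2\nname1\tsomething3\nname2\tsomething4\nname2\tsomething5\n' > file

# When the first field changes, close the previous output file so that
# only one file is open at a time. ">>" appends, so a name that reappears
# later is not truncated (it also appends to files that already exist).
awk -F'\t' '
    $1 != prev { if (prev != "") close(prev); prev = $1 }
    { print >> ($1) }
' file

cat name1
cat name2
```

Because of the append, run it in an empty directory or remove old output files first.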

Removing illegal characters from file names

Suppose the input file looks like:

$ cat file
name/1  something1
name/1  something2
name/1  something3
name/2  something4
name/2  something5

The following code creates files based on the name field but with the / removed:

awk -F'\t' '{name=$1; gsub(/[/]/, "", name); print>name;}' file

The above was tested on GNU awk and ran successfully. If your awk does not accept /[/]/, then try:

awk -F'\t' '{name=$1; gsub("/", "", name); print>name;}' file

or:

awk -F'\t' '{name=$1; gsub(/\//, "", name); print>name;}' file
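Another way to sidestep illegal characters is to not use the field as the file name at all. The following is only a sketch: the group_ prefix, the .txt suffix and the sample data are assumptions made for the demo. Each distinct first field gets the next number, and its lines go to the file with that number.

```shell
# Hypothetical demo setup: a scratch directory and a tab-separated sample file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'name/1\tsomething1\nname/1\tsomething2\nname/2\tsomething4\n' > file

# Give each distinct first field the next free number, then write the line
# to group_<number>.txt (the "group_" prefix is a hypothetical choice).
awk -F'\t' '
    !($1 in id) { id[$1] = ++n }
    { out = "group_" id[$1] ".txt"; print > out }
' file

ls
```

The id array keeps the mapping from name to number, so lines with the same name land in the same file even if they are not adjacent.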

John1024

Posted 2015-08-11T21:32:58.240

Reputation: 13 893

1 lol, that's elegant! – theoden8 – 2015-08-11T21:48:49.063

AWK says it cannot open "name1" for output. Won't it create the file? – makgun – 2015-08-11T21:53:31.963

@makgun That likely means that the command is being run in a directory for which you do not have write permission. Before running the command, cd to a directory that you own. – John1024 – 2015-08-11T21:55:11.510

I am at $HOME in my bash-shell – makgun – 2015-08-11T21:57:46.293

The problem is caused by metacharacters such as / that the system does not allow in file names. – makgun – 2015-08-11T22:00:37.193

Is there a way to name the files with an incrementing number instead of "$1"? – makgun – 2015-08-11T22:05:22.647

@makgun See updated answer for a way to remove illegal characters from file names. – John1024 – 2015-08-11T22:05:28.533

My output:

makgun@makgun02:~$ awk -F'\t' '{name=$1; gsub(/[/]/, "", name); print>name;}' random
awk: line 1: regular expression compile failed (bad class -- [], [^] or [)
[
awk: line 1: syntax error at or near ]

– makgun – 2015-08-11T22:09:28.607

Curious. What OS are you using? I updated the answer with two alternative ways to write that regex. Give them a try. P.S. Thanks for supplying the complete error message. – John1024 – 2015-08-11T22:20:09.647

I used this command:

awk -F'\t' '{name=$1; gsub(/[\/]/, "", name); print>name;}' file

That is, I escaped the / with a backslash. After that it creates the files. +1 – makgun – 2015-08-11T22:29:41.170

@John1024 A long time later, I am here again. I need a command that does this into an array variable. I used a for loop to put the groups into an array, but if a file with the same name as an output file already exists in the current directory, it gets overwritten. So how can I assign them to an array? It must be assigned in awk, and it must also be accessible outside awk (in the current shell). – makgun – 2015-09-04T22:02:19.783

@makgun To append, rather than overwrite, replace > with >> : awk -F'\t' '{print>>$1;}' file . – John1024 – 2015-09-09T18:52:33.907

Thanks @John1024. Appending is not useful for my script, but I solved the issue using awk and a for loop. echo "$var" | awk '{print $1}' | awk '!a[$0]++' gives me all distinct $1 values as a string, which I assigned to an array. Then I looped over that array and filled a new array using newarray[$i]=$(awk -v a=${a[$i]} '{if ($1 == a) print $0}') (with $i in ${!a[*]}). – makgun – 2015-09-09T19:04:08.023
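The approach described in that last comment could be sketched roughly as follows. This is a reading of the comment, not the commenter's actual script: the sample data and the names keys, groups, seen and k are assumptions, and it relies on bash arrays and process substitution.

```shell
# Hypothetical sample data, held in a variable as in the comment.
var=$(printf 'name1\tsomething1\nname1\tsomething2\nname2\tsomething4\n')

# keys: distinct first fields, in order of first appearance.
keys=()
while IFS= read -r k; do
    keys+=("$k")
done < <(printf '%s\n' "$var" | awk -F'\t' '!seen[$1]++ {print $1}')

# groups[i]: every line whose first field equals keys[i].
groups=()
for i in "${!keys[@]}"; do
    groups[i]=$(printf '%s\n' "$var" | awk -F'\t' -v k="${keys[i]}" '$1 == k')
done

# Both arrays live in the current shell, so they can be used after the loop.
printf '%s\n' "${groups[0]}"
```

No files are written, so nothing in the current directory can be overwritten. Note that awk -v interprets backslash escapes in the value, so keys containing backslashes would need extra care.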

0

I think this should work:

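mkdir tmp; cd tmp
# Reads tab-separated lines from standard input
while IFS= read -r line; do
    # Append each line to the file named after its first field
    printf '%s\n' "$line" >> "$(printf '%s\n' "$line" | awk -F'\t' '{print $1}')"
done
cat *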

This reads the input line by line and appends each line to the file named after its first field.

If you want to collect the lines in variables instead:

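while IFS= read -r line; do
    key="$(printf '%s\n' "$line" | awk -F'\t' '{print $1}')"
    # Append the line and a newline to INPUT_<key>; the key must be a valid variable name
    eval "INPUT_$key=\"\${INPUT_$key}\${line}\"\$'\\n'"
done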

If you need something more robust, use:

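#!/usr/bin/python

import sys
import re

for line in sys.stdin:
    # The first whitespace-delimited field names the output file
    name = re.split(r"\s+", line, maxsplit=1)[0]
    # Append so that earlier lines for the same name are kept
    with open(name, 'a') as f:
        f.write(line)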

This will work. Must. It can't fail.

theoden8

Posted 2015-08-11T21:32:58.240

Reputation: 644

With this, it prints just $1, and it won't find the last line if it changes – makgun – 2015-08-11T21:43:03.127

@makgun, it will, if you press enter. – theoden8 – 2015-08-11T21:49:16.027

I created a bash script and added this to it, with < <(cat $file) after done, but it didn't work – makgun – 2015-08-11T22:03:02.660

@makgun, if you are planning to use all kinds of characters on all platforms, don't use bash/awk/gawk/etc, use perl/python. – theoden8 – 2015-08-11T22:33:28.767

I don't know how Python works, and I would need to change all my previous commands just to get this first output – makgun – 2015-08-11T22:40:33.590