Merging two files in unix, with one common column that is redundant

Answers

join file1 file2

Output:

chr1:66997824-67000456 ZNF333 Refseq#1
chr1:66997824-67000456 EGR1 Refseq#1
chr1:66997824-67000456 MZF-1 Refseq#1
chr22:51221989-51222166 Zic2 Refseq#22
chr22:51221989-51222166 ZF5 Refseq#22

join file1 file2 | awk '{OFS="     ";print $1,$2,$3}'

Output:

chr1:66997824-67000456     ZNF333     Refseq#1
chr1:66997824-67000456     EGR1     Refseq#1
chr1:66997824-67000456     MZF-1     Refseq#1
chr22:51221989-51222166     Zic2     Refseq#22
chr22:51221989-51222166     ZF5     Refseq#22

Cyrus

Posted 2015-10-31T14:41:06.023

Reputation: 4 356

It didn't work, it exited after printing the first refseq#. My first file has 2000+records with chromosomal locations chr1:66997824-67000456 - the join command printed Refseq#1 for those 2000+ records and then exited. My input file has almost 20000+ lines whereas the output has only 2000+ lines. – ABB – 2015-11-02T15:27:37.887

@ABB: It should work though, only IF THE FILES ARE SORTED though. Use sort if they are not. Else, are you sure there is no mismatch between the first field of the two files? To test, try first with a subset, e.g using uniq -w 24 on file 1 and try to track your bug. – Joce – 2015-11-03T15:49:39.957

@ABB: Try this: join <(sort file1) <(sort file2) – Cyrus – 2015-11-05T06:38:32.217