Reordering file lines like other file (Unix)


Is there a tool (or option for sort) which will re-order lines of a file so that they are ordered like a key in another file?

For example, I have a data file:

T01F01475558    30
T01F022B3A17    31
T01F022EEDFD    19
T01F026E0209    19

And another (sort "key" file):


Is there a way to sort the first file so that the first field is in the same order as the 2nd file? Each key is unique (no duplicates), and there are an equal number of lines in each file.

Is there a UNIX tool I don't know about that will do this?

Taj Morton

Posted 2013-08-27T18:40:08.850

Reputation: 121



Each key is unique (no duplicates), and there are an equal number of lines in each file.

This assumption is very important. If it holds then this command will do the job (in Bash):

paste <(nl key.file | sort -k 2 | cut -f 1) <(sort data.file) | sort -n | cut -f 2-

Several of the tools used here (nl, paste, cut) use the tab character as their default separator. For this reason tabs must not occur in key.file (they may occur in data.file, though). Sane entries in key.file should form a single column anyway, so this should not be a problem.
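To see how tabs are treated, here is a small sketch. It assumes the default options of nl and cut and uses a made-up data line that itself contains a tab. It only shows that everything after the first tab survives untouched; it is not part of the pipeline itself.

```shell
# A made-up data line with a tab between key and value.
line=$(printf 'T01F01475558\t30\n')

# nl prefixes the line with a space-padded number followed by a tab.
numbered=$(printf '%s\n' "$line" | nl)

# cut -f 1 keeps only what precedes the first tab: the number.
number=$(printf '%s\n' "$numbered" | cut -f 1)

# cut -f 2- keeps everything after the first tab, later tabs included.
rest=$(printf '%s\n' "$numbered" | cut -f 2-)

printf 'number: %s\n' "$number"
printf 'rest:   %s\n' "$rest"
```

The value of rest is identical to the original line, which is why tabs in data.file do no harm.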


  1. nl adds a line number in front of every line of key.file, which moves the keys themselves to the second column. sort -k 2 then sorts by the second column, i.e. by the keys. The keys are then discarded by cut -f 1.
  2. Another sort sorts data.file. Since the keys at the front of each line are unique, this default sorting is equivalent to sorting by the keys alone.
  3. The two sorted results are joined line by line by paste. Without the first cut an example line would be:

         4  T01F01475558    T01F01475558    30

    The uniqueness of the keys and their equal number in both files are crucial here: because of them, matching keys from the two sort commands end up on the same line of the paste output. There is no need to carry the duplicated key along, so the first cut removes it as early as possible. With that cut in place, the actual line leaving paste looks like this:

         4  T01F01475558    30

  4. These lines are then sorted numerically (sort -n). The line numbers from nl come first, so this operation restores the order given in key.file.

  5. At the end cut discards the first column, leaving the exact lines from data.file, yet in the desired order.
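The steps above can be traced on the sample data. This is only a sketch: the key order is invented for the demo (the real key.file is not shown in the question), chosen so that T01F01475558 is on line 4 as in the example line above. Temporary files stand in for the process substitutions so that it also runs in a plain POSIX shell, and mktemp is assumed to be available.

```shell
# Work in a scratch directory.
tmp=$(mktemp -d)

# data.file as given in the question.
printf '%s\n' \
  'T01F01475558    30' \
  'T01F022B3A17    31' \
  'T01F022EEDFD    19' \
  'T01F026E0209    19' > "$tmp/data.file"

# key.file: hypothetical order, invented for this demo.
printf '%s\n' \
  'T01F022B3A17' \
  'T01F026E0209' \
  'T01F022EEDFD' \
  'T01F01475558' > "$tmp/key.file"

# Step 1: number the keys, sort by key, keep only the numbers.
nl "$tmp/key.file" | sort -k 2 | cut -f 1 > "$tmp/numbers"

# Step 2: sort the data lines (equivalent to sorting by key).
sort "$tmp/data.file" > "$tmp/sorted"

# Steps 3-5: glue numbers to data lines, sort by number, drop the number.
result=$(paste "$tmp/numbers" "$tmp/sorted" | sort -n | cut -f 2-)
printf '%s\n' "$result"

rm -r "$tmp"
```

With this key order the output lists T01F022B3A17, T01F026E0209, T01F022EEDFD and T01F01475558, in that order, each with its original value.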

Alternatively you can try this (tested in Bash):

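# Read key.file line by line; with no variable name given, read stores the line in $REPLY.
while IFS='' read -r ; do
   # Skip empty lines, then print the data line that starts with this key and a space.
   [ -n "$REPLY" ] && grep "^$REPLY " data.file
done <key.file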

Note the code expects a space character after each key in data.file.


Advantages:

  • key.file may contain any number of keys, including duplicate or nonexistent ones. In that case don't think "sorting", think "retrieving the desired lines one by one".
  • You can stream the keys (e.g. from stdin instead of key.file; just omit <key.file) and get the output on the fly.


Disadvantages:

  • grep interprets the keys as regular expressions, which may backfire if a key contains characters such as . or *. grep -F would treat the keys as fixed strings, but then the ^ anchor needed in the pattern would no longer work.
  • read is slow; spawning grep for every key is slow; opening data.file again and again is slow.
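If these drawbacks matter, one possible way around them is a single awk pass instead of the loop. This is a different technique from the one above, shown only as a sketch: it assumes the key is the first whitespace-separated field of each line in data.file, and that data.file is not empty (the NR == FNR idiom relies on that). The sample key list, including NOSUCHKEY, is made up.

```shell
tmp=$(mktemp -d)

# data.file as given in the question.
printf '%s\n' \
  'T01F01475558    30' \
  'T01F022B3A17    31' \
  'T01F022EEDFD    19' \
  'T01F026E0209    19' > "$tmp/data.file"

# Hypothetical keys: one duplicate and one key that does not exist.
printf '%s\n' \
  'T01F026E0209' \
  'T01F01475558' \
  'NOSUCHKEY' \
  'T01F026E0209' > "$tmp/key.file"

# While reading the first file (data.file), remember each line under its key.
# While reading the second file (key.file), print the remembered line, if any.
# Keys are compared as plain strings, not as regular expressions.
result=$(awk 'NR == FNR { line[$1] = $0; next }
              $1 in line { print line[$1] }' \
             "$tmp/data.file" "$tmp/key.file")
printf '%s\n' "$result"

rm -r "$tmp"
```

data.file is read only once and no extra process is spawned per key. The price is that the whole data.file is held in memory.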

Kamil Maciorowski

Posted 2013-08-27T18:40:08.850

Reputation: 38 429