How to join multiple lines based on a pattern?

4

I want to join multiple lines in a file based on a pattern that both lines share.

This is my example:

{101}{}{Apples}
{102}{}{Eggs}
{103}{}{Beans}
{104}...
...

{1101}{}{This is a fruit.}
{1102}{}{These things are oval.}
{1103}{}{You have to roast them.}
{1104}...
...

I want to join the lines {101}{}{Apples} and {1101}{}{This is a fruit.}

to one line {101}{}{Apples}{1101}{}{This is a fruit.} for further processing.

Same goes for the other lines.

As you can see, both lines share the number 101, but I have no idea how to pull this off. Any ideas?

/EDIT:

I found a "workaround":

First, delete the leading "{1" from every line of the second group in VISUAL BLOCK mode with C-V (or a similar shortcut), then sort all lines by number with :%sort n, then join every second line with :let @q = "Jj" followed by 500@q.

This works, but leaves me with {101}{}{Apples} 101}{}{This is a fruit.}. I would then need to add the missing characters "{1" in each line, not quite what I want. Any help appreciated.

ryz

Posted 2011-01-30T21:11:18.810

Reputation: 595

What is the maximum number in each group? Are there any lines that would remain unmatched in either group? – Paused until further notice. – 2011-01-30T21:37:42.217

The maximum number in each group is {219} / {1219}. No, all lines would match. – ryz – 2011-01-30T21:46:45.743

Answers

3

Instead of deleting the {1, just do

:%sort rn /\d\d\d}/

That will do a numerical sort, but on each line it will only look at three digits followed by a }.

Also, to join the lines afterwards, I would do

:g/{\d\d\d}/j!
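
For comparison, the same two steps (sort on the last three digits, then join each pair) can be sketched outside Vim. This is only a sketch under assumptions: `sort_and_join` is a made-up helper name, every number has a matching partner, the shorter line comes first in the file, and `sort` supports the non-POSIX `-s` (stable) flag, as GNU and BSD sort do.

```shell
# sort_and_join FILE  (hypothetical helper, not part of any tool)
# 1. grep keeps only lines that start with a braced number.
# 2. sed prefixes each line with the last three digits of that number.
# 3. sort orders by that key; -s keeps {101} ahead of {1101}.
# 4. cut drops the temporary key again.
# 5. paste glues every two lines together with an empty delimiter.
sort_and_join() {
    grep -E '^[{][0-9]+[}]' "$1" |
        sed -E 's/^[{][0-9]*([0-9]{3})[}]/\1 &/' |
        sort -s -n -k1,1 |
        cut -d' ' -f2- |
        paste -d '\0' - -
}
```

Calling `sort_and_join inputfile` prints the joined lines to standard output and leaves the file itself untouched.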

Jan Hlavacek

Posted 2011-01-30T21:11:18.810

Reputation: 1 175

@ryz: See! I told you! @Jan: +1! – Paused until further notice. – 2011-01-31T01:30:45.610

2

Here's a way to do it in the shell with a file:

join -j 2 \
    <(sed -n '/^{...}/{s/{/{ /;s/}/ }/;p}' inputfile) \
    <(sed -n '/^{....}/{s/{./& /;s/}/ }/;p}' inputfile) |
    sed 's/^\([^ ]*\) { }{}\({[^}]*}\) {1 }\({.*}\)$/{\1}{}\2{1\1}\3/'

It uses the first two invocations of sed to split the file based on the number of digits between the first set of curly braces, and adds spaces around the last three digits ({101} becomes { 101 } and {1101} becomes {1 101 }). The join command then uses those three-digit numbers as its key field. The last sed command puts the digits back where they belong and removes the extra spaces added earlier.

A vim guru could probably do something better within vim. I could do something more straightforward than the above using AWK.
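
A rough sketch of what the AWK version might look like, wrapped in a shell function. The name `join_by_key` is made up for illustration, and the sketch assumes the key is always the last three digits of the first braced number and that each key occurs exactly twice. Lines without a partner are silently dropped.

```shell
# join_by_key FILE  (hypothetical helper name)
join_by_key() {
    awk '
        # Only lines that start with a braced number take part.
        match($0, /^[{][0-9]+[}]/) {
            num = substr($0, 2, RLENGTH - 2)     # digits inside the first braces
            key = substr(num, length(num) - 2)   # their last three digits
            if (key in first) {
                # Second line with this key: print the pair as one line.
                print first[key] $0
                delete first[key]
            } else {
                # First line with this key: remember it for later.
                first[key] = $0
            }
        }
    ' "$1"
}
```

Unlike the join pipeline, this makes a single pass over the file and does not need the input to be sorted. Output appears in the order in which the second line of each pair is seen.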

Paused until further notice.

Posted 2011-01-30T21:11:18.810

Reputation: 86 075

0

Here is an example using the Vim/Ex editor from the command line for one pattern:

$ ex +'redir @a|sil g/101}/' +'redi>>/dev/stdout|echon join(split(@a, "\n"),"")' -scq! input.txt
{101}{}{Apples}{1101}{}{This is a fruit.}

For multiple patterns, either repeat it with extra commands, add a loop, or loop from the shell, e.g.

$ for i in `seq 1 3`; do ex +"redir @a|sil g/10$i}/" +'redi>>/dev/stdout|echo join(split(@a, "\n"),"")' -scq! input.txt; done
{101}{}{Apples}{1101}{}{This is a fruit.}
{102}{}{Eggs}{1102}{}{These things are oval.}
{103}{}{Beans}{1103}{}{You have to roast them.}

Using just the shell to parse the data is much simpler, e.g.:

$ grep "101}" input.txt | xargs
{101}{}{Apples} {1101}{}{This is a fruit.}

For multiple lines:

$ for i in `seq 1 3`; do grep "10$i}" input.txt | xargs; done
{101}{}{Apples} {1101}{}{This is a fruit.}
{102}{}{Eggs} {1102}{}{These things are oval.}
{103}{}{Beans} {1103}{}{You have to roast them.}
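
The loop above hardcodes the prefix 10, so it only reaches 101 to 109. A possible variant reads the three-digit keys from the file itself. This is a sketch, not a tested solution: `join_all_with_grep` is a hypothetical name, `grep -o` is a GNU/BSD extension, and it assumes no other part of a line contains a key followed by `}`.

```shell
# join_all_with_grep FILE  (hypothetical helper name)
join_all_with_grep() {
    # List the three-digit keys from lines such as {101}, stripped of braces.
    grep -oE '^[{][0-9]{3}[}]' "$1" | tr -d '{}' |
        while read -r key; do
            # Fetch every line containing "KEY}" and put them on one line,
            # separated by a single space (like the xargs output above).
            grep -F "${key}}" "$1" | paste -s -d ' ' -
        done
}
```

Running `join_all_with_grep input.txt` should print one joined line per key. `paste -s` is used in place of `xargs` so that quote characters in the data are left alone.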

kenorb

Posted 2011-01-30T21:11:18.810

Reputation: 16 795