Combine multiple lines in a file to a single line

1

I have a file like the one below:

"Field1"|"Field2"|"Field3"|"ufghjkrtyrtyfgh$"  
"Field1"|"Field2  
continue on line 2  
continue on line 3"|"Field3"|"ufghjkrtyrtyfgh$"  
"Field1"|"Field2"|"Field3"|"ufghjkrtyrtyfgh$"

I am looking for output like this:

"Field1"|"Field2"|"Field3"|"ufghjkrtyrtyfgh$"
"Field1"|"Field2continue on line 2continue on line 3"|"Field3"|"ufghjkrtyrtyfgh$"
"Field1"|"Field2"|"Field3"|"ufghjkrtyrtyfgh$"

  1. Each record ends with $"
  2. Field 2 can be spread across multiple lines
  3. The file is pipe delimited and double quote enclosed.

Could you please help me to resolve this problem?

Viswakanth

Posted 2016-05-24T02:24:46.317

Reputation: 11

Answers

3

$ awk '/[$]"[[:space:]]*$/{print;next} {printf "%s",$0}' file
"Field1"|"Field2"|"Field3"|"ufghjkrtyrtyfgh$"
"Field1"|"Field2continue on line 2continue on line 3"|"Field3"|"ufghjkrtyrtyfgh$"
"Field1"|"Field2"|"Field3"|"ufghjkrtyrtyfgh$"

How it works

  • /[$]"[[:space:]]*$/{print;next}

    For any line that ends with $ followed by ", optionally followed by white space, this (1) prints the line, and (2) skips the remaining commands and tells awk to start over on the next line.

    In awk regular expressions, $ signifies the end-of-the-line. If we want to match an actual dollar sign, we must escape it somehow. The most reliable way to escape it is to put it in square brackets: [$]. In the regex above, [$] is followed by the double-quote " and that is followed by [[:space:]]*. The character class [[:space:]] matches any white space characters and the * means we should match zero or more of them. This is followed by the unescaped $ which matches at the end of the line.

  • printf "%s",$0

    For any other line, this tells awk to print the line without a trailing newline, so the text of the following line is appended directly to it.
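As a minimal sketch of the two rules above, the script below runs a slightly modified command against a hypothetical sample. The temporary file, the variable names, and the extra sub() call are assumptions added for illustration, not part of the original command: the sub() strips trailing blanks (and a carriage return, if present) before the end-of-record test, which matters only if continuation lines carry trailing whitespace.

```shell
#!/bin/sh
# Hypothetical sample data; the continuation lines carry trailing spaces on purpose.
tmp=$(mktemp)
printf '%s\n' \
  '"Field1"|"Field2"|"Field3"|"ufghjkrtyrtyfgh$"' \
  '"Field1"|"Field2  ' \
  'continue on line 2  ' \
  'continue on line 3"|"Field3"|"ufghjkrtyrtyfgh$"' \
  '"Field1"|"Field2"|"Field3"|"ufghjkrtyrtyfgh$"' > "$tmp"

joined=$(awk '
  { sub(/[ \t\r]+$/, "") }   # assumption: trailing blanks are not wanted
  /[$]"$/ { print; next }    # line ends a record: print it with a newline
  { printf "%s", $0 }        # continuation line: print it without a newline
' "$tmp")

printf '%s\n' "$joined"

# Count the records and pick out the joined one for inspection.
records=$(printf '%s\n' "$joined" | awk 'END { print NR }')
second=$(printf '%s\n' "$joined" | sed -n '2p')
rm -f "$tmp"
```

Because the blanks are removed first, the end-of-record test can be the shorter /[$]"$/ instead of the [[:space:]]* form used above.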

John1024

Posted 2016-05-24T02:24:46.317

Reputation: 13 893

1

echo '"Field1a"|"Field2a"|"Field3a"|"ufghjkrtyrtyfgh$"
"Field1b"|"Field2b
continue on line 2
continue on line 3"|"Field3b"|"ufghjkrtyrtyfgh$"
"Field1c"|"Field2c"|"Field3c"|"ufghjkrtyrtyfgh$"' | sed -nr '/^".*"$/{p;n};:a;/[^"]$|^[^"]/{N;s/(.)\n(.)/\1\2/;ta};p'
"Field1a"|"Field2a"|"Field3a"|"ufghjkrtyrtyfgh$"
"Field1b"|"Field2bcontinue on line 2continue on line 3"|"Field3b"|"ufghjkrtyrtyfgh$"
"Field1c"|"Field2c"|"Field3c"|"ufghjkrtyrtyfgh$"

A sed solution. Every line that starts and ends with " is printed as is; the 'n' command then reads the next line into the pattern space, and processing continues with the rest of the script. If the pattern space doesn't start with " or doesn't end with ", it enters the loop ':a ... ta': the 'N' command appends the next line, the 's' command replaces «lastchar»«newline»«firstchar» (the '(.)\n(.)' part) with «lastchar»«firstchar», and the 'ta' command jumps back to the ':a' label only if 's' actually replaced something (this is the loop). Once the joined text starts and ends with ", or 's' didn't replace anything, sed 'p'rints the resulting line and starts a new cycle with the next line. The awk solution really seems a lot cleaner. I think my sed solution can be improved.

Edit: the -n option suppresses sed's automatic output, so only what we explicitly 'p'rint appears. The -r option enables extended regular expressions.
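The N-and-branch loop described above can also be sketched in a simpler form. This is a variant, not the command from this answer: it assumes, as the question states, that every record ends with $", and it keeps appending lines until the pattern space ends that way. It uses an unconditional 'ba' instead of 'ta', and the variable names are made up for illustration. An unterminated final record is not handled.

```shell
#!/bin/sh
# Hypothetical sample input, same shape as the data in the answer.
input='"Field1a"|"Field2a"|"Field3a"|"ufghjkrtyrtyfgh$"
"Field1b"|"Field2b
continue on line 2
continue on line 3"|"Field3b"|"ufghjkrtyrtyfgh$"
"Field1c"|"Field2c"|"Field3c"|"ufghjkrtyrtyfgh$"'

# :a marks the loop start. While the pattern space does not end with $",
# N appends the next line, s/\n// deletes the embedded newline, and
# ba branches back to :a. A complete record falls through to p.
sed_out=$(printf '%s\n' "$input" | sed -n '
:a
/\$"$/!{
N
s/\n//
ba
}
p
')

printf '%s\n' "$sed_out"
```

Keying the loop on the record terminator rather than on the quotes at both ends keeps the condition to a single regular expression, at the cost of relying on the $" convention.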

Paulo

Posted 2016-05-24T02:24:46.317

Reputation: 606

1

A slightly different GNU awk solution:

awk -v RS='\\$" *' '{gsub(" *\n", ""); print $0 RT }' file

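This uses a regular expression as the record separator (RS). RT, a gawk extension, holds the text that actually matched RS, so printing $0 RT puts the $" terminator back at the end of each record.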

Michael Vehrs

Posted 2016-05-24T02:24:46.317

Reputation: 255