grep file for only part of line

2

I have an RTF file that I'm running grep on like this:

 grep "Order Number" 'Extract Text Output.rtf'

which results in lines that look like this

\b\fs28 \cf2 Fab Order Number : FAB00772450\

and I want the result to be just FAB00772450

I know that -o would return only the matched text, "Order Number", but that doesn't help me.

mcgrailm

Posted 2012-03-23T02:28:56.380

Reputation: 312

Isn't this the same as http://stackoverflow.com/q/974757/422353?

– None – 2012-03-23T02:35:54.090

how the hell is this question off topic ? someone please explain – mcgrailm – 2012-03-23T02:37:43.807

Try piping that to awk, then you can split it up and do whatever you like to it. – user1200129 – 2012-03-23T02:40:30.030

Answers

3

sed -n 's/.*Order Number : \(.*\)\\/\1/gp' 'Extract Text Output.rtf'

Yields exactly what you want.

Explanation:

  • sed -n suppresses sed's default output
  • s/.../.../g search and replace; g: every match on the line (globally)
  • .*Order Number : \(.*\)\\ matches everything up to and including "Order Number : ", then saves whatever lies between that and a backslash to group 1 (a downside of sed's basic regexes is that the grouping operator (...) has to be escaped as \(...\)); the leading .* is what removes the RTF codes in front of the match
  • \1 uses group 1 as the replacement
  • p prints the line only if a substitution was made
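To see those pieces working together, you can run the substitution on the sample line from the question instead of the real file. This is only a minimal sketch: the variable names line and order are made up for illustration, and the pattern carries a leading .* so the text in front of "Order Number" is dropped as well.

```shell
# Sample line copied from the question; single quotes keep the backslashes literal.
line='\b\fs28 \cf2 Fab Order Number : FAB00772450\'

# -n turns off automatic printing; the p flag prints only when the substitution matched.
order=$(printf '%s\n' "$line" | sed -n 's/.*Order Number : \(.*\)\\/\1/p')

printf '%s\n' "$order"
```

Lines that do not contain "Order Number" produce no output at all, which is why no separate grep is needed.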

This is more flexible and generic than relying on a hard-coded awk field ($7).

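Note 1: a greedy .* captures too much if you have lines formatted like this:

 \cf2 Fab Order Number : FAB00772450\ \b \cf2

Here .* runs to the last backslash rather than the first. sed's POSIX regular expressions have no non-greedy *? and +? operators, so capture with [^\\]* (any run of characters other than a backslash) in place of .* to stop at the first backslash, and add a trailing .* to the pattern so the text after that backslash is dropped too.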
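The difference between a greedy group and a negated bracket expression can be checked on the line from Note 1. A rough sketch, assuming nothing beyond standard sed; the variable names greedy and first are invented for the comparison.

```shell
# Line from Note 1, with extra RTF control words after the order number.
line='\cf2 Fab Order Number : FAB00772450\ \b \cf2'

# Greedy .* gives back only as far as the LAST backslash, so trailing text leaks into the group.
greedy=$(printf '%s\n' "$line" | sed -n 's/.*Order Number : \(.*\)\\.*/\1/p')

# [^\\]* can only match non-backslash characters, so the group ends at the FIRST backslash.
first=$(printf '%s\n' "$line" | sed -n 's/.*Order Number : \([^\\]*\)\\.*/\1/p')

# Brackets make any stray trailing characters visible.
printf 'greedy:  [%s]\nbracket: [%s]\n' "$greedy" "$first"
```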

Note 2: If you want to extract several parts of a line, use multiple groups; in the replacement string you can reorder them and add formatting, like .../\2 - \1/
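As an illustration of Note 2, the sketch below captures two parts of the sample line and prints them in swapped order. Which parts to capture is my own choice for the example (the word before "Order Number" and the number itself), and swapped is a made-up variable name.

```shell
# Sample line copied from the question.
line='\b\fs28 \cf2 Fab Order Number : FAB00772450\'

# Group 1: the word just before "Order Number"; group 2: the number up to the first backslash.
# The replacement prints them in reverse order with " - " in between.
swapped=$(printf '%s\n' "$line" |
  sed -n 's/.* \([A-Za-z]*\) Order Number : \([^\\]*\)\\.*/\2 - \1/p')

printf '%s\n' "$swapped"
```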

TWiStErRob

Posted 2012-03-23T02:28:56.380

Reputation: 173

2

This works for me:

grep "Order Number" test.txt | awk '{print $7}' | tr -d '\\'

output:

FAB00772450

user1200129

Posted 2012-03-23T02:28:56.380

Reputation: 153

what does the 7 do ? – mcgrailm – 2012-03-23T02:55:05.777

it prints the 7th column I think. It splits on whitespace. – user1200129 – 2012-03-23T03:04:09.807

It prints the 7th field. Fields are split on FS, which defaults to a single space, meaning any run of blanks or tabs. – Scott C Wilson – 2012-03-24T20:16:24.867
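To make the field-splitting point concrete, here is a small sketch that extracts the number two ways: by position with the default FS, and by setting FS to " : " with -F so the count of words before the colon no longer matters. The -F variant is my own suggestion rather than something from the answer, and by_position and by_separator are illustrative names.

```shell
# Sample line copied from the question.
line='\b\fs28 \cf2 Fab Order Number : FAB00772450\'

# Default FS: fields are separated by runs of blanks, so the number is field 7.
by_position=$(printf '%s\n' "$line" | awk '{print $7}' | tr -d '\\')

# FS set to " : ": the number is field 2 however many words come before the colon.
by_separator=$(printf '%s\n' "$line" | awk -F' : ' '{print $2}' | tr -d '\\')

printf '%s\n%s\n' "$by_position" "$by_separator"
```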

0

If this format is always followed but the number of tokens is not always the same, you could pipe it through something like

sed 's/.*: //' | sed 's#\\##'

This also yields "FAB00772450"
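For completeness, a sketch of the whole pipeline from grep onward, run against a throwaway file so it can be tried without the real RTF. The temporary file and its second line are invented for the test, and the two sed calls are merged into one with -e, which should behave the same as piping one into the other.

```shell
# Build a small stand-in for the RTF file: one matching line, one that should be ignored.
tmp=$(mktemp)
printf '%s\n' '\b\fs28 \cf2 Fab Order Number : FAB00772450\' 'some unrelated line' > "$tmp"

# First expression strips everything through ": "; the second removes the first backslash.
result=$(grep "Order Number" "$tmp" | sed -e 's/.*: //' -e 's#\\##')

rm -f "$tmp"
printf '%s\n' "$result"
```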

Scott C Wilson

Posted 2012-03-23T02:28:56.380

Reputation: 2 210