Using grep, sed or awk to find words in between? I want to extract the package name up to and including .el7

1

I am writing a bash script and I want to extract the package name up to and including .el7.

x=dbus-sharp (an example package name - which changes)

example text file:

Building dbus-sharp-0.7.0-11.fc22 for epel7
Created task: 7970206
...
0 free  1 open  1 done  0 failed
  7970225 buildArch (dbus-sharp-0.7.0-11.el7.src.rpm, ppc64): free
  7970223 buildArch (dbus-sharp-0.7.0-11.el7.src.rpm, x86_64): open (buildhw-03.phx2.fedoraproject.org)
...

basically now I want

y=dbus-sharp-0.7.0-11.el7

It doesn't matter if I need to use grep, sed or awk.

I haven't had any luck googling for a similar solution.

Examples I have tried:

[me@h dbus-sharp]$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a

[me@h dbus-sharp]$ cat scratchdbus-sharp | grep -Po '(?<=(dbus)).*(?= el7)'
(no output?)

[me@h dbus-sharp]$ cat scratchdbus-sharp | awk '/dbus/,/el7/'
(it dumps the whole text file?)

[me@h dbus-sharp]$ sed -n "/dbus/,/el7/p" scratchdbus-sharp
(again the whole text file is dumped)

[me@h dbus-sharp]$ grep -m 1 "dbus-sharp" scratchdbus-sharp 
Building dbus-sharp-0.7.0-11.fc22 for epel7

I should also note that epel7 appears in the text file(s), which also matches 'el7' and complicates things.

quickbooks

Posted 2014-10-31T07:25:41.130

Reputation: 63

Answers

0

A grep solution:

grep -m 1 -oP 'dbus[^ ]+\.el7' file

-m 1 stops after the first matching line, -o prints only the matching part of the line, and -P enables Perl-compatible regular expressions (a GNU grep feature).
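Since the question keeps the package name in a variable, here is a minimal sketch of the same grep call driven by $x. The temporary file (created with mktemp) and its contents are only a stand-in for the real log, copied from the sample in the question, and -P assumes GNU grep.

```shell
#!/usr/bin/env bash
# Stand-in for the real log: sample lines copied from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Building dbus-sharp-0.7.0-11.fc22 for epel7
Created task: 7970206
0 free  1 open  1 done  0 failed
  7970225 buildArch (dbus-sharp-0.7.0-11.el7.src.rpm, ppc64): free
  7970223 buildArch (dbus-sharp-0.7.0-11.el7.src.rpm, x86_64): open (buildhw-03.phx2.fedoraproject.org)
EOF

x=dbus-sharp

# Double quotes let the shell expand $x; the backslash before the dot is kept.
# -m 1: stop after the first matching line
# -o:   print only the matched text
# -P:   Perl-compatible regex (GNU grep)
y=$(grep -m 1 -oP "${x}[^ ]+\.el7" "$log")

echo "$y"
rm -f "$log"
```

If a package name could contain regex metacharacters such as + or ., it would need escaping before being placed in the pattern.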

And a sed solution:

sed -n 's/.*\(dbus.*\.el7\).*/\1/p' file | head -1

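The substitution replaces the whole line with the part that matches dbus.*\.el7, dropping everything before and after it. -n suppresses the default output, so only lines where the substitution succeeded are printed (p), and head -1 keeps just the first of them.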
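The sed variant can be driven by the variable in the same way. This is a sketch under the same assumptions as the sample in the question: the temporary file is a stand-in for the real log, and the package name contains nothing that sed would treat as a regex metacharacter or a slash.

```shell
#!/usr/bin/env bash
# Stand-in for the real log: sample lines copied from the question.
log=$(mktemp)
cat > "$log" <<'EOF'
Building dbus-sharp-0.7.0-11.fc22 for epel7
Created task: 7970206
  7970225 buildArch (dbus-sharp-0.7.0-11.el7.src.rpm, ppc64): free
  7970223 buildArch (dbus-sharp-0.7.0-11.el7.src.rpm, x86_64): open (buildhw-03.phx2.fedoraproject.org)
EOF

x=dbus-sharp

# -n prints nothing by default; the trailing p prints only lines where the
# substitution happened. Both buildArch lines match, so head keeps the first.
y=$(sed -n "s/.*\(${x}.*\.el7\).*/\1/p" "$log" | head -n 1)

echo "$y"
rm -f "$log"
```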

chaos

Posted 2014-10-31T07:25:41.130

Reputation: 3 704

I got it to work like this: grep -m 1 -oP $(echo $x)'[^ ]+\.el7' scratchgio-sharp. Can you please explain how you came up with [^ ]+\. and what it does? Thanks. – quickbooks – 2014-10-31T08:38:10.380

@quickbooks Sure. [^ ] matches any single character that is not a space (a leading ^ inside the brackets negates the set), and + means one or more of them. The \. that follows is a literal dot. I used it so that the epel7 line is not matched as well. But requiring the dot before el7 (.el7) should be enough on its own, so this should work too: grep -m 1 -oP 'dbus.*\.el7' file – chaos – 2014-10-31T08:42:04.480
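To illustrate the difference between the two patterns, here is a small sketch. The first input line is hypothetical (the sample log has no line with two .el7 names on it); it only serves to show where .* and [^ ]+ diverge. The second input is the epel7 line from the question.

```shell
#!/usr/bin/env bash
# Hypothetical line containing two ".el7" names separated by spaces.
line='dbus-sharp-0.7.0-11.el7.src.rpm copied next to other-pkg-1.0-1.el7.src.rpm'

# .* is greedy and may cross spaces, so it runs to the LAST ".el7" on the line.
greedy=$(printf '%s\n' "$line" | grep -oP 'dbus.*\.el7')

# [^ ]+ cannot cross a space, so the match ends at the first name.
bounded=$(printf '%s\n' "$line" | grep -oP 'dbus[^ ]+\.el7')

# "epel7" has no dot directly before "el7", so \.el7 does not match it.
epel=$(printf '%s\n' 'Building dbus-sharp-0.7.0-11.fc22 for epel7' | grep -oP 'dbus.*\.el7' || true)

echo "greedy:  $greedy"
echo "bounded: $bounded"
echo "epel:    '$epel'"
```

On the sample log from the question both patterns give the same result, because each line contains only one .el7.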

Thanks for explaining what [^ ]+ means. Yes, grep -m 1 -oP $(echo $x)'.*\.el7' scratch$x also works. In the alternative solution where you used sed, can you please explain how you figured out that s/.*\( and \).*/\1/ will remove everything before and after the part in the brackets? Or what s/.*\( and \).*/\1/ means? Thanks again. – quickbooks – 2014-10-31T09:12:05.187

@quickbooks In the sed command, s means substitute (search and replace): it searches for what is between the first two slashes and replaces it with what is between the second and third. .* matches any sequence of characters. The part in the escaped parentheses \( \) is what we are looking for, followed by .* (anything again). So the whole line is replaced with \1, which stands for the text matched by the subpattern inside \( \). The p at the end prints the result. – chaos – 2014-10-31T09:25:13.827
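The capture-and-replace idea described above can be tried on a single line. The sketch below runs the answer's substitution as written (basic regex, escaped parentheses) and, as an assumption about the sed in use, the same thing with -E, where the parentheses are written without backslashes. GNU and BSD sed both accept -E.

```shell
#!/usr/bin/env bash
# One line from the sample log in the question.
line='  7970225 buildArch (dbus-sharp-0.7.0-11.el7.src.rpm, ppc64): free'

# Basic regex: the group is written \( ... \) and referenced as \1.
bre=$(printf '%s\n' "$line" | sed -n 's/.*\(dbus.*\.el7\).*/\1/p')

# Extended regex (-E): the group is written ( ... ), still referenced as \1.
ere=$(printf '%s\n' "$line" | sed -nE 's/.*(dbus.*\.el7).*/\1/p')

# A line without a match is not substituted, so with -n nothing is printed.
nomatch=$(printf '%s\n' 'Created task: 7970206' | sed -n 's/.*\(dbus.*\.el7\).*/\1/p')

echo "bre:     $bre"
echo "ere:     $ere"
echo "nomatch: '$nomatch'"
```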