Parse and remove parts of strings between delimiters

I would like to go through a file and remove certain sequences in between delimiters.

For example

 drw---- 00000000 11111111        0 ./a/
 drw---- 00000000 11111111        0 ./b/
 d------ 00000000 11111111        0 ./c/
 d------ 00000000 11111111        0 ./d/k/
 d------ 00000000 11111111        0 ./e/l/r/
 d------ 00000000 11111111        0 ./f/m/s/x/
 ------- 00000000 11111111       89 ./g/n/t/y/C.xml
 dr----- 00000000 11111111        0 ./h/o/u/z/
 dr-r--- 00000000 11111111        0 ./i/p/v/A/D/
 d--r--- 00000000 11111111        0 ./j/q/w/B/

Would become

 drw---- ./a/
 drw---- ./b/
 d------ ./c/
 d------ ./d/k/
 d------ ./e/l/r/
 d------ ./f/m/s/x/
 ------- ./g/n/t/y/C.xml
 dr----- ./h/o/u/z/
 dr-r--- ./i/p/v/A/D/
 d--r--- ./j/q/w/B/

Where the starting delimiter is the 2nd space in the file, and the ending delimiter is ./

I'm really new to cygwin and all of its clever tools, so I have no idea what to do. I'm pretty sure I could use sed and regular expressions somehow, but I simply don't know enough to come up with the solution on my own.

Millianz

Posted 2012-04-05T19:20:38.420

Reputation: 81

delimiter <-- that is how you spell it. You got it right elsewhere but not in the subject. The word limit is in there. – barlop – 2012-04-06T19:28:00.217

Answers

The simplest way to do it is with awk.

$ awk '{print $1, $5}' myfile.txt

awk reads the file line by line, sets some special variables, and runs the command for each line. Here $1 and $5 hold the first and fifth fields of the line, where fields are split on runs of whitespace (the default delimiter).
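To make the field splitting concrete, here is a minimal sketch that wraps the same awk command in a shell function. The name strip_middle is made up for illustration and is not part of the answer. Note that awk drops the leading space of each line, so the output starts at the permissions column.

```shell
# strip_middle: hypothetical helper name, used only for this illustration.
# Prints the first and fifth whitespace-separated fields of every line.
# With no file arguments it reads standard input.
strip_middle() {
    awk '{print $1, $5}' "$@"
}

# Try it on one line of the sample listing.
printf ' drw---- 00000000 11111111        0 ./a/\n' | strip_middle
```

On a file it would be called as strip_middle myfile.txt, which is equivalent to the command in the answer.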

infiniteRefactor

Posted 2012-04-05T19:20:38.420

Reputation: 750

Damn this is a nice solution, thanks so much. I'll have to read up on GAWK, seems like it's very useful. – Millianz – 2012-04-05T19:54:18.037

unless any filename has spaces. Then you might want to say awk '{$2=$3=$4=""; print}' – glenn jackman – 2012-04-05T21:33:32.953
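One thing to be aware of with the variant in the comment above: assigning to a field makes awk rebuild the line with single spaces between fields, so the emptied fields leave extra blanks and any run of spaces inside a filename collapses to one. The following is a rough sketch of an alternative that peels the first four fields off a copy of the line instead. The function name perms_and_path is hypothetical, and the sketch assumes the path is always the fifth column, as in the sample listing.

```shell
# perms_and_path: hypothetical helper name, used only for this illustration.
# Assumes the path starts at the fifth column, as in the sample listing.
perms_and_path() {
    awk '{
        line = $0
        # Strip four leading fields (and the blanks before each) from the copy.
        for (i = 1; i <= 4; i++) {
            sub(/^[ \t]*[^ \t]+/, "", line)
        }
        # Strip the blanks that separate the fourth field from the path.
        sub(/^[ \t]+/, "", line)
        # $1 is still intact because only the copy was modified.
        print $1, line
    }' "$@"
}

# A made-up filename containing two consecutive spaces.
printf ' ------- 00000000 11111111       89 ./g/my  file.xml\n' | perms_and_path
```

Because only the copy is edited, the spacing inside the path is passed through unchanged.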

Here is the regex you want. Either open the file in vim and run it as shown, or pass the substitution (without the leading :%) to sed: sed 'the_expression' oldname > newname.

:%s/[0-9][0-9]*//g

Explanation:
The % symbol specifies that the following command should be run on the whole file.
s means substitute: s/search for this expression/replace it with this one/. The trailing g applies the substitution to every match on a line, not just the first.
In your case you want to delete all the numbers, so we instruct vim's regex engine to search for every occurrence of one or more digits and replace it with nothing.
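For the sed route, a minimal sketch of the same substitution might look like this. The function name strip_digits is made up for illustration. As with the vim command, it removes every digit on the line, including digits that happen to appear in a path, and it leaves the surrounding spaces in place.

```shell
# strip_digits: hypothetical helper name, used only for this illustration.
# Deletes every run of digits on each line, wherever it occurs.
strip_digits() {
    sed 's/[0-9][0-9]*//g' "$@"
}

# The digits go away, the spaces around them stay.
printf 'ab 12 cd\n' | strip_digits

# Caveat: digits inside a path are removed too.
# ./dir2/file7.xml becomes ./dir/file.xml
printf './dir2/file7.xml\n' | strip_digits
```

On a file this would be run as strip_digits oldname > newname.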

Yitzchak

Posted 2012-04-05T19:20:38.420

Reputation: 4 084

This is a very good solution as well, gawk is a little simpler though if you're not that familiar with regex – Millianz – 2012-04-06T16:45:30.557

that one is just removing numbers, not actually removing that which is between the "delimiters". In the example he gives it'd work, as it's numbers between the delimiters. What he meant is another matter. – barlop – 2012-04-06T19:27:41.213

@barlop I know, this was a quick and dirty solution for the data at hand. – Yitzchak – 2012-04-09T22:02:09.180

"Where the starting delimiter is the 2nd space in the file, and the ending delimiter is ./"

Here's an ugly one just for you

C:\sdf>type p.p
 drw---- 00000000 11111111        0 ./a/
 drw---- 00000000 11111111        0 ./b/
 d------ 00000000 11111111        0 ./c/
 d------ 00000000 11111111        0 ./d/k/
 d------ 00000000 11111111        0 ./e/l/r/
 d------ 00000000 11111111        0 ./f/m/s/x/
 ------- 00000000 11111111       89 ./g/n/t/y/C.xml
 dr----- 00000000 11111111        0 ./h/o/u/z/
 dr-r--- 00000000 11111111        0 ./i/p/v/A/D/
 d--r--- 00000000 11111111        0 ./j/q/w/B/
C:\sdf>sed -r "s/(\s+\S+\s*)([^.]*\.\/)/\1.\//" p.p
 drw---- ./a/
 drw---- ./b/
 d------ ./c/
 d------ ./d/k/
 d------ ./e/l/r/
 d------ ./f/m/s/x/
 ------- ./g/n/t/y/C.xml
 dr----- ./h/o/u/z/
 dr-r--- ./i/p/v/A/D/
 d--r--- ./j/q/w/B/
C:\sdf>
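The same idea can be sketched with POSIX character classes in place of \s and \S, which are GNU extensions. This is only a sketch under assumptions: the function name cut_between is hypothetical, it assumes a sed that accepts -E for extended regular expressions (recent GNU sed and BSD sed do), and like the command above it relies on the first dot on each line being the one in ./

```shell
# cut_between: hypothetical helper name, used only for this illustration.
# Assumes sed supports -E and that no dot appears before the "./" on a line.
#   group 1  = optional leading blanks, the first field, the blanks after it
#   [^.]*\./ = everything up to and including the first "./"
# The match is replaced by group 1 followed by "./".
cut_between() {
    sed -E 's|^([[:space:]]*[^[:space:]]+[[:space:]]+)[^.]*\./|\1./|' "$@"
}

# Try it on one line of the sample listing.
printf ' drw---- 00000000 11111111        0 ./a/\n' | cut_between
```

Using | as the s command delimiter avoids having to escape the slash in ./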

barlop

Posted 2012-04-05T19:20:38.420

Reputation: 18 677