matching start of the line in cygwin regexp

2

I apply sed 's/^ bug*/__/' to

  ref      bug
  bug      ref

and get

  ref     __
 __      ref

Perl gives the same result: perl -ni -e 's/^ bug/__/; print'. A sample file can be downloaded from here. Sed and perl are cygwin utilities that I run from the Windows command line. When I run them through cygwin bash, this does not happen.

Val

Posted 2013-02-12T14:38:25.943

Reputation: 1

Answers

4

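The problem has to do with cmd.exe's "quirks" (for lack of a better word) in handling non-alphanumeric characters within single quotes. cmd.exe does not treat single quotes as quoting characters, so the caret is interpreted as its escape character and stripped before sed ever sees the expression. Essentially, the ^ anchor is lost and the pattern matches anywhere in the line.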

The easiest way to avoid the problem (if running it in a proper Cygwin bash shell is not a desirable option) is to use double quotes instead:

C:\cygwin\home\costa\wk>sed 's/^  bug*/__/' sed.txt
  ref    __
__      ref

C:\cygwin\home\costa\wk>sed "s/^  bug*/__/" sed.txt
  ref      bug
__      ref
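To make the mechanism concrete, here is a minimal Python sketch. It is only a simplified model: `cmd_unescape` is a hypothetical helper that imitates how cmd.exe drops an unquoted caret, and Python's `re` stands in for sed (the `bug*` pattern means the same thing in both). It is not how cmd.exe or Cygwin are actually implemented.

```python
import re


def cmd_unescape(command_line: str) -> str:
    """Simplified model (assumption) of cmd.exe caret handling.

    Outside double quotes, ^ escapes the next character and is dropped.
    Inside double quotes, ^ is kept literally. Single quotes mean nothing
    to cmd.exe, so they do not protect the caret.
    """
    out = []
    in_double_quotes = False
    i = 0
    while i < len(command_line):
        ch = command_line[i]
        if ch == '"':
            in_double_quotes = not in_double_quotes
            out.append(ch)
        elif ch == "^" and not in_double_quotes:
            i += 1  # drop the caret, keep the character it escapes
            if i < len(command_line):
                out.append(command_line[i])
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def sed_substitute(pattern: str, replacement: str, line: str) -> str:
    """Like sed's s/// without the g flag: replace the first match only."""
    return re.sub(pattern, replacement, line, count=1)


# What the program receives after cmd.exe has processed each command line.
print(cmd_unescape("sed 's/^  bug*/__/' sed.txt"))  # caret is gone
print(cmd_unescape('sed "s/^  bug*/__/" sed.txt'))  # caret survives

for line in ["  ref      bug", "  bug      ref"]:
    unanchored = sed_substitute(r"  bug*", "__", line)
    anchored = sed_substitute(r"^  bug*", "__", line)
    print(repr(unanchored), repr(anchored))
```

The unanchored results reproduce the single-quote output above, and the anchored results reproduce the double-quote output.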

Costa

Posted 2013-02-12T14:38:25.943

Reputation: 491

What have you used there? Cygwin's sed from cmd.exe? That's cheating; you may as well use a Windows port of sed. – barlop – 2013-02-12T18:47:29.747

OP said that was what he was doing (running Cygwin's sed executable from cmd.exe), so that was the source of his problem and that's why I answered that use case. :) I made sure to mention that running within a proper Cygwin env makes this workaround unnecessary, but I don't know his situation. Maybe that's not an option for him for some reason. – Costa – 2013-02-12T18:53:11.690

First, there is no cheating. Second, all sed instructors teach using single quotes. Third, single quotes work in Windows -- double quotes work in cygwin. Nothing mad here. Finally, I said that the problem occurs in Windows. I said that I used cygwin bash only for testing. Nothing mad here. You just cannot read. So the only crazy person who came here to produce dumps of garbage is you. – Val – 2013-02-12T19:15:42.693

0

First of all, use this instead:

sed 's/^ *bug--*/__/' input

That way it will work for multiple spaces before bug and one or more dashes after it. That is just a minor detail though. The command you posted works fine on my Debian.

Could you post the actual file you are trying to modify somewhere? I am guessing that you have either windows or Mac style line endings and that can confuse sed. If I am right, this should help:

perl -pi -e 's/\r\n|\n|\r/\n/g' input

Then run the same sed command on the file again.
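For anyone without Perl at hand, the same normalization can be sketched in Python. This is only an illustration of the idea; `normalize_newlines` is a hypothetical helper name, and it works on bytes so that Python's own newline translation does not interfere.

```python
import re


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (\\r\\n) and old Mac (\\r) line endings to Unix (\\n)."""
    # \r\n must come first in the alternation so that a Windows line ending
    # is replaced as one unit rather than as two separate line breaks.
    return re.sub(rb"\r\n|\r|\n", b"\n", data)


sample = b"  ref      bug\r\n  bug      ref\r"
print(normalize_newlines(sample))
```

To apply it to a real file, read the file in binary mode (`open(path, "rb")`), pass the contents through the function, and write the result back in binary mode.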

If that doesn't work, there may be something specific about cygwin's sed. Try this Perl command instead (after making sure that the end of line character is \n by the command above):

perl -ne 's/^\s*bug-+/__/; print'

terdon

Posted 2013-02-12T14:38:25.943

Reputation: 45 216

I have replaced \r\n with \n and applied your perl script to a file containing the single line ref--91 0,667 | ref--49 0,182 | bug--32 0,354 | ref-100 0,500. The result is ref--91 0,667 | ref--49 0,182 |__32 0,354 | ref-100 0,500. So, perl seems to behave as badly as cygwin/sed. Regular expressions are broken everywhere. – Val – 2013-02-12T15:17:48.043

Umm... it works perfectly well here. Could you post your actual file somewhere? It looks like there is something strange with it. – terdon – 2013-02-12T15:26:21.813

I have included the link into the question – Val – 2013-02-12T15:38:07.690

Well, the file you uploaded has no newline (\n) at the end of the line, but that should be irrelevant here. Both my Perl command and your sed command give the desired output. Try this instead: sed -b 's/^ bug--*/__/'. Does that work? – terdon – 2013-02-12T15:45:33.590

No, when I apply sed -b 's/^ bug*/__/' to spaces ref spaces bug I get spaces ref spaces __, and 's/^\s* bug*//' produces spaces ref. – Val – 2013-02-12T16:27:58.267

0

(This is meant to be a comment, but I don't have enough reputation to add a comment yet...)

You just fried my brain. In my Cygwin, the same thing happens. I was shocked. It appears to be a bug in how the wildcards are handled, whether they are using regular expression syntax or glob-style syntax. (In glob style, * means 'any number of any type of character'; in a regular expression, * means '0 or more of the previous character'.)
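The difference between the two meanings of * can be seen in a short Python sketch. Python's `fnmatch` and `re` modules are used here purely as stand-ins for shell globbing and sed's regular expressions, and the test strings are made up for illustration.

```python
import fnmatch
import re


def glob_matches(pattern: str, text: str) -> bool:
    """Glob semantics: * stands for any run of characters."""
    return fnmatch.fnmatchcase(text, pattern)


def regex_matches(pattern: str, text: str) -> bool:
    """Regex semantics: * repeats the preceding item zero or more times."""
    return re.fullmatch(pattern, text) is not None


for text in ["bu", "bug", "buggg", "bugzilla"]:
    print(text, glob_matches("bug*", text), regex_matches("bug*", text))
```

As a glob, bug* needs the literal prefix "bug" and then accepts anything; as a regular expression, it needs "bu" followed by any number of "g" characters, including none.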

So I tried it in my QNX shell. It works IFF I don't try to use a plus before bug, as in "s/^ +bug-/__/". I can substitute a * in place of the + and it works. I think some implementations of sed are having a hard time picking regex or glob syntax and the result is an unpredictable mess.

I didn't try Perl (haven't installed it yet on this new machine), but I would be doubly shocked if Perl handled it as poorly.

To answer your question, to the best of my knowledge and ~100,000 of my closest friends on Google, your understanding of how the ^ operator should work is accurate.

kmort

Posted 2013-02-12T14:38:25.943

Reputation: 1 609

I do not run sed from cygwin, to be honest. I use the Windows command line, but I believe that the cygwin sed utility is called, since Windows does not have one of its own. But if you run the cygwin command line, then we should agree that the problem is not the command line processor. – Val – 2013-02-12T16:51:28.040

@Val hang on, could you please explain exactly what you are doing? Is it on the Windows command line? On Linux? On cygwin? – terdon – 2013-02-12T17:01:00.833

IMO, I explicitly said that I use the Windows command line and that cygwin's sed is called. Both my sed and perl are cygwin-provided. – Val – 2013-02-12T17:17:08.067