Getting this simple regular expression to match in grep

-2

I want to match a quote, 2, a space, and any character that is not a literal dot.

This is using GnuWin32 grep, not Cygwin's grep.

C:\>echo "2 008abc.html" | grep -oiP \"2 [^.]
grep: [^.]': No such file or directory

C:\>echo "2 008abc.html" | grep -oiP ^"2 [^.]

C:\>echo "2 008abc.html" | grep -oiP """2 [^.]
grep: [^.]: No such file or directory

C:\>echo "2 008abc.html" | grep -oiP """2 0
grep: 0: No such file or directory

C:\>echo "2 008abc.html" | grep -oiP """"2 0"
"2 0


C:\>echo "2 008abc.html" | grep -oiP """"2 [^.]"

C:\>echo "2 008abc.html" | grep -oiP """"2 0"
"2 0

(I answered the prior revision of this question myself, so there is no need to refer to it. It led to another closely related matching problem, so I have revised the question to match something very similar, which is where I am now running into trouble.)

barlop

Posted 2011-08-27T01:53:51.707

Reputation: 18 677

Answers

0

This is a solution.

C:\>echo "2 008abc.html" | grep -oiP \"2" "[^.]
"2 0

This experiment helped (w is w.exe, compiled from w.c, which is listed further down):

C:\>w \"2\ [^.]
argv[0] = w
argv[1] = "2\
argv[2] = [^.]

C:\>w \"2" "[^.]
argv[0] = w
argv[1] = "2 [^.]

C:\>
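
The two results above are consistent with the usual Microsoft C runtime convention for splitting the command line into argv: a backslash directly before a double quote makes that quote literal, an unescaped quote toggles quoting and is not copied, and only unquoted spaces separate arguments. Below is a minimal sketch of that convention, not the runtime's actual code. split_args and ARG_LEN are names made up for illustration, the function expects only the text after the program name, and the doubled-quote ("") rule, which differs between runtime versions, is deliberately not modelled.

```c
#include <stddef.h>

#define ARG_LEN 256  /* hypothetical fixed buffer size per argument */

/*
 * Simplified model of Microsoft-style argv splitting (illustrative only).
 *   - spaces/tabs outside double quotes separate arguments
 *   - an unescaped double quote toggles quoting and is not copied
 *   - 2n backslashes + quote   -> n backslashes, quote toggles quoting
 *   - 2n+1 backslashes + quote -> n backslashes, then a literal quote
 *   - backslashes not followed by a quote are copied unchanged
 * Returns the number of arguments written to out (at most max_args).
 */
int split_args(const char *cmdline, char out[][ARG_LEN], int max_args)
{
    int argc = 0;
    const char *p = cmdline;

    while (argc < max_args) {
        /* skip separators between arguments */
        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0')
            break;

        size_t len = 0;
        int in_quotes = 0;

        while (*p != '\0' && (in_quotes || (*p != ' ' && *p != '\t'))) {
            if (*p == '\\') {
                /* count the run of backslashes */
                size_t slashes = 0;
                while (*p == '\\') {
                    slashes++;
                    p++;
                }
                /* before a quote, each pair collapses to one backslash */
                size_t keep = (*p == '"') ? slashes / 2 : slashes;
                for (size_t i = 0; i < keep && len < ARG_LEN - 1; i++)
                    out[argc][len++] = '\\';
                if (*p == '"') {
                    if (slashes % 2 == 1) {
                        /* odd count: the quote is escaped, copy it */
                        if (len < ARG_LEN - 1)
                            out[argc][len++] = '"';
                    } else {
                        in_quotes = !in_quotes;
                    }
                    p++;
                }
            } else if (*p == '"') {
                in_quotes = !in_quotes;
                p++;
            } else {
                if (len < ARG_LEN - 1)
                    out[argc][len++] = *p;
                p++;
            }
        }
        out[argc][len] = '\0';
        argc++;
    }
    return argc;
}
```

Given the argument text \"2\ [^.] this model produces two arguments, "2\ and [^.], and given \"2" "[^.] it produces the single argument "2 [^.], which agrees with the w output shown above.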

Here is another solution

C:\>echo "2 008abc.html" | grep -oiP "\"2 [^^.]"
"2 0

As you can see below, I found it fairly quickly after a little fiddling:

W:\other>w "\"2 [^.]"
argv[0] = w
argv[1] = "2 [.]

W:\other>w "\"2 [\^.]"
argv[0] = w
argv[1] = "2 [\.]

W:\other>w "\"2 [^.]"
argv[0] = w
argv[1] = "2 [.]

W:\other>w "\"2 [^^.]"
argv[0] = w
argv[1] = "2 [^.]
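
A plausible explanation for the disappearing caret: cmd.exe does not treat \" as an escaped quote, so from cmd.exe's point of view the second " closes the quoted region and [^.] ends up outside quotes, where ^ is cmd.exe's escape character and is removed. Doubling it (^^) leaves one literal ^. Below is a rough sketch of just that step, assuming only the quote-toggling and caret rules. cmd_strip_carets is a made-up name, and other cmd.exe processing (variable expansion, redirection, and so on) is ignored.

```c
#include <stddef.h>

/*
 * Simplified model of cmd.exe caret handling (illustrative only).
 *   - every double quote toggles "inside quotes" and is kept;
 *     a preceding backslash means nothing to cmd.exe
 *   - outside quotes, ^ is dropped and the next character is kept literally
 *   - inside quotes, ^ is an ordinary character
 * dst must have room for at least strlen(src) + 1 bytes.
 */
void cmd_strip_carets(const char *src, char *dst)
{
    int in_quotes = 0;

    while (*src != '\0') {
        if (*src == '"') {
            in_quotes = !in_quotes;
            *dst++ = *src++;
        } else if (*src == '^' && !in_quotes) {
            src++;                  /* drop the caret itself */
            if (*src != '\0')
                *dst++ = *src++;    /* keep the escaped character as-is */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

With this model, "\"2 [^.]" becomes "\"2 [.]" and "\"2 [^^.]" becomes "\"2 [^.]", while \"2" "[^.] passes through unchanged because its caret sits inside a region that cmd.exe considers quoted. The program's own argument parsing then turns those strings into the argv values shown above.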

w.c

#include <stdio.h>

int main(int argc, char *argv[]) {
    int i = 0;
    while (argv[i]) {
        printf("argv[%d] = %s\n", i, argv[i]);
        i++;
    }
    return 0;
}

This one is useful before w.c: it prints the raw command line the program receives, so you can see exactly what the shell (cmd.exe here) removes.

x.c

#include <stdio.h>
#include <windows.h>

int main(int argc, char *argv[]) {
    /* print via "%s" so any % in the command line is not read as a format specifier */
    printf("%s", GetCommandLine());
    return 0;
}

Example:

C:\>x &
x
C:\>
C:\>x ^&
x  &
C:\>

barlop

Posted 2011-08-27T01:53:51.707

Reputation: 18 677

2

It looks like you're using Windows Command Prompt (cmd.exe) as your shell, and you're getting tripped up by its quoting conventions, or lack thereof. If I run your command in Fedora 15 Bash shell, it works. If I run it in Windows using Cygwin's Bash shell, it works.

To get it to work with cmd.exe, you have to change the quotes and spacing. I ran the commands below in cmd.exe on Windows 7. Note how I changed the quotes on the grep command to use single quotes instead of double quotes, and there is no space before the pipe (|).

I am using the Cygwin version of GNU grep, which should behave the same as your Win32 GNU grep.

c:\>c:\cygwin\bin\grep --v
GNU grep 2.6.3

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

c:\>echo "2008abc.html"| c:\cygwin\bin\grep -oiP '\"[^.]'
"2

If there is a space before the pipe, echo passes that trailing space through the pipeline, and grep finds a second match: the closing quote followed by the space. This is due to the idiotic parsing behavior of cmd.exe.

c:\>echo "2008abc.html" | c:\cygwin\bin\grep -oiP '\"[^.]'
"2
"

For your own sanity, see if you can use Cygwin's Bash or any other shell with reasonable and consistent quoting conventions.

juggler

Posted 2011-08-27T01:53:51.707

Reputation: 1 530

How are the Windows quoting conventions not reasonable or consistent? – barlop – 2011-08-27T03:24:14.143

and what is the issue with single quotes and double quotes? btw, your line worked '"[^.]' – barlop – 2011-08-27T03:30:29.440

I don't know why it works with single quotes and not with double quotes, but it works fine either way if you use bash shell instead of cmd.exe. I've seen enough weird issues with quoting and spacing in cmd.exe that I avoid it and use cygwin bash whenever possible. – juggler – 2011-08-27T04:04:27.357

@barlop: The differences are that in Windows the program has to parse its command line itself (in Unix this is done by the shell: sh or bash; in Cygwin this is done by the cygwin1.dll runtime), and that Windows uses \ as a path separator (bash treats it as an escape character). Many problems appear when you use Cygwin programs with Windows-style path names. (For example, how should the last \ in "C:\WINDOWS" be parsed? Should it work differently in Cygwin and a native Windows program?) – user1686 – 2011-08-27T16:41:28.740

@grawity Well, the problems of using \dir\prog for Cygwin and /dir/prog for Windows are purely the silliness of the user, not things a techie would do, and I'm not asking about that kind of problem. As to the last slash, I don't see how it's a problem, but that aside, doesn't *nix also have the question of a last slash or not? I notice in Cygwin "echo */" puts a slash after every directory name, whereas "echo *" doesn't put a slash after any directory name. And *nix interprets cd z/ as well as cd z – barlop – 2011-08-27T20:33:28.853

That totally was not my point. – user1686 – 2011-08-27T20:38:44.913

@grawity Well what do you mean when you say "{using} Cygwin programs with Windows-style path names." ? – barlop – 2011-08-27T21:57:31.480

@barlop: grep -r foo "C:\Documents and Settings\Simon Travaglia\" from the Windows cmd.exe shell. Does the final backslash act as path separator? Or does it escape the "? Similar for \D, \S. Another example: somecommand "funky\" characters" -- is that one argument funky" characters or two arguments funky\ and characters? – user1686 – 2011-08-27T21:58:56.470

@barlop: Yet another: cmd /c somecommand "foo bar" in which everything after /c is read as a single argument, despite not being quoted. That's where the inconsistency comes in. In comparison, bash has a strict set of rules by which every input line is parsed the same way. – user1686 – 2011-08-27T22:05:36.337

@gordoco Sorry gordoco, I've deselected your answer. I just realised you answered it for Cygwin's grep, and you did the odd thing of calling it from CMD.EXE; normally Cygwin commands are run within Cygwin proper, e.g. by running cygwin.bat first. Coincidentally, I notice echo "2008abc.html" | grep -oiP "[^.] <-- works. I think you were incorrect in assuming the two behave the same: single quotes won't do it for GnuWin32's grep, which is what I was asking about, so your answer doesn't solve it for GnuWin32's. – barlop – 2011-08-28T01:20:16.080

@grawity I see what you mean: cmd /c dir a b is taken as cmd /c "dir a b" and doesn't fail the way runas would if the program parameter contained spaces and wasn't quoted. But any program run from a cmd prompt could combine all its parameters into one. Can't any Linux program do so too, or combine or separate them however it wants, and therefore also be "inconsistent"? – barlop – 2011-09-11T21:54:01.197

@barlop: Not always. In Unix shells, you can use foobar, "foobar", 'foobar', "fo"ob'ar', but the shell handles word-splitting, and what it passes to execv() and what the program receives in argv[] will always be the same 6-byte string foobar. If you type "foo bar", you'll have argv[1] as foo bar. If you type "foo" "bar", you'll have argv[1] as foo and argv[2] as bar. For comparison, Win32 programs always receive one single string from GetCommandLine(), and all dequoting is done by the program itself - if it is done at all. – user1686 – 2011-09-12T07:20:06.493

@barlop: Continuing with my filename example. In Unix shells, if you have a double-quoted string and use a backslash in it, the backslash will always be treated the same: an escape for the character that follows. The shell applies same rules to all commands. On Windows, since programs do it themselves, they can treat it as an escape in some places and as path separator in others. Again, consider the following command line: somecmnd "foo bar\" baz" qux. Assume you are on a system that uses \ as path-separator. How would you split the command line into separate arguments? – user1686 – 2011-09-12T07:28:05.957

@grawity I tried http://pastebin.com/28Q2Wxxr compiled with TCC win32. And running it compiled in cygwin with gcc. the values in argsv seem to be the same between windows and *nix. and it dequotes it.. From what i've heard, it's true, that unlike *nix programs, Win32 C programs are given it as one single string, but one doesn't see that since it gets split (and it seems dequoted), even before the main method has run. So, still don't see where the room for inconsistency occurs relative to *nix programs. – barlop – 2011-09-12T07:41:25.253

(will test/think about your continuing comment soon) – barlop – 2011-09-12T07:44:13.610

@grawity You can use \\ for a literal \, and \" for a literal quote. I guess you know that though. I'm not quite sure what you mean by a system that uses \ as a path separator; the only systems I have any familiarity with are Windows and, to an extent, Unix, and the Windows case of course uses \ within paths, so I may be missing your point. – barlop – 2011-09-12T07:57:30.020

Let us continue this discussion in chat. – user1686 – 2011-09-12T08:13:28.173