Non-deterministic behavior of Grep in conjunction with More

What would you expect to happen after this:

for /l %i in (1,1,100) do @more some.bbl | grep a | md5sum

Most probably, not this:

ec3ecb76408d4225ff23a25d0596e00f *-
13cfd899b90b9cd7aedb406a785e8eac *-
737e8898a65657f1a2ce8012ff1ffe82 *-
d4095243e56a7da3b31a352423a5417a *-
319db7810e677414ca1609238bdeba6f *-
31e626a8ce0732fda1fa7499c8b13dfa *-
006fe390f923d50348d65d0bbefa64d8 *-
77708f62cb2d61a45788a656d0979aee *-
cda10a9ab71c2bce4df069c479241349 *-
b01b71dc7dca11808ca989c4985513ca *-
c22a6f8b1cac9a93c4fe10b07a9f483a *-
0b04f4b24f3f183270eb7414f4f86e3d *-
5a2f8b8ad482ae8f70b7ce3384a7c9e2 *-
beccdbe737b48c02b48c4524cd89eede *-
a16fec5238cfe8dfff6b403ff943a8ca *-
ec0cd2edc0009abd14119915a8b563f4 *-
1e78f0012ca09aeade169f815415da40 *-
...

I was worried, too, so I ran a couple of sanity checks:

for /l %i in (1,1,100) do @more some.bbl | md5sum

yields 100 times

ace4f37f3a1433e29696a535c0b79f2c *-

Same for

for /l %i in (1,1,100) do @grep a some.bbl | md5sum

which yields 100 times

d8753d755025a1119cd2910c6f5cb0de *-

So more, grep and md5sum work fine by themselves. Also, the pipe before md5sum is not a problem, since

for /l %i in (1,1,100) do @more some.bbl | grep a > out%i
md5sum out*

confirms the issue. Comparing the outputs with fc, I find no difference. Comparing them with diff reveals invisible differences, which a hex editor confirms to be differences in line endings in seemingly random places (and different from file to file).
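A minimal sketch of how the line endings in the out* files could be tallied without a hex editor. The helper name count_line_endings is made up for illustration, and the script assumes the out* files from the loop above are in the current directory:

```python
import glob


def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LF and lone CR bytes in raw file content."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf   # LF not preceded by CR
    lone_cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"crlf": crlf, "lf": lone_lf, "cr": lone_cr}


if __name__ == "__main__":
    # Assumption: the files were written by "... | grep a > out%i".
    for name in sorted(glob.glob("out*")):
        with open(name, "rb") as handle:   # binary mode keeps CR/LF intact
            print(name, count_line_endings(handle.read()))
```

If the tallies differ from file to file while the visible text is identical, that would match what diff and the hex editor show.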

The issue still occurs, but less often, in this example:

for /l %i in (1,1,100) do @more some.bbl | grep "[a-z]" | md5sum

yielding

b135bcfe0bcfb7f1c43fe1905164c31e *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
ef23817185d41987c11cb1fc4371bb76 *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
e398e63b60cee3e271967f01350068f1 *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
b135bcfe0bcfb7f1c43fe1905164c31e *-
...

I am running out of ideas as to what the reason could be. I would not care much about this if I did not lose valid lines in cases like this:

for /l %i in (1,1,100) do @more "some.bbl" | grep "\}$" | wc -l

This gives

249
249
249
248
255
253
252
248
251
...
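One possible link between the line-ending differences and the varying counts, offered as an assumption rather than a confirmed diagnosis: if grep reads the pipe in binary mode and treats only LF as the line terminator, a line ending in "}" followed by a stray CR no longer matches "\}$". The sketch below uses Python's re module as a stand-in for grep's matching, and the function name count_brace_lines is hypothetical:

```python
import re

# Same pattern as in the grep call: a closing brace at the end of the line.
PATTERN = re.compile(rb"\}$")


def count_brace_lines(data: bytes) -> int:
    """Count lines ending in '}' when only LF terminates a line."""
    # Splitting on LF only leaves any CR attached to the end of the line,
    # which is the assumed behaviour of a tool reading in binary mode.
    return sum(1 for line in data.split(b"\n") if PATTERN.search(line))


if __name__ == "__main__":
    lf_only = b"{Something1999a}\n" * 3
    crlf_only = b"{Something1999a}\r\n" * 3
    mixed = b"{Something1999a}\r\n{Something1999a}\n{Something1999a}\r\n"
    for label, sample in (("LF", lf_only), ("CRLF", crlf_only), ("mixed", mixed)):
        print(label, count_brace_lines(sample))
```

Under this assumption, the count would depend on how many lines happen to arrive with a bare LF, which would vary from run to run if the line endings do.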

To reproduce similar issues, you can generate a test file with this command:

for /l %i in (1,1,200) do @echo XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX{Something1999a}>> some.bbl

Some more information

C:\>ver

Microsoft Windows [Version 6.1.7601]

C:\>more /h

Displays output one screen at a time.

MORE [/E [/C] [/P] [/S] [/Tn] [+n]] < [drive:][path]filename
...

C:\>grep --ver

GNU grep 2.6.3

Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

C:\>md5sum --ver

md5sum (GNU coreutils) 8.15
Packaged by Cygwin (8.15-1)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Ulrich Drepper, Scott Miller, and David Madore.

Why is this happening?

Update: The problem also goes away when I replace more with this cat:

C:\>cat --ver

cat (GNU coreutils) 8.15
Packaged by Cygwin (8.15-1)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Torbjörn Granlund and Richard M. Stallman.

bers

Posted 2015-06-12T14:12:52.567

Reputation: 557

Why are you using more in this way and what exactly are you trying to achieve? – qasdfdsaq – 2015-06-12T14:39:57.703

I don't recall why I initially used more. I did notice, though, that in some cases more | grep is faster than grep. Anyway, isn't it completely irrelevant to my question why I am using more (and not cat, or grep, or whatever)? The question is WHY the combination of command-line tools behaves in a non-deterministic way. – bers – 2015-06-12T16:28:31.497

I think it's relevant because more is not intended to be used in this way. My guess is that, because you are using more in a way it was not designed for, it is forking or returning at the wrong time, or misdetecting your terminal parameters because there is no terminal. I wouldn't be surprised at a program behaving non-deterministically when its documented behaviour is "undefined" – qasdfdsaq – 2015-06-12T16:32:17.107

Oh. So you think the fact that more expects terminal input at the end of a screen (when output is not redirected) might interfere with the output? And the apparent randomness basically results from the position on a virtual terminal window? This might actually make sense. – bers – 2015-06-12T16:38:16.073

No answers