GNU bug report logs - #25707
[PATCH] grep: don't forcefully strip carriage returns

Previous Next

Package: grep;

Reported by: Eric Blake <eblake <at> redhat.com>

Date: Mon, 13 Feb 2017 19:25:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 25707 in the body.
You can then email your comments to 25707 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Mon, 13 Feb 2017 19:25:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Eric Blake <eblake <at> redhat.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Mon, 13 Feb 2017 19:25:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: bug-grep <at> gnu.org
Subject: [PATCH] grep: don't forcefully strip carriage returns
Date: Mon, 13 Feb 2017 13:23:58 -0600
Commit 5c92a54 made the mistaken assumption that using fopen("rt")
on platforms where O_TEXT is non-zero makes sense.  However, POSIX
already requires fopen("r") to open a file in text mode, vs.
fopen("rb") when binary mode is wanted, and at least on Cygwin,
where it is possible to control whether a mount point is binary
or text by default (using just "r"), the use of fopen("rt") actively
breaks assumptions on a binary mount by silently corrupting any
carriage returns that are supposed to be preserved.

* src/grep.c (main): Never use fopen("rt").
Signed-off-by: Eric Blake <eblake <at> redhat.com>
---
 src/grep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/grep.c b/src/grep.c
index 74acb0b..ce8859b 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -2586,7 +2586,7 @@ main (int argc, char **argv)
         break;

       case 'f':
-        fp = STREQ (optarg, "-") ? stdin : fopen (optarg, O_TEXT ? "rt" : "r");
+        fp = STREQ (optarg, "-") ? stdin : fopen (optarg, "r");
         if (!fp)
           die (EXIT_TROUBLE, errno, "%s", optarg);
         oldcc = keycc;
-- 
2.9.3





Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Mon, 13 Feb 2017 20:01:02 GMT) Full text and rfc822 format available.

Message #8 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Mon, 13 Feb 2017 12:00:43 -0800
On 02/13/2017 11:23 AM, Eric Blake wrote:
> the use of fopen("rt") actively
> breaks assumptions on a binary mount by silently corrupting any
> carriage returns that are supposed to be preserved.

Surely it's more typical for trailing CRs to be ignored rather than be 
preserved, even on binary mounts.

The intent of that "rt" (introduced on 2014-04-11 in commit 
5c92a54ec36f176854f0b0cca83b5ff224f2d8e8) was to simplify the code, 
which formerly used undossify_input to guess whether a pattern file had 
trailing CRs. If we change "rt" to "r", surely we'd need to reintroduce 
undossify_input; and wouldn't undossify_input also "silently corrupt" 
carriage returns in most cases?

While we're on the topic, the undossify_input approach is just a 
heuristic and it sometimes guesses wrong. I wish the heuristic could be 
removed somehow, so that grep would behave more deterministically on 
MS-DOS/Windows.





Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Mon, 13 Feb 2017 20:22:01 GMT) Full text and rfc822 format available.

Message #11 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Mon, 13 Feb 2017 14:20:56 -0600
[Message part 1 (text/plain, inline)]
On 02/13/2017 02:00 PM, Paul Eggert wrote:
> On 02/13/2017 11:23 AM, Eric Blake wrote:
>> the use of fopen("rt") actively
>> breaks assumptions on a binary mount by silently corrupting any
>> carriage returns that are supposed to be preserved.
> 
> Surely it's more typical for trailing CRs to be ignored rather than be
> preserved, even on binary mounts.

On binary mounts, it is atypical to have a file with trailing CRs in the
first place - but when they are there, they are intentional. Stripping a
trailing CR on a file that was placed in a binary mount in order to
preserve the CR is what is wrong.

> 
> The intent of that "rt" (introduced on 2014-04-11 in commit
> 5c92a54ec36f176854f0b0cca83b5ff224f2d8e8) was to simplify the code,
> which formerly used undossify_input to guess whether a pattern file had
> trailing CRs. If we change "rt" to "r", surely we'd need to reintroduce
> undossify_input; and wouldn't undossify_input also "silently corrupt"
> carriage returns in most cases?

No, we don't need undossify_input.  Generally, the pattern file is
already clean (using fopen("r") on a text mount already got rid of all
CR, and using fopen("r") on a binary mount shouldn't have CR in the
first place).  And in the rare cases where you DO have CR in the pattern
file on a binary mount, then it is probably intended that you are
searching for CR as part of the pattern (if not, the user should use a
purpose-built applications, like d2u (dos2unix), to clean up the pattern
file before passing that file to 'grep -f').

You are correct that undossify_input would silently corrupt carriage
returns, but I'm arguing that we don't ever want to be in the business
of removing CR (either the OS has already removed them on our behalf
because of the text mount, or the user doesn't want them removed because
of the binary mount).

Years ago, Cygwin actively advertised that you could create text mount
points.  It generated a LOT of mailing list traffic about various
different apps that misbehaved in subtly different ways (not the least
of these is the fact that lseek() on a file opened in text mode is prone
to return the WRONG offset, which makes seeking back to the last-used
byte unreliable in applications that only partially consume input while
still complying with the POSIX rule that the next app to share stdin
picks up where the first one left off).  It's now been more than 6 years
since Cygwin has actively discouraged new installations from having a
text mount (they are still possible, but no longer a default
installation choice), and there have actually been far fewer complaints
about binary-only mounts misbehaving, except in programs like grep that
eat CR from a file that is supposed to be preserved in binary mode.

> 
> While we're on the topic, the undossify_input approach is just a
> heuristic and it sometimes guesses wrong. I wish the heuristic could be
> removed somehow, so that grep would behave more deterministically on
> MS-DOS/Windows.
> 

I'm of the opinion that undossify_input causes more problems than it
solves.  We should trust fopen("r") to do the right thing, rather than
reinventing it ourselves.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Tue, 14 Feb 2017 23:09:02 GMT) Full text and rfc822 format available.

Message #14 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Tue, 14 Feb 2017 15:08:11 -0800
[Message part 1 (text/plain, inline)]
On 02/13/2017 12:20 PM, Eric Blake wrote:
> undossify_input causes more problems than it
> solves.  We should trust fopen("r") to do the right thing, rather than
> reinventing it ourselves.

Yes, that makes sense. Attached is a proposed patch to implement this. 
It assumes the patch you already submitted for Bug#25707.

This patch keeps the -U option, for MS-Windows users who want to 
override fopen "-r"'s choice of binary vs text I/O. Perhaps that's too 
conservative? It would be easy to turn -U into a no-op too.

[0001-Simplify-U-on-MS-Windows-by-removing-guesswork.txt (text/plain, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Wed, 15 Feb 2017 14:24:02 GMT) Full text and rfc822 format available.

Message #17 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Wed, 15 Feb 2017 08:23:19 -0600
[Message part 1 (text/plain, inline)]
On 02/14/2017 05:08 PM, Paul Eggert wrote:
> On 02/13/2017 12:20 PM, Eric Blake wrote:
>> undossify_input causes more problems than it
>> solves.  We should trust fopen("r") to do the right thing, rather than
>> reinventing it ourselves.
> 
> Yes, that makes sense. Attached is a proposed patch to implement this.
> It assumes the patch you already submitted for Bug#25707.
> 
> This patch keeps the -U option, for MS-Windows users who want to
> override fopen "-r"'s choice of binary vs text I/O. Perhaps that's too
> conservative? It would be easy to turn -U into a no-op too.

No, keep -U exactly as proposed in the patch (sometimes you WANT to
force binary mode - forcing binary is a no-op on Unix platforms, but
makes sense on Windows).  I like the patch as you wrote it, modulo nits:

> .
> -This option has no effect
> -on platforms other than MS-DOS and MS-Windows.
> +@cindex MS-Windows binary I/O
> +@cindex binary I/O, /MS-Windows

Why the slash before MS?

> 
> +
> +This option has no effect on platforms other than MS-Windows.

There are other systems than MS-Windows that have non-zero O_BINARY.  I
don't know if you want to reword this a bit, to maybe state that the
option has no effect on platforms where binary mode is identical to text
mode.  I guess your wording is fine, though.

> 
> @@ -1862,7 +1856,7 @@ grepdesc (int desc, bool command_line)
>  
>    /* Set input to binary mode.  Pipes are simulated with files
>       on DOS, so this includes the case of "foo | grep bar".  */
> -  if (O_BINARY && !isatty (desc))
> +  if (binary && !isatty (desc))
>      set_binary_mode (desc, O_BINARY);
>  

The comment is somewhat stale; pipes are not simulated in MS-Windows,
and these days ports to older DOS are rare (is DJGPP still viable?).
Probably safe to just delete the second sentence.

> 
>    /* Output is set to binary mode because we shouldn't convert
>       NL to CR-LF pairs, especially when grepping binary files.  */
> -  if (O_BINARY && !isatty (STDOUT_FILENO))
> -    set_binary_mode (STDOUT_FILENO, O_BINARY);
> +  if (binary && !isatty (STDOUT_FILENO))
> +    xfreopen (NULL, "wb", stdout);

xfreopen() may need a patch - if stdout was previously opened in append
mode, this reopen breaks that.  Several of the coreutils are affected by
the same problem; here's the patch I've been applying when building
coreutils for Cygwin:

 void
 xfreopen (char const *filename, char const *mode, FILE *fp)
 {
+  if (!filename && STREQ (mode, "wb"))
+    {
+      int flag = fcntl (fileno (fp, F_GETFL);
+      if (0 <= flag && (flag & O)APPEND))
+        mode = "ab";
+    }
   if (!freopen (filename, mode, fp))

I guess I should push that one into gnulib.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Wed, 15 Feb 2017 14:40:02 GMT) Full text and rfc822 format available.

Message #20 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Wed, 15 Feb 2017 08:38:59 -0600
[Message part 1 (text/plain, inline)]
On 02/14/2017 05:08 PM, Paul Eggert wrote:
> On 02/13/2017 12:20 PM, Eric Blake wrote:
>> undossify_input causes more problems than it
>> solves.  We should trust fopen("r") to do the right thing, rather than
>> reinventing it ourselves.
> 
> Yes, that makes sense. Attached is a proposed patch to implement this.
> It assumes the patch you already submitted for Bug#25707.
> 
> This patch keeps the -U option, for MS-Windows users who want to
> override fopen "-r"'s choice of binary vs text I/O. Perhaps that's too
> conservative? It would be easy to turn -U into a no-op too.

For reference, here's the patch I applied to Cygwin when building
grep-3.0; it didn't go as far as your patch in killing -u (so I like
your patch better), but had the same net effect of keeping -U as a way
to force binary behavior, while using fcntl() to learn (after-the-fact)
whether our force-to-binary was a no-op, and only doing undossify stuff
if we turned a text-mode into binary-mode:

--- origsrc/grep-3.0/src/dosbuf.c	2017-01-01 05:34:33.000000000 -0600
+++ src/grep-3.0/src/dosbuf.c	2017-02-14 09:50:08.701835000 -0600
@@ -48,6 +48,7 @@ static struct dos_map *dos_pos_map;
 static int       dos_pos_map_size  = 0;
 static int       dos_pos_map_used  = 0;
 static int       inp_map_idx = 0, out_map_idx = 1;
+static int       mode_forced;

 /* Set default DOS file type to binary.  */
 static void
@@ -95,7 +96,7 @@ guess_type (char *buf, size_t buflen)
 static size_t
 undossify_input (char *buf, size_t buflen)
 {
-  if (! O_BINARY)
+  if (! O_BINARY || ! mode_forced)
     return buflen;

   size_t bytes_left = 0;
@@ -186,7 +187,7 @@ undossify_input (char *buf, size_t bufle
 static off_t
 dossified_pos (off_t byteno)
 {
-  if (! O_BINARY)
+  if (! O_BINARY || ! mode_forced)
     return byteno;

   off_t pos_lo;
--- origsrc/grep-3.0/src/grep.c	2017-02-08 19:37:33.000000000 -0600
+++ src/grep-3.0/src/grep.c	2017-02-14 09:58:11.134855800 -0600
@@ -1860,10 +1860,17 @@ grepdesc (int desc, bool command_line)
       goto closeout;
     }

-  /* Set input to binary mode.  Pipes are simulated with files
-     on DOS, so this includes the case of "foo | grep bar".  */
+  /* Force input to binary mode, while remembering if it was originally
+     text.  Options -u and -U only matter on text mounts.  */
+  mode_forced = 0;
   if (O_BINARY && !isatty (desc))
-    set_binary_mode (desc, O_BINARY);
+    {
+      if (fcntl (desc, F_GETFL) & O_TEXT)
+	{
+	  set_binary_mode (desc, O_BINARY);
+	  mode_forced = 1;
+	}
+    }

   count = grep (desc, &st, &ineof);
   if (count_matches)
@@ -2586,7 +2593,7 @@ main (int argc, char **argv)
         break;

       case 'f':
-        fp = STREQ (optarg, "-") ? stdin : fopen (optarg, O_TEXT ? "rt"
: "r");
+        fp = STREQ (optarg, "-") ? stdin : fopen (optarg, "r");
         if (!fp)
           die (EXIT_TROUBLE, errno, "%s", optarg);
         oldcc = keycc;

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Thu, 16 Feb 2017 16:58:02 GMT) Full text and rfc822 format available.

Message #23 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eric Blake <eblake <at> redhat.com>, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 08:57:50 -0800
[Message part 1 (text/plain, inline)]
On 02/15/2017 06:23 AM, Eric Blake wrote:
> xfreopen() may need a patch

Thanks for pointing out the problems with my suggestion. I think I'd 
rather leave xfreopen alone, as silently adjusting its argument from 
"wb" to "ab" might cause other problems. Instead, I added a Gnulib 
module to set the file's text-vs-binary mode and changed grep to use 
that, by installing the attached patches, which I hope address your 
comments.

[0001-build-update-gnulib-submodule-to-latest.patch (application/x-patch, attachment)]
[0002-Fix-up-recent-U-patches.patch (application/x-patch, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Thu, 16 Feb 2017 17:27:02 GMT) Full text and rfc822 format available.

Message #26 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: eggert <at> cs.ucla.edu, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 19:26:39 +0200
> From: Eric Blake <eblake <at> redhat.com>
> Date: Mon, 13 Feb 2017 14:20:56 -0600
> 
> > While we're on the topic, the undossify_input approach is just a
> > heuristic and it sometimes guesses wrong. I wish the heuristic could be
> > removed somehow, so that grep would behave more deterministically on
> > MS-DOS/Windows.
> > 
> 
> I'm of the opinion that undossify_input causes more problems than it
> solves.  We should trust fopen("r") to do the right thing, rather than
> reinventing it ourselves.

FYI: You'd be losing an important feature for non-Cygwin DOS/Windows
users if you remove undossify_input and decide to trust fopen's "r"
(or "rt") mode.  That's because reading a file which was opened in
text-mode generally removes _all_ CR characters, even if they are not
followed by a newline; it also stops on the first ^Z character in the
file, treating it as a kind of "software EOF", a legacy from CP/M
years.

That's why the original patch switched the file descriptor to binary
mode (Grep used 'open', not 'fopen', in those days) and used
undossify_input: that allowed Grep to DTRT with these use cases,
removing CRs only if they are followed by a newline, and not stopping
at ^Z.  As a side effect, undossify_input also collects the
information needed for displaying byte offsets.

It seems to me that when one bumps into some code which looks
incorrect or less than optimal, and one considers its replacement with
a more clever code, it would be a good idea to ask the person(s) who
contributed the original code, in case there was some good reason for
doing it that way.  Was that done in this case?  If not, it should
have been.




Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Thu, 16 Feb 2017 17:41:02 GMT) Full text and rfc822 format available.

Message #29 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 11:40:29 -0600
[Message part 1 (text/plain, inline)]
On 02/16/2017 11:26 AM, Eli Zaretskii wrote:

>> I'm of the opinion that undossify_input causes more problems than it
>> solves.  We should trust fopen("r") to do the right thing, rather than
>> reinventing it ourselves.
> 
> FYI: You'd be losing an important feature for non-Cygwin DOS/Windows
> users if you remove undossify_input and decide to trust fopen's "r"
> (or "rt") mode.

"rt" mode is not required to exist. And I don't know any modern
implementation of "r" mode on a system with non-zero O_BINARY that eats
ALL \r - both Cygwin and mingw just change \r\n into \n while still
preserving other \r.  The undossify() code that Paul just removed did
NOT behave the same as text mode (in that it did, perhaps
unintentionally, eat ALL \r).

I count it worse to TRY and reimplement the OS "r" mode and get the
implementation wrong, with more lines of code, than to just trust the OS
to do it correctly in the first place.

The undossify() code may do the right thing on text files, but is
absolutely wrong on binary files.  My alternative patch was less
intrusive, and at least preserves the undossify behavior on text files,
but with the requirement that fcntl(0, F_GETFL) & O_TEXT is a reliable
way to tell if an fd is currently opened in text mode before slamming it
over to the binary mode required for undossify to work.

>  That's because reading a file which was opened in
> text-mode generally removes _all_ CR characters, even if they are not
> followed by a newline; it also stops on the first ^Z character in the
> file, treating it as a kind of "software EOF", a legacy from CP/M
> years.

No, I don't know of any fopen(,"r") code that eats _all_ CR.  Certainly
cygwin's implementation does not do that.

> 
> That's why the original patch switched the file descriptor to binary
> mode (Grep used 'open', not 'fopen', in those days) and used
> undossify_input: that allowed Grep to DTRT with these use cases,
> removing CRs only if they are followed by a newline, and not stopping
> at ^Z.  As a side effect, undossify_input also collects the
> information needed for displaying byte offsets.

Yes, you do make a point that the side effect of reimplementing text
mode ourselves on a forced binary fd lets us "count" byte offsets where
the count could be text while the scan was binary, or where the count
could be binary while the scan was text.  But in reality, are there any
users that ever want a mixed-mode count?  If you are scanning in binary,
you want the binary count; if you are scanning in text, you want the
text count.  And in those two scenarios, with a sane text-mode from the
OS, still give the correct counts with a lot less code in grep.

> 
> It seems to me that when one bumps into some code which looks
> incorrect or less than optimal, and one considers its replacement with
> a more clever code, it would be a good idea to ask the person(s) who
> contributed the original code, in case there was some good reason for
> doing it that way.  Was that done in this case?  If not, it should
> have been.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

[signature.asc (application/pgp-signature, attachment)]

Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Thu, 16 Feb 2017 17:46:02 GMT) Full text and rfc822 format available.

Message #32 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, Eric Blake <eblake <at> redhat.com>
Cc: 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 09:45:43 -0800
On 02/16/2017 09:26 AM, Eli Zaretskii wrote:
> It seems to me that when one bumps into some code which looks
> incorrect or less than optimal, and one considers its replacement with
> a more clever code,

No, that's not it. The idea was to simplify maintenance for the GNU 
case, perhaps at some cost to MS-Windows users, but they are lower 
priority so it is OK. The MS-Windows code was a mess because the two 
forms of auto binary-file detection could not work in combination; this 
was because one of the two forms was contributed later and without 
regard to MS-Windows. Perhaps the MS-Windows port could be improved 
without undue hassle to the GNU code, although that would require 
someone with MS-Windows expertise and enough free time to do it. Failing 
that, I suppose MS-Windows users can use -a and -U if they run into 
trouble, so it's not that big a deal.





Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Thu, 16 Feb 2017 18:06:02 GMT) Full text and rfc822 format available.

Message #35 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Eric Blake <eblake <at> redhat.com>
Cc: eggert <at> cs.ucla.edu, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 20:06:07 +0200
> Cc: eggert <at> cs.ucla.edu, 25707 <at> debbugs.gnu.org
> From: Eric Blake <eblake <at> redhat.com>
> Date: Thu, 16 Feb 2017 11:40:29 -0600
> 
> On 02/16/2017 11:26 AM, Eli Zaretskii wrote:
> 
> >> I'm of the opinion that undossify_input causes more problems than it
> >> solves.  We should trust fopen("r") to do the right thing, rather than
> >> reinventing it ourselves.
> > 
> > FYI: You'd be losing an important feature for non-Cygwin DOS/Windows
> > users if you remove undossify_input and decide to trust fopen's "r"
> > (or "rt") mode.
> 
> "rt" mode is not required to exist. And I don't know any modern
> implementation of "r" mode on a system with non-zero O_BINARY that eats
> ALL \r - both Cygwin and mingw just change \r\n into \n while still
> preserving other \r.  The undossify() code that Paul just removed did
> NOT behave the same as text mode (in that it did, perhaps
> unintentionally, eat ALL \r).

I explicitly said my comments were not about Cygwin.

And you are forgetting the "stop at first ^Z" misfeature of text-mode
reads.

> I count it worse to TRY and reimplement the OS "r" mode and get the
> implementation wrong, with more lines of code, than to just trust the OS
> to do it correctly in the first place.

It is no use trusting the OS if it doesn't DTRT.

> The undossify() code may do the right thing on text files, but is
> absolutely wrong on binary files.

Grep is mainly a text-processing program.  Its use with binary files
is a much rarer use case, and the user has opt-in options for those.
IMO it is more important to DTRT by default in the usual cases than
err in rare cases when the user fails to specify those opt-in options
needed to support  that correctly.

> No, I don't know of any fopen(,"r") code that eats _all_ CR.

I do.

> Yes, you do make a point that the side effect of reimplementing text
> mode ourselves on a forced binary fd lets us "count" byte offsets where
> the count could be text while the scan was binary, or where the count
> could be binary while the scan was text.  But in reality, are there any
> users that ever want a mixed-mode count?

Yes.

> If you are scanning in binary, you want the binary count; if you are
> scanning in text, you want the text count.

That's up to the user; Grep shouldn't second-guess its users, and
shouldn't force them into a specific modus operandi without a fire
escape.

But this is a futile argument: I don't expect to win it, and I'm well
aware that recent Grep releases are much less friendly to non-Posix
users than previous ones.  Which is why I will stay with Grep 2.10,
which works satisfactorily for me.

The purpose of my message was to get on record about what these
changes mean: they mean losing features which some find useful.  You
are not changing code which was written by someone who didn't know
about text-mode I/O, or didn't understand that this could be done
better, faster, etc., by dropping features.




Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Thu, 16 Feb 2017 18:09:01 GMT) Full text and rfc822 format available.

Message #38 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: eblake <at> redhat.com, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 20:08:33 +0200
> Cc: 25707 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Thu, 16 Feb 2017 09:45:43 -0800
> 
> On 02/16/2017 09:26 AM, Eli Zaretskii wrote:
> > It seems to me that when one bumps into some code which looks
> > incorrect or less than optimal, and one considers its replacement with
> > a more clever code,
> 
> No, that's not it. The idea was to simplify maintenance for the GNU 
> case, perhaps at some cost to MS-Windows users, but they are lower 
> priority so it is OK.

I'm saying these changes lose features for those users.  I'm well
aware that their needs are of a very low priority to you.  I just
don't want people to think that the new code is equivalent to the old
one, when it isn't.




Information forwarded to bug-grep <at> gnu.org:
bug#25707; Package grep. (Thu, 16 Feb 2017 22:26:01 GMT) Full text and rfc822 format available.

Message #41 received at 25707 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eblake <at> redhat.com, 25707 <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Thu, 16 Feb 2017 14:25:15 -0800
On 02/16/2017 10:08 AM, Eli Zaretskii wrote:

> I'm well aware that their needs are of a very low priority to you.

I would not say "very low priority". (To me, "very low priority" would 
be something like supporting GNU grep on IRIX. :-) Regardless, it's 
better not to try to personalize any disagreement about priorities. I 
didn't file this bug report, and I didn't set the priority of MS-Windows 
for the GNU project.

> Grep shouldn't second-guess its users

The new code has less guesswork than the old code did, so in that sense 
it's an improvement.

> I just don't want people to think that the new code is equivalent to the old one

We should be OK on that point, as the updated NEWS file contains an 
entry prominently describing the change and the documentation has been 
updated to reflect the changed behavior.





Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 15 Oct 2019 22:39:02 GMT) Full text and rfc822 format available.

Notification sent to Eric Blake <eblake <at> redhat.com>:
bug acknowledged by developer. (Tue, 15 Oct 2019 22:39:02 GMT) Full text and rfc822 format available.

Message #46 received at 25707-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eblake <at> redhat.com, 25707-done <at> debbugs.gnu.org
Subject: Re: bug#25707: [PATCH] grep: don't forcefully strip carriage returns
Date: Tue, 15 Oct 2019 15:37:54 -0700
No further comment for a couple of years, so I'm closing this bug 
report. Any remaining issues can be re-raised in a new report as needed.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 13 Nov 2019 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 157 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.