GNU bug report logs - #13498
"cut -f" lags a line
Reported by: Scott Lamb <slamb <at> slamb.org>
Date: Sat, 19 Jan 2013 17:27:01 UTC
Severity: normal
Done: Pádraig Brady <P <at> draigBrady.com>
Bug is archived. No further changes may be made.
Report forwarded to bug-coreutils <at> gnu.org: bug#13498; Package coreutils. (Sat, 19 Jan 2013 17:27:01 GMT)
Acknowledgement sent to Scott Lamb <slamb <at> slamb.org>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 19 Jan 2013 17:27:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
"cut -f" has an apparently long-standing behavior that I'd consider a
bug: it does not fully send line N to stdout until the first character
of line N+1 has been read on stdin. This is confusing when stdin comes
from "tail -f" or the like. The exact behavior varies slightly. If
stdin is a tty, all but the trailing newline will be flushed
immediately and then the trailing newline will be flushed when the
next character shows up. If stdin is not a tty, there's no flush at
all until the next character shows up.
For example, if I type the following into a shell on Ubuntu 12.04.1,
meaning cut from coreutils 8.13 and glibc package version
2.15-0ubuntu10.3:
    cut -f1-
    foo
    bar
    baz
    ^D
I will see the following:
    $ cut -f1-
    foo
    foobar

    barbaz

    baz
    $
and if I instead use "cat | cut -f1-" in the first line, I will see
the following:
    $ cat | cut -f1-
    foo
    bar
    foo
    baz
    bar
    baz
    $
(coreutils's cut -c does not have the same laggy behavior. Neither
does BSD cut on my OS X machine in either -c or -f mode.)
This code in cut_fields (still found in trunk tip) is responsible for
delaying the newline; it runs between the newline being read and being
written:
    if (c == '\n')
      {
        c = getc (stream);
        if (c != EOF)
          {
            ungetc (c, stream);
            c = '\n';
          }
      }
I believe that code is there to avoid turning one newline at EOF into
two, but that goal could be accomplished in another way.
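As a rough illustration of such an alternative (a sketch only, not the coreutils code or the eventual patch; the function name copy_lines_no_lookahead is hypothetical, and field selection is left out), a loop can remember the last character it wrote and add a missing newline only once EOF is actually reached:

```c
#include <stdio.h>

/* Hypothetical helper, not taken from coreutils: copy IN to OUT one
   character at a time, writing each newline as soon as it is read.
   A missing final newline is supplied at EOF by remembering the last
   character written, so no look-ahead getc/ungetc pair is needed.
   Returns 0 on success, -1 on a read or write error.  */
int
copy_lines_no_lookahead (FILE *in, FILE *out)
{
  int c;
  int prev = '\n';      /* empty input counts as already terminated */

  while ((c = getc (in)) != EOF)
    {
      if (putc (c, out) == EOF)
        return -1;
      prev = c;
    }

  /* Only at EOF do we decide whether a terminating newline is owed.  */
  if (prev != '\n' && putc ('\n', out) == EOF)
    return -1;

  return ferror (in) ? -1 : 0;
}
```

Because each newline is written as soon as it is read, line N is complete on the output stream before the next read blocks. Whether it reaches the terminal right away still depends on the buffering mode of the output stream.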
I don't know exactly why the behavior differs based on stdin being a
tty or not. My best guess is that glibc might have some logic that, if
stdin is a tty, automatically flushes stdout any time the program
blocks on stdin. glibc's stdio internals are a bit hard for me to
follow, so I haven't found the code in question. Apparently this is a
vaguely standardized behavior; I see a stackoverflow post mentioning
the following:
"""
The input and output dynamics of interactive devices shall take place
as specified in 7.19.3. The intent of these requirements is that
unbuffered or line-buffered output appear as soon as possible, to
ensure that prompting messages actually appear prior to a program
waiting for input.
(ISO/IEC 9899:TC2 Committee Draft -- May 6, 2005, page 14).
"""
--
Scott Lamb <http://www.slamb.org/>
Information forwarded to bug-coreutils <at> gnu.org: bug#13498; Package coreutils. (Sat, 19 Jan 2013 20:06:01 GMT)
Message #8 received at 13498 <at> debbugs.gnu.org (full text, mbox):
Scott Lamb <slamb <at> slamb.org> writes:
> I don't know exactly why the behavior differs based on stdin being a
> tty or not. My best guess is that glibc might have some logic that, if
> stdin is a tty, automatically flushes stdout any time the program
> blocks on stdin.
When a new buffer is read for a line buffered or unbuffered stream,
stdout is flushed. This is traditional Unix behaviour, but AFAIK not
required by any standard.
Andreas.
--
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Information forwarded to bug-coreutils <at> gnu.org: bug#13498; Package coreutils. (Sun, 20 Jan 2013 12:43:02 GMT)
Message #11 received at 13498 <at> debbugs.gnu.org (full text, mbox):
On 01/19/2013 08:35 AM, Scott Lamb wrote:
> "cut -f" has an apparently long-standing behavior that I'd consider a
> bug: it does not fully send line N to stdout until the first character
> of line N+1 has been read on stdin. This is confusing when stdin comes
> from "tail -f" or the like. The exact behavior varies slightly. If
> stdin is a tty, all but the trailing newline will be flushed
> immediately and then the trailing newline will be flushed when the
> next character shows up. If stdin is not a tty, there's no flush at
> all until the next character shows up.
>
> For example, if I type the following into a shell on Ubuntu 12.04.1,
> meaning cut from coreutils 8.13 and glibc package version
> 2.15-0ubuntu10.3:
>
> cut -f1-
> foo
> bar
> baz
> ^D
>
> I will see the following:
>
> $ cut -f1-
> foo
> foobar
>
> barbaz
>
> baz
> $
>
> and if I instead use "cat | cut -f1-" in the first line, I will see
> the following:
>
> $ cat | cut -f1-
> foo
> bar
> foo
> baz
> bar
> baz
> $
>
> (coreutils's cut -c does not have the same laggy behavior. Neither
> does BSD cut on my OS X machine in either -c or -f mode.)
>
> This code in cut_fields (still found in trunk tip) is responsible for
> delaying the newline; it runs between the newline being read and being
> written:
>
>     if (c == '\n')
>       {
>         c = getc (stream);
>         if (c != EOF)
>           {
>             ungetc (c, stream);
>             c = '\n';
>           }
>       }
>
> I believe that code is there to avoid turning one newline at EOF into
> two, but that goal could be accomplished in another way.
>
> I don't know exactly why the behavior differs based on stdin being a
> tty or not. My best guess is that glibc might have some logic that, if
> stdin is a tty, automatically flushes stdout any time the program
> blocks on stdin. glibc's stdio internals are a bit hard for me to
> follow, so I haven't found the code in question. Apparently this is a
> vaguely standardized behavior; I see a stackoverflow post mentioning
> the following:
>
> """
> The input and output dynamics of interactive devices shall take place
> as specified in 7.19.3. The intent of these requirements is that
> unbuffered or line-buffered output appear as soon as possible, to
> ensure that prompting messages actually appear prior to a program
> waiting for input.
>
> (ISO/IEC 9899:TC2 Committee Draft -- May 6, 2005, page 14).
> """
For my reference:
http://comments.pixelbeat.org/programming/stdio_buffering/#comment-250521
Yes the use of ungetc() is awkward in cut.
I notice that pr is the only other util using ungetc.
Also the i18n version of cut on my system has a rewritten
cut_fields() function that doesn't exhibit the behavior.
ungetc() is coupled with the use of getndelim2(),
but I'll have a look at addressing this.
thanks,
Pádraig.
Reply sent to Pádraig Brady <P <at> draigBrady.com>: You have taken responsibility. (Tue, 22 Jan 2013 02:40:01 GMT)
Notification sent to Scott Lamb <slamb <at> slamb.org>: bug acknowledged by developer. (Tue, 22 Jan 2013 02:40:01 GMT)
Message #16 received at 13498-done <at> debbugs.gnu.org (full text, mbox):
Proposed patch at: http://lists.gnu.org/archive/html/coreutils/2013-01/msg00076.html
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 19 Feb 2013 12:24:04 GMT)
bug unarchived. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Fri, 30 May 2014 01:30:03 GMT)
Information forwarded to bug-coreutils <at> gnu.org: bug#13498; Package coreutils. (Fri, 30 May 2014 01:32:02 GMT)
Message #23 received at 13498 <at> debbugs.gnu.org (full text, mbox):
On 01/19/2013 01:04 PM, Andreas Schwab wrote:
[revisiting an old bug, since I just noticed it]
> Scott Lamb <slamb <at> slamb.org> writes:
>
>> I don't know exactly why the behavior differs based on stdin being a
>> tty or not. My best guess is that glibc might have some logic that, if
>> stdin is a tty, automatically flushes stdout any time the program
>> blocks on stdin.
>
> When a new buffer is read for a line buffered or unbuffered stream,
> stdout is flushed. This is traditional Unix behaviour, but AFAIK not
> required by any standard.
Actually, POSIX requires it:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05
> When a stream is "unbuffered", bytes are intended to appear from the
> source or at the destination as soon as possible; otherwise, bytes may
> be accumulated and transmitted as a block. When a stream is "fully
> buffered", bytes are intended to be transmitted as a block when a buffer
> is filled. When a stream is "line buffered", bytes are intended to be
> transmitted as a block when a <newline> byte is encountered.
> Furthermore, bytes are intended to be transmitted as a block when a
> buffer is filled, when input is requested on an unbuffered stream, or
> when input is requested on a line-buffered stream that requires the
> transmission of bytes.
stdout is required to be buffered, and when stdin is the same terminal
as stdout, then stdin is line-buffered and it is sufficient that an
input line on stdin forces stdout to be flushed.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 27 Jun 2014 11:24:04 GMT)