GNU bug report logs - #13498
"cut -f" lags a line

Package: coreutils

Reported by: Scott Lamb <slamb <at> slamb.org>

Date: Sat, 19 Jan 2013 17:27:01 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.



Report forwarded to bug-coreutils <at> gnu.org:
bug#13498; Package coreutils. (Sat, 19 Jan 2013 17:27:01 GMT)

Acknowledgement sent to Scott Lamb <slamb <at> slamb.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 19 Jan 2013 17:27:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Scott Lamb <slamb <at> slamb.org>
To: bug-coreutils <at> gnu.org
Subject: "cut -f" lags a line
Date: Sat, 19 Jan 2013 00:35:18 -0800
"cut -f" has an apparently long-standing behavior that I'd consider a
bug: it does not fully send line N to stdout until the first character
of line N+1 has been read on stdin. This is confusing when stdin comes
from "tail -f" or the like. The exact behavior varies slightly. If
stdin is a tty, all but the trailing newline will be flushed
immediately and then the trailing newline will be flushed when the
next character shows up. If stdin is not a tty, there's no flush at
all until the next character shows up.

For example, if I type the following into a shell on Ubuntu 12.04.1,
meaning cut from coreutils 8.13 and glibc package version
2.15-0ubuntu10.3:

    cut -f1-
    foo
    bar
    baz
    ^D

I will see the following:

    $ cut -f1-
    foo
    foobar

    barbaz

    baz
    $

and if I instead use "cat | cut -f1-" in the first line, I will see
the following:

    $ cat | cut -f1-
    foo
    bar
    foo
    baz
    bar
    baz
    $

(coreutils's cut -c does not have the same laggy behavior. Neither
does BSD cut on my OS X machine in either -c or -f mode.)

This code in cut_fields (still found in trunk tip) is responsible for
delaying the newline; it runs between the newline being read and being
written:

      if (c == '\n')
        {
          c = getc (stream);
          if (c != EOF)
            {
              ungetc (c, stream);
              c = '\n';
            }
        }

I believe that code is there to avoid turning one newline at EOF into
two, but that goal could be accomplished in another way.
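
One possible shape of such an alternative, as a minimal sketch and not
necessarily the fix that was later applied to cut: remember the last byte
written, and add a newline at EOF only if the input did not already end
with one. The name copy_lines_no_lookahead is hypothetical, and the real
cut_fields also handles delimiters and field selection, which this sketch
leaves out.

```c
#include <assert.h>
#include <stdio.h>

/* Copy IN to OUT one byte at a time.  Each newline is written as soon
   as it is read, so no getc/ungetc look-ahead is needed.  At EOF, a
   final newline is added only if the input lacked one.  */
void
copy_lines_no_lookahead (FILE *in, FILE *out)
{
  int prev = '\n';      /* empty input counts as already terminated */
  int c;

  assert (in != NULL && out != NULL);

  while ((c = getc (in)) != EOF)
    {
      putc (c, out);    /* the newline goes out with its own line */
      prev = c;
    }

  if (prev != '\n')
    putc ('\n', out);   /* supply the missing final newline, once */
}
```

Since the newline is handed to stdio before the next read, a line-buffered
stdout (a terminal) can show line N without waiting for line N+1.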

I don't know exactly why the behavior differs based on stdin being a
tty or not. My best guess is that glibc might have some logic that, if
stdin is a tty, automatically flushes stdout any time the program
blocks on stdin. glibc's stdio internals are a bit hard for me to
follow, so I haven't found the code in question. Apparently this is a
vaguely standardized behavior; I see a stackoverflow post mentioning
the following:

"""
The input and output dynamics of interactive devices shall take place
as specified in 7.19.3. The intent of these requirements is that
unbuffered or line-buffered output appear as soon as possible, to
ensure that prompting messages actually appear prior to a program
waiting for input.

(ISO/IEC 9899:TC2 Committee Draft -- May 6, 2005, page 14).
"""

--
Scott Lamb <http://www.slamb.org/>




Information forwarded to bug-coreutils <at> gnu.org:
bug#13498; Package coreutils. (Sat, 19 Jan 2013 20:06:01 GMT)

Message #8 received at 13498 <at> debbugs.gnu.org:

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Scott Lamb <slamb <at> slamb.org>
Cc: 13498 <at> debbugs.gnu.org
Subject: Re: bug#13498: "cut -f" lags a line
Date: Sat, 19 Jan 2013 21:04:59 +0100
Scott Lamb <slamb <at> slamb.org> writes:

> I don't know exactly why the behavior differs based on stdin being a
> tty or not. My best guess is that glibc might have some logic that, if
> stdin is a tty, automatically flushes stdout any time the program
> blocks on stdin.

When a new buffer is read for a line buffered or unbuffered stream,
stdout is flushed.  This is traditional Unix behaviour, but AFAIK not
required by any standard.
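
A program that must show a prompt before it blocks on input can flush
explicitly instead of relying on that implicit flush. A minimal sketch;
prompt_and_read is a hypothetical helper, not part of any library:

```c
#include <assert.h>
#include <stdio.h>

/* Write PROMPT to OUT, force it out, then read one line from IN into
   BUF (at most SIZE - 1 bytes).  Returns BUF on success, or NULL on
   EOF or error, as fgets does.  */
char *
prompt_and_read (FILE *in, FILE *out, const char *prompt,
                 char *buf, int size)
{
  assert (in != NULL && out != NULL && buf != NULL && size > 0);

  fputs (prompt, out);
  fflush (out);         /* do not depend on the read flushing OUT */
  return fgets (buf, size, in);
}
```

With the explicit fflush, the prompt is visible whether or not the
implementation flushes output streams when input is requested.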

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




Information forwarded to bug-coreutils <at> gnu.org:
bug#13498; Package coreutils. (Sun, 20 Jan 2013 12:43:02 GMT)

Message #11 received at 13498 <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Scott Lamb <slamb <at> slamb.org>
Cc: 13498 <at> debbugs.gnu.org
Subject: Re: bug#13498: "cut -f" lags a line
Date: Sun, 20 Jan 2013 12:41:12 +0000
For my reference:
http://comments.pixelbeat.org/programming/stdio_buffering/#comment-250521

Yes, the use of ungetc() is awkward in cut.
I notice that pr is the only other util using ungetc.
Also the i18n version of cut on my system has a rewritten
cut_fields() function that doesn't exhibit the behavior.

ungetc() is coupled with the use of getndelim2(),
but I'll have a look at addressing this.

thanks,
Pádraig.




Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Tue, 22 Jan 2013 02:40:01 GMT)

Notification sent to Scott Lamb <slamb <at> slamb.org>:
bug acknowledged by developer. (Tue, 22 Jan 2013 02:40:01 GMT)

Message #16 received at 13498-done <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: Scott Lamb <slamb <at> slamb.org>
Cc: 13498-done <at> debbugs.gnu.org
Subject: Re: bug#13498: "cut -f" lags a line
Date: Tue, 22 Jan 2013 02:37:54 +0000
Proposed patch at: http://lists.gnu.org/archive/html/coreutils/2013-01/msg00076.html




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 19 Feb 2013 12:24:04 GMT)

bug unarchived. Request was from Eric Blake <eblake <at> redhat.com> to control <at> debbugs.gnu.org. (Fri, 30 May 2014 01:30:03 GMT)

Information forwarded to bug-coreutils <at> gnu.org:
bug#13498; Package coreutils. (Fri, 30 May 2014 01:32:02 GMT)

Message #23 received at 13498 <at> debbugs.gnu.org:

From: Eric Blake <eblake <at> redhat.com>
Cc: 13498 <at> debbugs.gnu.org
Subject: Re: bug#13498: "cut -f" lags a line
Date: Thu, 29 May 2014 19:31:31 -0600
On 01/19/2013 01:04 PM, Andreas Schwab wrote:

[revisiting an old bug, since I just noticed it]

> Scott Lamb <slamb <at> slamb.org> writes:
> 
>> I don't know exactly why the behavior differs based on stdin being a
>> tty or not. My best guess is that glibc might have some logic that, if
>> stdin is a tty, automatically flushes stdout any time the program
>> blocks on stdin.
> 
> When a new buffer is read for a line buffered or unbuffered stream,
> stdout is flushed.  This is traditional Unix behaviour, but AFAIK not
> required by any standard.

Actually, POSIX requires it:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_05

> When a stream is "unbuffered", bytes are intended to appear from the
> source or at the destination as soon as possible; otherwise, bytes may
> be accumulated and transmitted as a block. When a stream is "fully
> buffered", bytes are intended to be transmitted as a block when a buffer
> is filled. When a stream is "line buffered", bytes are intended to be
> transmitted as a block when a <newline> byte is encountered.
> Furthermore, bytes are intended to be transmitted as a block when a
> buffer is filled, when input is requested on an unbuffered stream, or
> when input is requested on a line-buffered stream that requires the
> transmission of bytes.

stdout is required to be buffered, and when stdin is the same terminal
as stdout, then stdin is line-buffered and it is sufficient that an
input line on stdin forces stdout to be flushed.
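
The three buffering modes described in the quoted text can be requested
explicitly with setvbuf. A minimal sketch; request_buffering is a
hypothetical wrapper, and whether a given filter should override the
default mode is a separate design question:

```c
#include <assert.h>
#include <stdio.h>

/* Request buffering MODE (_IONBF, _IOLBF or _IOFBF) for STREAM and let
   the library allocate the buffer.  This must be called before any
   other operation on STREAM.  Returns 0 on success, nonzero on
   failure, as setvbuf does.  */
int
request_buffering (FILE *stream, int mode)
{
  assert (stream != NULL);
  return setvbuf (stream, NULL, mode, BUFSIZ);
}
```

For example, a filter could call request_buffering (stdout, _IOLBF) at
startup so that each completed line is passed on even when stdout is a
pipe.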

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 27 Jun 2014 11:24:04 GMT)
