GNU bug report logs - #22144
--exclude no longer works against arguments with a directory name

Package: grep;

Reported by: Vincent Lefevre <vincent <at> vinc17.net>

Date: Fri, 11 Dec 2015 18:38:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 22144 in the body.
You can then email your comments to 22144 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Fri, 11 Dec 2015 18:38:02 GMT)

Acknowledgement sent to Vincent Lefevre <vincent <at> vinc17.net>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Fri, 11 Dec 2015 18:38:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: bug-grep <at> gnu.org
Subject: --exclude no longer works against arguments with a directory name
Date: Fri, 11 Dec 2015 10:31:11 +0100
In grep 2.22, --exclude no longer works in some cases:

$ cd /usr/share/doc/grep
$ grep e --exclude README README

is OK, but not:

$ grep e --exclude README /usr/share/doc/grep/README
  Copyright (C) 1992, 1997-2002, 2004-2015 Free Software Foundation, Inc.
[...]

This breaks at least one of my scripts, where --exclude is used to
exclude filenames generated with globbing.

After reverting to grep 2.21, this problem disappeared.

My Debian bug report:
  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=807641

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Fri, 11 Dec 2015 21:38:02 GMT)

Message #8 received at 22144 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>, 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Fri, 11 Dec 2015 13:37:46 -0800
The change in grep 2.22 is due to an earlier bug report:

http://bugs.gnu.org/21027

and was implemented by this patch:

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=c5c70eae261133d71a9436557d998a48aaf0a5fe

Although I can see arguments either way, the grep 2.22 behavior is 
consistent with grep 2.6 and earlier, so in some sense it's 
more conservative.




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Sat, 12 Dec 2015 01:57:01 GMT)

Message #11 received at 22144 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Sat, 12 Dec 2015 02:56:34 +0100
On 2015-12-11 13:37:46 -0800, Paul Eggert wrote:
> The change in grep 2.22 is due to an earlier bug report:
> 
> http://bugs.gnu.org/21027

This one was about --exclude-dir, whose description in grep 2.21 is
very unclear and it was already broken anyway:

zira:~> grep -rl e --exclude-dir='usr*' /usr/include
zira:~> grep -rl e --exclude-dir='usr*' /usr/include/stdio.h
/usr/include/stdio.h

But for --exclude, the description is clear:

    --exclude=GLOB
        Skip files whose base name matches GLOB (using wildcard
                         ^^^^^^^^^
        matching).  A file-name glob can use *,  ?,  and  [...]
        as  wildcards,  and  \ to quote a wildcard or backslash
        character literally.

"base name" means base name, not the full path!

I suggest that you revert the behavior for --exclude so that existing
scripts are not broken, and possibly add --exclude-path to match the
full path.

Now, --exclude was already broken:

$ grep -l . /usr/share/doc/grep/*
/usr/share/doc/grep/AUTHORS
/usr/share/doc/grep/NEWS.gz
/usr/share/doc/grep/README
/usr/share/doc/grep/THANKS.gz
/usr/share/doc/grep/TODO.gz
/usr/share/doc/grep/changelog.Debian.gz
/usr/share/doc/grep/changelog.gz
/usr/share/doc/grep/copyright

$ grep -l --exclude='*e*' . /usr/share/doc/grep/*

outputs nothing, while one should get:

/usr/share/doc/grep/AUTHORS
/usr/share/doc/grep/NEWS.gz
/usr/share/doc/grep/README
/usr/share/doc/grep/THANKS.gz
/usr/share/doc/grep/TODO.gz
/usr/share/doc/grep/copyright

excluding files whose base name contains a letter "e".

And with grep 2.22, this is still inconsistent:

ypig:~> grep -l --exclude=AUTHORS . /usr/share/doc/grep/*
/usr/share/doc/grep/AUTHORS
/usr/share/doc/grep/NEWS.gz
/usr/share/doc/grep/README
/usr/share/doc/grep/THANKS.gz
/usr/share/doc/grep/TODO.gz
/usr/share/doc/grep/changelog.Debian.gz
/usr/share/doc/grep/changelog.gz
/usr/share/doc/grep/copyright

ypig:~> grep -rl --exclude=AUTHORS . /usr/share/doc/grep
/usr/share/doc/grep/TODO.gz
/usr/share/doc/grep/changelog.gz
/usr/share/doc/grep/changelog.Debian.gz
/usr/share/doc/grep/THANKS.gz
/usr/share/doc/grep/NEWS.gz
/usr/share/doc/grep/copyright
/usr/share/doc/grep/README

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
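
The two readings of --exclude='*e*' discussed in the message above can be compared with a small sketch. It is only an illustration: Python's fnmatch module stands in for the matcher grep actually uses, and the helper names keep_by_base_name and keep_by_whole_name are hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# File names copied from the "grep -l ." listing in the message above.
NAMES = [
    "/usr/share/doc/grep/AUTHORS",
    "/usr/share/doc/grep/NEWS.gz",
    "/usr/share/doc/grep/README",
    "/usr/share/doc/grep/THANKS.gz",
    "/usr/share/doc/grep/TODO.gz",
    "/usr/share/doc/grep/changelog.Debian.gz",
    "/usr/share/doc/grep/changelog.gz",
    "/usr/share/doc/grep/copyright",
]


def keep_by_base_name(names, glob):
    """Keep names whose base name does not match glob (the documented rule)."""
    return [n for n in names if not fnmatchcase(basename(n), glob)]


def keep_by_whole_name(names, glob):
    """Keep names whose whole string does not match glob."""
    return [n for n in names if not fnmatchcase(n, glob)]


if __name__ == "__main__":
    # Base-name reading: only the two changelog files are dropped.
    for name in keep_by_base_name(NAMES, "*e*"):
        print(name)
    # Whole-name reading: "share" and "grep" contain an "e", so nothing is kept.
    print(keep_by_whole_name(NAMES, "*e*"))
```

Under the base-name reading the sketch keeps the six names listed as the expected output above. Under the whole-name reading it keeps nothing, which is consistent with the empty output reported.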




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Sat, 12 Dec 2015 03:32:02 GMT)

Message #14 received at 22144 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Fri, 11 Dec 2015 19:31:46 -0800
On 12/11/2015 05:56 PM, Vincent Lefevre wrote:
> But for --exclude, the description is clear:

The description changed in grep 2.22, to match the 2.22 (also, 
2.6-and-earlier) behavior.

As you say, the 2.22 behavior does not seem ideal. However, the 2.7 
through 2.21 behavior wasn't ideal either. It's not clear which behavior 
is better overall, nor is it clear whether we could unify parts of the 
two behaviors to get the best of both worlds. I'd rather not add yet 
another option in this area, if that can be avoided.




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Tue, 15 Dec 2015 09:00:02 GMT)

Message #17 received at 22144 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Tue, 15 Dec 2015 09:59:43 +0100
On 2015-12-11 19:31:46 -0800, Paul Eggert wrote:
> On 12/11/2015 05:56 PM, Vincent Lefevre wrote:
> > But for --exclude, the description is clear:
> 
> The description changed in grep 2.22, to match the 2.22 (also,
> 2.6-and-earlier) behavior.

My quote was from the grep 2.22 description (grep 2.22-1 Debian
package). It seems that this has changed later:

  http://www.gnu.org/software/grep/manual/grep.html

which is different:

  Skip files whose name matches the pattern glob, using wildcard
  matching. When searching recursively, skip any subfile whose base
  name matches glob; the base name is the part after the last ‘/’.
  A pattern can use ‘*’, ‘?’, and ‘[’...‘]’ as wildcards, and \ to
  quote a wildcard or backslash character literally.

The documentation is still ambiguous. For the "main case", is this
the canonical name as returned by realpath?

> As you say, the 2.22 behavior does not seem ideal.

By doing a difference for subfiles of a recursive search, this is
even worse!

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)
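
The rule described in the manual text quoted above (whole name for command-line arguments, base name for subfiles found by a recursive search) can be modelled roughly as follows. This is a sketch of the documented wording only, not grep's code. The function name is_excluded and its command_line flag are hypothetical, and Python's fnmatch is assumed to be a close enough stand-in for the real matcher.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def is_excluded(name, glob, command_line):
    """Model of the quoted manual text, not of grep's implementation.

    command_line=True:  name was given as an argument, match the whole name.
    command_line=False: name was found by -r, match only the base name.
    """
    subject = name if command_line else basename(name)
    return fnmatchcase(subject, glob)


if __name__ == "__main__":
    path = "/usr/share/doc/grep/AUTHORS"
    # Given as an argument: the whole string is compared with the glob.
    print(is_excluded(path, "AUTHORS", command_line=True))
    # Found by a recursive search: only "AUTHORS" is compared with the glob.
    print(is_excluded(path, "AUTHORS", command_line=False))
```

In this model the same file is kept when named as an argument and skipped when reached through -r, which is the difference shown by the two --exclude=AUTHORS runs reported earlier in the thread.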




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Tue, 15 Dec 2015 23:28:01 GMT)

Message #20 received at 22144 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Tue, 15 Dec 2015 15:27:27 -0800
Vincent Lefevre wrote:
> For the "main case", is this
> the canonical name as returned by realpath?

I don't see why.  grep doesn't need to compute anything's realpath.




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Wed, 16 Dec 2015 01:01:02 GMT)

Message #23 received at 22144 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Wed, 16 Dec 2015 02:00:18 +0100
On 2015-12-15 15:27:27 -0800, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >For the "main case", is this
> >the canonical name as returned by realpath?
> 
> I don't see why.  grep doesn't need to compute anything's realpath.

How is the file name defined, then?

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Wed, 16 Dec 2015 06:25:01 GMT)

Message #26 received at 22144 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Tue, 15 Dec 2015 22:24:25 -0800
Vincent Lefevre wrote:
> How is the file name defined, then?

It's built as a string, which is passed to 'open' without worrying about realpath.

By the way in case it's not already clear, I agree with you that the current 
behavior is not good, it's just that we can't simply revert (as that was also 
not good), we need to make it better.




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Wed, 16 Dec 2015 10:27:02 GMT)

Message #29 received at 22144 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Wed, 16 Dec 2015 11:26:43 +0100
On 2015-12-15 22:24:25 -0800, Paul Eggert wrote:
> Vincent Lefevre wrote:
> >How is the file name defined, then?
> 
> It's built as a string, which is passed to 'open' without worrying
> about realpath.

So, for instance, "foo" and "./foo" are regarded as different?

How files are considered to be the same needs to be clarified.
For instance, some utilities consider the device & inode numbers
(which have their own problems with broken FS implementations).
That's the case of cp (if this has not changed since 2004). That's
also the case of diff (if this has not changed since 2005), but you
know that. :)

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Wed, 16 Dec 2015 23:13:01 GMT)

Message #32 received at 22144 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Wed, 16 Dec 2015 15:12:17 -0800
Vincent Lefevre wrote:
> So, for instance, "foo" and "./foo" are regarded as different?

Yes. Grep does not need to worry about inodes or realpath or anything like that, 
so it doesn't.
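
The notions of "same file" raised in this exchange (plain string comparison, realpath, device and inode numbers) can be contrasted with a short sketch. It says nothing about grep's internals beyond what is stated above. The helper names same_string, same_file and demo are hypothetical, and the file "foo" is created in a temporary directory purely for the demonstration.

```python
import os
import tempfile


def same_string(a, b):
    """Compare two names as plain strings."""
    return a == b


def same_file(a, b):
    """Compare two names by the device and inode numbers they resolve to."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo():
    """Return (string equality, inode equality, realpath equality) for foo and ./foo."""
    with tempfile.TemporaryDirectory() as tmp:
        old_cwd = os.getcwd()
        os.chdir(tmp)
        try:
            open("foo", "w").close()  # create an empty file named "foo"
            return (
                same_string("foo", "./foo"),
                same_file("foo", "./foo"),
                os.path.realpath("foo") == os.path.realpath("./foo"),
            )
        finally:
            os.chdir(old_cwd)  # leave the directory before it is removed


if __name__ == "__main__":
    print(demo())
```

Only the string comparison treats "foo" and "./foo" as different. The other two comparisons need file system lookups, which a string-based rule avoids.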




Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Mon, 28 Dec 2015 09:10:02 GMT)

Message #35 received at 22144 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Mon, 28 Dec 2015 01:09:40 -0800
Vincent Lefevre wrote:
> The documentation is still ambiguous. For the "main case", is this
> the canonical name as returned by realpath?
>
>> >As you say, the 2.22 behavior does not seem ideal.
> By doing a difference for subfiles of a recursive search, this is
> even worse!

Please try the attached patch, which I've installed into the savannah 
repository. It attempts to fix the behavior, and to clarify the documentation. 
It is a tricky area. Hope this helps.
[0001-grep-exclude-matches-trailing-parts-of-args.patch (text/x-diff, attachment)]
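
The patch itself is not reproduced in this thread. Going only by its title ("exclude matches trailing parts of args"), one plausible reading is that a command-line name is skipped when the glob matches the whole name or any trailing part that begins right after a "/". The sketch below illustrates that assumed rule. It is not taken from the patch, the names name_suffixes and is_excluded_by_suffix are hypothetical, and Python's fnmatch is used in place of grep's matcher.

```python
from fnmatch import fnmatchcase


def name_suffixes(name):
    """Yield the whole name, then each trailing part that starts right after a '/'."""
    yield name
    for i, ch in enumerate(name):
        # A trailing part must begin with a non-slash character following a slash.
        if ch == "/" and i + 1 < len(name) and name[i + 1] != "/":
            yield name[i + 1:]


def is_excluded_by_suffix(name, glob):
    """Assumed rule: skip name if glob matches any of its suffixes."""
    return any(fnmatchcase(suffix, glob) for suffix in name_suffixes(name))


if __name__ == "__main__":
    for arg in ("README", "./README", "/usr/share/doc/grep/README"):
        print(arg, is_excluded_by_suffix(arg, "README"))
```

Under this reading, README, ./README and /usr/share/doc/grep/README are all skipped by --exclude README, while a pattern such as EADME matches none of them because a trailing part cannot start in the middle of a path component.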

Information forwarded to bug-grep <at> gnu.org:
bug#22144; Package grep. (Wed, 30 Dec 2015 14:25:02 GMT)

Message #38 received at 22144 <at> debbugs.gnu.org:

From: Vincent Lefevre <vincent <at> vinc17.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 22144 <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Wed, 30 Dec 2015 15:24:27 +0100
On 2015-12-28 01:09:40 -0800, Paul Eggert wrote:
> Vincent Lefevre wrote:
> > The documentation is still ambiguous. For the "main case", is this
> > the canonical name as returned by realpath?
> > 
> > > >As you say, the 2.22 behavior does not seem ideal.
> > By doing a difference for subfiles of a recursive search, this is
> > even worse!
> 
> Please try the attached patch, which I've installed into the savannah
> repository. It attempts to fix the behavior, and to clarify the
> documentation. It is a tricky area. Hope this helps.

I've done various tests, and it seems fine. Thanks.

-- 
Vincent Lefèvre <vincent <at> vinc17.net> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Thu, 31 Dec 2015 06:38:02 GMT)

Notification sent to Vincent Lefevre <vincent <at> vinc17.net>:
bug acknowledged by developer. (Thu, 31 Dec 2015 06:38:02 GMT)

Message #43 received at 22144-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Vincent Lefevre <vincent <at> vinc17.net>
Cc: 22144-done <at> debbugs.gnu.org
Subject: Re: bug#22144: --exclude no longer works against arguments with a
 directory name
Date: Wed, 30 Dec 2015 22:37:16 -0800
Vincent Lefevre wrote:
> I've done various tests, and it seems fine. Thanks.

You're welcome; closing the bug report.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 28 Jan 2016 12:24:04 GMT)

This bug report was last modified 8 years and 111 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.