GNU bug report logs - #14327
sort: random hangs executing coreutils 8.21

Package: coreutils;

Reported by: Kevin Wills <kevinmwills <at> hotmail.com>

Date: Wed, 1 May 2013 15:47:01 UTC

Severity: normal

Tags: moreinfo

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 14327 in the body.
You can then email your comments to 14327 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#14327; Package coreutils. (Wed, 01 May 2013 15:47:01 GMT)

Acknowledgement sent to Kevin Wills <kevinmwills <at> hotmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 01 May 2013 15:47:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Kevin Wills <kevinmwills <at> hotmail.com>
To: "bug-coreutils <at> gnu.org" <bug-coreutils <at> gnu.org>
Subject: I am getting random hangs executing coreutils 8.21 sort.
Date: Wed, 1 May 2013 09:02:50 -0500
I am getting random hangs executing coreutils 8.21 sort.
It occurs every so often but I can't reproduce on demand.
 
OS: RHEL 5.8 Linux 2.6.18-308.4.1.el5 #1 SMP Wed Mar 28 01:54:56 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
 
CMDLINE: INSTALL/coreutils-8.21/bin/sort --parallel=4 -k3,3 -k4,4n -t , -S 768M -T /tmp
 
-rw------- 1 x x 410M Apr 30 17:31 /tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh
INFO THREADS

 2 Thread 0x43386940 (LWP 28946) 0x0000003675c0d594 in __lll_lock_wait () from /lib64/libpthread.so.0
* 1 Thread 0x2af45f251e20 (LWP 28724) 0x0000003675c07ba5 in pthread_join () from /lib64/libpthread.so.0
 
Each of the hangs is at this same place.
 
(gdb) info threads

  2 Thread 0x43386940 (LWP 28946)  0x0000003675c0d594 in __lll_lock_wait () from /lib64/libpthread.so.0
* 1 Thread 0x2af45f251e20 (LWP 28724)  0x0000003675c07ba5 in pthread_join () from /lib64/libpthread.so.0

THREAD #1

(gdb) bt 20
#0  0x0000003675c07ba5 in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000406d92 in sortlines (lines=0x2af4898c1b50, nthreads=1, total_lines=3913269, node=0x1dbf37b0, queue=0x7fffb16a4b80, tfp=0x1dbf32a0,
    temp_output=0x1dbf323d "/tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh") at src/sort.c:3587
#2  0x0000000000406d83 in sortlines (lines=0x2af48b69ccf0, nthreads=2, total_lines=3913269, node=0x1dbf35b0, queue=0x7fffb16a4b80, tfp=0x1dbf32a0,
    temp_output=0x1dbf323d "/tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh") at src/sort.c:3585
#3  0x0000000000407191 in sort (files=0x6185b8, nfiles=0, output_file=0x0, nthreads=4) at src/sort.c:3917
#4  0x00000000004093a6 in main (argc=10, argv=0x7fffb16a5238) at src/sort.c:4702


THREAD #2

(gdb) thread 2
[Switching to thread 2 (Thread 0x43386940 (LWP 28946))]#0  0x0000003675c0d594 in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt 20
#0  0x0000003675c0d594 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000003675c08e8a in _L_lock_1034 () from /lib64/libpthread.so.0
#2  0x0000003675c08d4c in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000406a39 in lock_node (lines=<value optimized out>, nthreads=<value optimized out>, total_lines=3913269, node=0x1dbf3830, queue=0x7fffb16a4b80,
    tfp=0x1dbf32a0, temp_output=0x1dbf323d "/tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh") at src/sort.c:3280
#4  queue_check_insert_parent (lines=<value optimized out>, nthreads=<value optimized out>, total_lines=3913269, node=0x1dbf3830, queue=0x7fffb16a4b80, tfp=0x1dbf32a0,
    temp_output=0x1dbf323d "/tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh") at src/sort.c:3458
#5  merge_loop (lines=<value optimized out>, nthreads=<value optimized out>, total_lines=3913269, node=0x1dbf3830, queue=0x7fffb16a4b80, tfp=0x1dbf32a0,
    temp_output=0x1dbf323d "/tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh") at src/sort.c:3493
#6  sortlines (lines=<value optimized out>, nthreads=<value optimized out>, total_lines=3913269, node=0x1dbf3830, queue=0x7fffb16a4b80, tfp=0x1dbf32a0,
    temp_output=0x1dbf323d "/tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh") at src/sort.c:3608
#7  0x0000000000406dc8 in sortlines_thread (data=<value optimized out>) at src/sort.c:3538
#8  0x0000003675c0677d in start_thread () from /lib64/libpthread.so.0
#9  0x0000003674cd325d in clone () from /lib64/libc.so.6
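
The two backtraces above have a common shape: one thread waits in pthread_join() for a worker, while that worker is blocked in pthread_mutex_lock(). The following is only a minimal sketch of that shape, not code from sort.c. The names node_lock, worker, and demo_blocked_worker are hypothetical, and the sketch says nothing about which thread held the lock in the reported hang. It uses pthread_mutex_timedlock() with a one-second deadline so that the demonstration terminates instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for a per-node lock; not the lock from sort.c. */
static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker: try to take the lock, giving up after about one second.
   With plain pthread_mutex_lock this thread would block indefinitely,
   which is the state thread 2 is in above.  */
static void *
worker (void *data)
{
  int *result = data;
  struct timespec deadline;

  clock_gettime (CLOCK_REALTIME, &deadline);
  deadline.tv_sec += 1;

  *result = pthread_mutex_timedlock (&node_lock, &deadline);
  if (*result == 0)
    pthread_mutex_unlock (&node_lock);
  return NULL;
}

/* Hold the lock, start the worker, and wait for it in pthread_join,
   as thread 1 does above.  Returns the worker's lock result, or -1
   if the thread could not be created.  */
static int
demo_blocked_worker (void)
{
  pthread_t thread;
  int result = -1;

  pthread_mutex_lock (&node_lock);
  if (pthread_create (&thread, NULL, worker, &result) != 0)
    {
      pthread_mutex_unlock (&node_lock);
      return -1;
    }

  /* The lock stays held for the worker's whole lifetime.  */
  pthread_join (thread, NULL);
  pthread_mutex_unlock (&node_lock);

  /* The worker cannot have acquired a lock that was never released.  */
  assert (result != 0);
  return result;
}
```

Calling demo_blocked_worker() should return ETIMEDOUT after roughly one second. Replacing the timed lock with pthread_mutex_lock() would leave both threads stuck, with a thread listing similar to the one in this report.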



Information forwarded to bug-coreutils <at> gnu.org:
bug#14327; Package coreutils. (Wed, 01 May 2013 20:23:01 GMT)

Message #8 received at 14327 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Kevin Wills <kevinmwills <at> hotmail.com>
Cc: 14327 <at> debbugs.gnu.org
Subject: Re: bug#14327: I am getting random hangs executing coreutils 8.21
	sort.
Date: Wed, 01 May 2013 13:21:46 -0700
Can you build coreutils without optimization
(make CFLAGS='-g3') and get a backtrace from
that version?  That might be easier to diagnose.




Information forwarded to bug-coreutils <at> gnu.org:
bug#14327; Package coreutils. (Fri, 03 May 2013 08:10:02 GMT)

Message #11 received at 14327 <at> debbugs.gnu.org:

From: Chen Guo <chen.guo.0625 <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 14327 <at> debbugs.gnu.org, Kevin Wills <kevinmwills <at> hotmail.com>
Subject: Re: bug#14327: I am getting random hangs executing coreutils 8.21
	sort.
Date: Fri, 3 May 2013 01:09:09 -0700
Hi Kevin, if it's not too much trouble, a core dump would also be nice to
have, particularly given how difficult this seems to be to reproduce.


On Wed, May 1, 2013 at 1:21 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> Can you build coreutils without optimization
> (make CFLAGS='-g3') and get a backtrace from
> that version?  That might be easier to diagnose.
>
>
>
>

Information forwarded to bug-coreutils <at> gnu.org:
bug#14327; Package coreutils. (Fri, 03 May 2013 08:31:02 GMT)

Message #14 received at 14327 <at> debbugs.gnu.org:

From: Chen Guo <chen.guo.0625 <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 14327 <at> debbugs.gnu.org, Kevin Wills <kevinmwills <at> hotmail.com>
Subject: Re: bug#14327: I am getting random hangs executing coreutils 8.21
	sort.
Date: Fri, 3 May 2013 01:29:24 -0700
On second thought, let's hold off on the core dump; I realize that with as
many lines as you're sorting, the dump must be excessively large. Let's hope
we can nail it with a proper stack trace first.


On Fri, May 3, 2013 at 1:09 AM, Chen Guo <chen.guo.0625 <at> gmail.com> wrote:

> Hi Kevin, if it's not too much trouble a core dump would also be nice to
> have, particularly given how difficult this seems to be to reproduce.
>
>
> On Wed, May 1, 2013 at 1:21 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
>> Can you build coreutils without optimization
>> (make CFLAGS='-g3') and get a backtrace from
>> that version?  That might be easier to diagnose.
>>
>>
>>
>>
>

Information forwarded to bug-coreutils <at> gnu.org:
bug#14327; Package coreutils. (Fri, 03 May 2013 17:06:01 GMT)

Message #17 received at 14327 <at> debbugs.gnu.org:

From: Bob Proulx <bob <at> proulx.com>
To: Chen Guo <chen.guo.0625 <at> gmail.com>
Cc: 14327 <at> debbugs.gnu.org, Kevin Wills <kevinmwills <at> hotmail.com>
Subject: Re: bug#14327: I am getting random hangs executing coreutils 8.21
	sort.
Date: Fri, 3 May 2013 11:05:15 -0600
Chen Guo wrote:
> Chen Guo wrote:
> > Hi Kevin, if it's not too much trouble a core dump would also be nice to
> > have, particularly given how difficult this seems to be to reproduce.
>
> On second thought let's hold off on the core dump, I realize with how many
> lines you're sorting the dump must be excessively large. Let's hope we can
> nail it with a proper stack trace first.

If you do share a core file, please do not send it to the mailing
list.  A core file would be far too large to push to every subscriber,
and only a few would find it useful.  Raw core dumps also tend to be
problematic unless paired with their exact matching source.

Instead, upload it to a file-sharing site such as wikisend and send
just the link to the mailing list.  No login is needed.

  http://wikisend.com/

Bob






Information forwarded to bug-coreutils <at> gnu.org:
bug#14327; Package coreutils. (Sun, 05 May 2013 06:09:01 GMT)

Message #20 received at 14327 <at> debbugs.gnu.org:

From: Chen Guo <chen.guo.0625 <at> gmail.com>
To: Kevin Wills <kevinmwills <at> hotmail.com>
Cc: 14327 <at> debbugs.gnu.org
Subject: Re: bug#14327: I am getting random hangs executing coreutils 8.21
	sort.
Date: Sat, 4 May 2013 23:08:02 -0700
Hi Kevin,
On Wed, May 1, 2013 at 7:02 AM, Kevin Wills <kevinmwills <at> hotmail.com> wrote:
> THREAD #1
>
> (gdb) bt 20
> #0  0x0000003675c07ba5 in pthread_join () from /lib64/libpthread.so.0
> #1  0x0000000000406d92 in sortlines (lines=0x2af4898c1b50, nthreads=1, total_lines=3913269, node=0x1dbf37b0, queue=0x7fffb16a4b80, tfp=0x1dbf32a0,
>     temp_output=0x1dbf323d "/tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh") at src/sort.c:3587
> #2  0x0000000000406d83 in sortlines (lines=0x2af48b69ccf0, nthreads=2, total_lines=3913269, node=0x1dbf35b0, queue=0x7fffb16a4b80, tfp=0x1dbf32a0,
>     temp_output=0x1dbf323d "/tmp/TEST.20100129.1367347586.5d6b2b011bb90cc1/sortrg8yHh") at src/sort.c:3585
> #3  0x0000000000407191 in sort (files=0x6185b8, nfiles=0, output_file=0x0, nthreads=4) at src/sort.c:3917
> #4  0x00000000004093a6 in main (argc=10, argv=0x7fffb16a5238) at src/sort.c:4702
>

I took a quick look and noticed that the call stack above is suspicious.
The innermost sortlines frame, the one calling pthread_join(), was invoked
with nthreads = 1, whereas in the code sortlines calls pthread_join() only
if nthreads > 1.

I confirmed this in the disassembly as well to rule out the unlikely
possibility this was the result of some compiler optimization (I used -O2).

Did you compile this yourself or was it distributed with your system?
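
To make the point about nthreads concrete, here is a simplified sketch of the kind of thread split sortlines performs. It is an assumption-laden illustration, not the code from src/sort.c: split_sort, split_sort_thread, struct split_args, and count_joins are hypothetical names, the sorting and merging work is omitted, and a join counter is added purely for demonstration. What it shows is that a call made with nthreads = 1 takes the sequential branch and never reaches pthread_join().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical arguments handed to a worker thread.  */
struct split_args
{
  size_t nthreads;        /* thread budget for this subtree */
  int *joins;             /* number of pthread_join calls so far */
  pthread_mutex_t *mu;    /* protects *joins */
};

static void split_sort (size_t nthreads, int *joins, pthread_mutex_t *mu);

static void *
split_sort_thread (void *data)
{
  struct split_args *args = data;
  split_sort (args->nthreads, args->joins, args->mu);
  return NULL;
}

/* Divide the thread budget in two: a new thread takes the low half,
   the current thread recurses on the high half and then joins.  */
static void
split_sort (size_t nthreads, int *joins, pthread_mutex_t *mu)
{
  size_t lo_threads = nthreads / 2;
  size_t hi_threads = nthreads - lo_threads;
  pthread_t thread;
  struct split_args args = { lo_threads, joins, mu };

  if (nthreads > 1
      && pthread_create (&thread, NULL, split_sort_thread, &args) == 0)
    {
      split_sort (hi_threads, joins, mu);

      /* The join is reachable only from the multi-threaded branch.  */
      assert (nthreads > 1);
      pthread_join (thread, NULL);

      pthread_mutex_lock (mu);
      ++*joins;
      pthread_mutex_unlock (mu);
    }
  else
    {
      /* Sequential branch: the real code would sort and merge here.  */
    }
}

/* Return how many pthread_join calls a given thread budget causes.  */
static int
count_joins (size_t nthreads)
{
  int joins = 0;
  pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
  split_sort (nthreads, &joins, &mu);
  return joins;
}
```

In this sketch count_joins (1) performs no join at all, and count_joins (4) performs three, provided every pthread_create call succeeds. A frame that sits in pthread_join() while reporting nthreads = 1 therefore does not match this structure, which is why the backtrace looks suspicious.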




Information forwarded to bug-coreutils <at> gnu.org:
bug#14327; Package coreutils. (Fri, 19 Oct 2018 00:31:02 GMT)

Message #23 received at 14327 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
Cc: 14327 <at> debbugs.gnu.org
Subject: Re: bug#14327: I am getting random hangs executing coreutils 8.21
 sort.
Date: Thu, 18 Oct 2018 18:30:05 -0600
tags 14327 moreinfo
retitle 14327 sort: random hangs executing coreutils 8.21
close 14327
stop

(triaging old bugs)

Hello,

On 05/05/13 12:08 AM, Chen Guo wrote:
[...]
> I confirmed this in the disassembly as well to rule out the unlikely
> possibility this was the result of some compiler optimization (I used -O2).
> 
> Did you compile this yourself or was it distributed with your system?

With no further follow-ups in 5 years,
and some multithreading-related bug fixes to sort in coreutils 8.23,
I'm closing this bug.

If there are new sort-related bugs, please write to bug-coreutils <at> gnu.org .

regards,
 - assaf






Added tag(s) moreinfo. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 19 Oct 2018 00:31:02 GMT)

Changed bug title to 'sort: random hangs executing coreutils 8.21' from 'I am getting random hangs executing coreutils 8.21 sort.' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 19 Oct 2018 00:31:03 GMT)

bug closed, send any further explanations to 14327 <at> debbugs.gnu.org and Kevin Wills <kevinmwills <at> hotmail.com> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 19 Oct 2018 00:31:03 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 16 Nov 2018 12:24:05 GMT)

This bug report was last modified 5 years and 156 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.