GNU bug report logs - #17147
Performance regression by 3000000% for evaluating "and" form

Package: guile;

Reported by: David Kastrup <dak <at> gnu.org>

Date: Mon, 31 Mar 2014 09:59:02 UTC

Severity: wishlist

Done: Mark H Weaver <mhw <at> netris.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 17147 in the body.
You can then email your comments to 17147 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Mon, 31 Mar 2014 09:59:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Kastrup <dak <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Mon, 31 Mar 2014 09:59:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: bug-guile <at> gnu.org
Subject: Performance regression by 3000000% for evaluating "and" form
Date: Mon, 31 Mar 2014 11:58:00 +0200

[Message part 1 (text/plain, inline)]

The following program goes from 2.3ms execution time to 80s execution
time when going from Guile-1.8 to current master.

[zorpo.scm (text/plain, inline)]

(use-modules (srfi srfi-19))
(define start-time (current-time))
(primitive-eval (cons 'and (make-list 10000 #f)))
(display (time-difference (current-time) start-time))

[Message part 3 (text/plain, inline)]


With Guile-2.0.9 (Ubuntu package), it crashes with
Backtrace:
In ice-9/psyntax.scm:
2683: 19 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 18 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 17 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 16 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 15 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 14 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 13 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 12 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 11 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 10 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 9 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 8 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 7 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 6 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 5 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 4 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 3 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 2 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 1 [match-each-any (#f #f #f ...) ((#f top) shift) ...]
2683: 0 [match-each-any (#f #f #f ...) ((#f top) shift) ...]

ice-9/psyntax.scm:2683:37: In procedure match-each-any:
ice-9/psyntax.scm:2683:37: Throw to key `vm-error' with args `(vm-run "VM: Stack overflow" ())'.

which might give a clue as to where Guile-v2 is spending all its time.

-- 
David Kastrup

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Mon, 31 Mar 2014 22:32:02 GMT) Full text and rfc822 format available.

Message #8 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 17147 <at> debbugs.gnu.org, request <at> debbugs.gnu.org
Subject: Re: bug#17147: Performance regression by 3000000% for evaluating
 "and" form
Date: Mon, 31 Mar 2014 18:30:45 -0400

severity 17147 wishlist
thanks

David Kastrup <dak <at> gnu.org> writes:

> The following program goes from 2.3ms execution time to 80s execution
> time when going from Guile-1.8 to current master.
>
> (use-modules (srfi srfi-19))
> (define start-time (current-time))
> (primitive-eval (cons 'and (make-list 10000 #f)))
> (display (time-difference (current-time) start-time))

I suspect the reason this works well on Guile 1.8 is that macros are
expanded lazily, and since the first argument is #f, it doesn't need to
expand the rest.

Guile 2.0 uses a different macro expander which is vastly superior in
most respects and needed to support modern Scheme programs, but it is
not lazy.  Guile 2 is primarily designed to work in a compiled mode
anyway, where laziness is pointless and would most likely slow things
down.

(and x y ...) expands into (if x (and y ...) #f), so your huge 'and'
form turns into a very deeply nested expression, and this overflows the
compiler's stack on Guile 2.0.x.  Thanks to Andy's recent work on
expandable stacks in master, this case works properly there.

While it would of course be ideal for our compiler to efficiently handle
expressions 10000 levels deep (if it can be done without sacrificing
maintainability), dealing with such pathological cases should not be a
high priority, IMO.  There are many more important things to work on.

Is this just an academic exercise, or does Lilypond generate massively
huge expressions like this?

      Mark

Severity set to 'wishlist' from 'normal' Request was from Mark H Weaver <mhw <at> netris.org> to control <at> debbugs.gnu.org. (Mon, 31 Mar 2014 22:32:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Mon, 31 Mar 2014 23:22:02 GMT) Full text and rfc822 format available.

Message #13 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 17147 <at> debbugs.gnu.org, request <at> debbugs.gnu.org
Subject: Re: bug#17147: Performance regression by 3000000% for evaluating
 "and" form
Date: Tue, 01 Apr 2014 01:21:15 +0200

[Message part 1 (text/plain, inline)]

Mark H Weaver <mhw <at> netris.org> writes:

> severity 17147 wishlist
> thanks
>
> David Kastrup <dak <at> gnu.org> writes:
>
>> The following program goes from 2.3ms execution time to 80s execution
>> time when going from Guile-1.8 to current master.
>>
>> (use-modules (srfi srfi-19))
>> (define start-time (current-time))
>> (primitive-eval (cons 'and (make-list 10000 #f)))
>> (display (time-difference (current-time) start-time))
>
> I suspect the reason this works well on Guile 1.8 is that macros are
> expanded lazily, and since the first argument is #f, it doesn't need to
> expand the rest.
>
> Guile 2.0 uses a different macro expander which is vastly superior in
> most respects and needed to support modern Scheme programs, but it is
> not lazy.  Guile 2 is primarily designed to work in a compiled mode
> anyway, where laziness is pointless and would most likely slow things
> down.
>
> (and x y ...) expands into (if x (and y ...) #f), so your huge 'and'
> form turns into a very deeply nested expression, and this overflows the
> compiler's stack on Guile 2.0.x.  Thanks to Andy's recent work on
> expandable stacks in master, this case works properly there.

I think you are overlooking here that a mere 10000-term expression here
is taking 80 seconds to complete.  That's 8ms for each term
corresponding to several million clock cycles.  The only way to arrive
at such a performance impact is by at least quadratic behavior, and
quadratic behavior is a bad idea.  As a point of reference, the
equivalent and just as deeply nested

[zorpo.scm (text/plain, inline)]

(use-modules (srfi srfi-19) (srfi srfi-1))
(define start-time (current-time))
(primitive-eval (fold (lambda (a n) (list 'if #f n)) #f (make-list 10000)))
(display (time-difference (current-time) start-time))

[Message part 3 (text/plain, inline)]

takes 100ms.

> While it would of course be ideal for our compiler to efficiently
> handle expressions 10000 levels deep (if it can be done without
> sacrificing maintainability), dealing with such pathological cases
> should not be a high priority, IMO.  There are many more important
> things to work on.
>
> Is this just an academic exercise, or does Lilypond generate massively
> huge expressions like this?

LilyPond evaluates a lot of one-shot expressions in the course of its
operation, including complex ones.  But Guilev2 offers enough other
stumbling blocks.  We have not yet arrived at a state where it would
even be possible to evaluate performance.

I tripped over this problem here merely while trying to find a
persuasive example for benefits of improving scm_reverse_x performance,
as scm_reverse_x is used quite a bit in libguile/expand.c.

Since the performance impact of reverse! in this example is
indistinguishable from thermal noise, its use seems to be outside of the
quadratically behaving code parts.

-- 
David Kastrup

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Tue, 01 Apr 2014 02:57:02 GMT) Full text and rfc822 format available.

Message #16 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 17147 <at> debbugs.gnu.org
Subject: Re: bug#17147: Performance regression by 3000000% for evaluating
 "and" form
Date: Mon, 31 Mar 2014 22:55:00 -0400

David Kastrup <dak <at> gnu.org> writes:

>> (and x y ...) expands into (if x (and y ...) #f), so your huge 'and'
>> form turns into a very deeply nested expression, and this overflows the
>> compiler's stack on Guile 2.0.x.  Thanks to Andy's recent work on
>> expandable stacks in master, this case works properly there.
>
> I think you are overlooking here that a mere 10000-term expression here
> is taking 80 seconds to complete.  That's 8ms for each term
> corresponding to several million clock cycles.  The only way to arrive
> at such a performance impact is by at least quadratic behavior, and
> quadratic behavior is a bad idea.

Okay, good point.  Indeed, the expansion time of our 'and' and 'or'
macros is quadratic in the number of operands.  They are implemented in
boot-9.scm as follows:

  (define-syntax and
    (syntax-rules ()
      ((_) #t)
      ((_ x) x)
      ((_ x y ...) (if x (and y ...) #f))))

  (define-syntax or
    (syntax-rules ()
      ((_) #f)
      ((_ x) x)
      ((_ x y ...) (let ((t x)) (if t t (or y ...))))))

The problem is that the "y ..." pattern has to iterate down the entire
list to verify that it's a proper list, and this is done for each
operand.

The simplest solution would be to replace all occurrences of "y ..."
with ". y" in the two macros above, but that has the slight downside of
making the error message much less comprehensible if you use a dotted
tail in an 'and' or 'or' form.  Maybe that's not worth worrying about
though.

Alternatively, we could use procedural macros here, but we'd have to
limit ourselves to very primitive forms in the code because this is so
early in the bootstrap.

     Mark

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Tue, 01 Apr 2014 06:18:01 GMT) Full text and rfc822 format available.

Message #19 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 17147 <at> debbugs.gnu.org
Subject: Re: bug#17147: Performance regression by 3000000% for evaluating
 "and" form
Date: Tue, 01 Apr 2014 08:17:01 +0200

[Message part 1 (text/plain, inline)]

Mark H Weaver <mhw <at> netris.org> writes:

> Okay, good point.  Indeed, the expansion time of our 'and' and 'or'
> macros is quadratic in the number of operands.  They are implemented in
> boot-9.scm as follows:
>
>   (define-syntax and
>     (syntax-rules ()
>       ((_) #t)
>       ((_ x) x)
>       ((_ x y ...) (if x (and y ...) #f))))
>   
>   (define-syntax or
>     (syntax-rules ()
>       ((_) #f)
>       ((_ x) x)
>       ((_ x y ...) (let ((t x)) (if t t (or y ...))))))
>
> The problem is that the "y ..." pattern has to iterate down the entire
> list to verify that it's a proper list, and this is done for each
> operand.

Why would it have to do that?  It cannot be anything valid else if it is
a pair.

Note that the compiler does not bother to do this for other cases: if I
write

[zorpo.scm (text/plain, inline)]

(use-modules (srfi srfi-19) (srfi srfi-1))
(define start-time (current-time))
(primitive-eval (cons '+ (circular-list 0)))
(display (time-difference (current-time) start-time))

[Message part 3 (text/plain, inline)]

then I get
dak <at> lola:/usr/local/tmp/guile$ meta/guile /tmp/zorpo.scm 
;;; note: source file /tmp/zorpo.scm
;;;       newer than compiled /usr/local/tmp/guile/cache/guile/ccache/2.2-LE-4-3.4/tmp/zorpo.scm.go
;;; note: auto-compilation is enabled, set GUILE_AUTO_COMPILE=0
;;;       or pass the --no-auto-compile argument to disable.
;;; compiling /tmp/zorpo.scm
;;; compiled /usr/local/tmp/guile/cache/guile/ccache/2.2-LE-4-3.4/tmp/zorpo.scm.go
Warning: Unwind-only `stack-overflow' exception; skipping pre-unwind handler.
Warning: Unwind-only `stack-overflow' exception; skipping pre-unwind handler.

and what of it?  If you really, really must, do one recursive top-level
check of everything before starting.  But re-verifying structural
integraty after applying every single rule is madness.  Actually,
replacing '+ by 'and in that example will lead to the same bomb-out.  So
it does not look like structural integrity verification is happening
anyway.

> The simplest solution would be to replace all occurrences of "y ..."
> with ". y" in the two macros above, but that has the slight downside
> of making the error message much less comprehensible if you use a
> dotted tail in an 'and' or 'or' form.  Maybe that's not worth worrying
> about though.

If you care about nice error messages, do a single upfront check.

> Alternatively, we could use procedural macros here, but we'd have to
> limit ourselves to very primitive forms in the code because this is so
> early in the bootstrap.

I don't think it's worth it just for and/or (the form generated by or
actually seems more prone to buildup and churn).  But for syntax
replacements in general?  Sure.  You don't want quadratic behavior in
bare-bones replacements like these.

-- 
David Kastrup

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Tue, 01 Apr 2014 07:12:02 GMT) Full text and rfc822 format available.

Message #22 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 17147 <at> debbugs.gnu.org
Subject: Re: bug#17147: Performance regression by 3000000% for evaluating
 "and" form
Date: Tue, 01 Apr 2014 03:10:22 -0400

David Kastrup <dak <at> gnu.org> writes:

> Mark H Weaver <mhw <at> netris.org> writes:
>
>> Okay, good point.  Indeed, the expansion time of our 'and' and 'or'
>> macros is quadratic in the number of operands.  They are implemented in
>> boot-9.scm as follows:
>>
>>   (define-syntax and
>>     (syntax-rules ()
>>       ((_) #t)
>>       ((_ x) x)
>>       ((_ x y ...) (if x (and y ...) #f))))
>>   
>>   (define-syntax or
>>     (syntax-rules ()
>>       ((_) #f)
>>       ((_ x) x)
>>       ((_ x y ...) (let ((t x)) (if t t (or y ...))))))
>>
>> The problem is that the "y ..." pattern has to iterate down the entire
>> list to verify that it's a proper list, and this is done for each
>> operand.
>
> Why would it have to do that?  It cannot be anything valid else if it is
> a pair.
>
> Note that the compiler does not bother to do this for other cases: if I
> write
>
> (use-modules (srfi srfi-19) (srfi srfi-1))
> (define start-time (current-time))
> (primitive-eval (cons '+ (circular-list 0)))
> (display (time-difference (current-time) start-time))

The issue is not circular lists.  The Scheme standards don't specify
what happens when expressions are cyclic.  The issue is expressions that
end with a dotted tail, such as (and x y . z).  The ability to write
macros that check for such patterns and handle them specially is
standardized and widely supported and used.  "y ..." is specified by
R5RS, R6RS, and R7RS to match only if the final CDR is null, and plenty
of existing code depends on this behavior.  We can't change this.

>> The simplest solution would be to replace all occurrences of "y ..."
>> with ". y" in the two macros above, but that has the slight downside
>> of making the error message much less comprehensible if you use a
>> dotted tail in an 'and' or 'or' form.  Maybe that's not worth worrying
>> about though.
>
> If you care about nice error messages, do a single upfront check.

Well, yes, but there are awkward problems having to do with the fact
that this is early in the bootstrap, before the module system has been
booted, and we don't currently have a convenient way to have internal
helpers at this stage that aren't exported by (guile), which means
polluting the default namespace.  The end result is that I'm inclined to
either not worry about good error reporting for things like (and x . y)
or to rewrite 'and' and 'or' as procedural macros.

>> Alternatively, we could use procedural macros here, but we'd have to
>> limit ourselves to very primitive forms in the code because this is so
>> early in the bootstrap.
>
> I don't think it's worth it just for and/or (the form generated by or
> actually seems more prone to buildup and churn).  But for syntax
> replacements in general?  Sure.  You don't want quadratic behavior in
> bare-bones replacements like these.

Sorry, but there's no easy solution here.  The "y ..." pattern _must_
fail to match unless the last CDR is null.  I could imagine clever
optimization tricks where all cons cells of the generated (and y ...)
include an annotation asserting that the final CDR is null, but IMO this
would not be worth the added code complexity and the extra storage
needed by the annotations.  I think the best we can reasonably do is to
be aware of the efficiency characteristics of the patterns when writing
macros.

    Regards,
      Mark

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Tue, 01 Apr 2014 08:23:02 GMT) Full text and rfc822 format available.

Message #25 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 17147 <at> debbugs.gnu.org
Subject: Re: bug#17147: Performance regression by 3000000% for evaluating
 "and" form
Date: Tue, 01 Apr 2014 10:22:17 +0200

Mark H Weaver <mhw <at> netris.org> writes:

> David Kastrup <dak <at> gnu.org> writes:
>
>> Mark H Weaver <mhw <at> netris.org> writes:
>>
>>> Okay, good point.  Indeed, the expansion time of our 'and' and 'or'
>>> macros is quadratic in the number of operands.  They are implemented in
>>> boot-9.scm as follows:
>>>
>>>   (define-syntax and
>>>     (syntax-rules ()
>>>       ((_) #t)
>>>       ((_ x) x)
>>>       ((_ x y ...) (if x (and y ...) #f))))
>>>   
>>>   (define-syntax or
>>>     (syntax-rules ()
>>>       ((_) #f)
>>>       ((_ x) x)
>>>       ((_ x y ...) (let ((t x)) (if t t (or y ...))))))
>>>
>>> The problem is that the "y ..." pattern has to iterate down the entire
>>> list to verify that it's a proper list, and this is done for each
>>> operand.
>>
>> Why would it have to do that?  It cannot be anything valid else if it is
>> a pair.

[...]

> The issue is not circular lists.  The Scheme standards don't specify
> what happens when expressions are cyclic.  The issue is expressions that
> end with a dotted tail, such as (and x y . z).  The ability to write
> macros that check for such patterns and handle them specially is
> standardized and widely supported and used.

> "y ..." is specified by R5RS, R6RS, and R7RS to match only if the
> final CDR is null, and plenty of existing code depends on this
> behavior.  We can't change this.
>
>>> The simplest solution would be to replace all occurrences of "y ..."
>>> with ". y" in the two macros above, but that has the slight downside
>>> of making the error message much less comprehensible if you use a
>>> dotted tail in an 'and' or 'or' form.  Maybe that's not worth worrying
>>> about though.
>>
>> If you care about nice error messages, do a single upfront check.

Actually, if you care about nicer error messages than a complaint about
(and . whatever), you can write the last rule a bit more cumbersome:

(define-syntax and
  (syntax-rules ()
    ((_) #t)
    ((_ x) x)
    ((_ x y . z) (if x (and y . z) #f))))

Possibly followed by a special error-out rule

    ((_ x . y) raise-an-error)

Or preceded by an error-out rule ((_ . x) raise-an-error).

> Well, yes, but there are awkward problems having to do with the fact
> that this is early in the bootstrap, before the module system has been
> booted, and we don't currently have a convenient way to have internal
> helpers at this stage that aren't exported by (guile), which means
> polluting the default namespace.  The end result is that I'm inclined
> to either not worry about good error reporting for things like
> (and x . y) or to rewrite 'and' and 'or' as procedural macros.

>> I don't think it's worth it just for and/or (the form generated by or
>> actually seems more prone to buildup and churn).  But for syntax
>> replacements in general?  Sure.  You don't want quadratic behavior in
>> bare-bones replacements like these.
>
> Sorry, but there's no easy solution here.  The "y ..." pattern _must_
> fail to match unless the last CDR is null.  I could imagine clever
> optimization tricks where all cons cells of the generated (and y ...)
> include an annotation asserting that the final CDR is null,

As one can expect a lot of user-written code to contain patterns using
an ellipsis, I think that would be an ultimately effective optimization.

> but IMO this would not be worth the added code complexity and the
> extra storage needed by the annotations.  I think the best we can
> reasonably do is to be aware of the efficiency characteristics of the
> patterns when writing macros.

If Guile is not going to optimize ..., I think it would be good if it
led by example (its sources _are_ going to teach people how to program)
and simply avoided using ... altogether, at the very least where
recursive expansion rules are concerned.

I have the suspicion, however, that there are a lot of tutorials/etc in
the wild that do make use of ellipses.  So while avoiding ... will help
with keeping Guile's own internals scalable to large expressions, it
will still likely affect the performance of a significant amount of
third-party code.

-- 
David Kastrup

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Tue, 01 Apr 2014 12:00:03 GMT) Full text and rfc822 format available.

Message #28 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 17147 <at> debbugs.gnu.org
Subject: Re: bug#17147: Performance regression by 3000000% for evaluating
 "and" form
Date: Tue, 01 Apr 2014 13:59:02 +0200

David Kastrup <dak <at> gnu.org> writes:

> If Guile is not going to optimize ..., I think it would be good if it
> led by example (its sources _are_ going to teach people how to program)
> and simply avoided using ... altogether, at the very least where
> recursive expansion rules are concerned.
>
> I have the suspicion, however, that there are a lot of tutorials/etc in
> the wild that do make use of ellipses.  So while avoiding ... will help
> with keeping Guile's own internals scalable to large expressions, it
> will still likely affect the performance of a significant amount of
> third-party code.

By the way: module/system/vm/assembler.scm is taking an absolutely
insane amount of time to compile.  It also makes copious use of syntax
rules including ... and this might be related.

-- 
David Kastrup

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Tue, 01 Apr 2014 16:21:02 GMT) Full text and rfc822 format available.

Message #31 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 17147 <at> debbugs.gnu.org
Subject: Re: bug#17147: Performance regression by 3000000% for evaluating
 "and" form
Date: Tue, 01 Apr 2014 12:19:56 -0400

David Kastrup <dak <at> gnu.org> writes:

> Actually, if you care about nicer error messages than a complaint about
> (and . whatever), you can write the last rule a bit more cumbersome:
>
> (define-syntax and
>   (syntax-rules ()
>     ((_) #t)
>     ((_ x) x)
>     ((_ x y . z) (if x (and y . z) #f))))

This would improve the error message for (and x . y), but not for
(and x y . z), (and x y z . w), etc.

> Possibly followed by a special error-out rule
>
>     ((_ x . y) raise-an-error)

No need for this.  If none of the patterns match, the user will get a
reasonable error message.

> Or preceded by an error-out rule ((_ . x) raise-an-error).

This pattern will match things like (and), (and x), (and x y), etc, so
it would have to come last.

In order to produce good error messages, 'syntax-rules' macros must
detect the errors in the original expression.  If the error is not
detected until some intermediate expression is expanded, users will see
errors about expressions they did not type, which is quite confusing.

For procedural macros, there is more flexibility, because
'syntax-violation' can be called when an error is detected, and the
original form (and a relevant subform, if desired) can be passed to it
explicitly.

BTW, whenever I talk about "procedural macros", I'm talking about the
so-called 'syntax-case' macros, which are hygienic by default but
provide controlled ways to create unhygienic behavior where desired.

The old 'define-macro' macros are unhygienic by default, and thus prone
to problems of unintended variable capture.  IMO, 'define-macro' should
never be used in new code unless it cannot be avoided (and I guess
LilyPond cannot avoid it until support for 1.8 is dropped, due to the
'void' issue).

>>> I don't think it's worth it just for and/or (the form generated by or
>>> actually seems more prone to buildup and churn).  But for syntax
>>> replacements in general?  Sure.  You don't want quadratic behavior in
>>> bare-bones replacements like these.
>>
>> Sorry, but there's no easy solution here.  The "y ..." pattern _must_
>> fail to match unless the last CDR is null.  I could imagine clever
>> optimization tricks where all cons cells of the generated (and y ...)
>> include an annotation asserting that the final CDR is null,
>
> As one can expect a lot of user-written code to contain patterns using
> an ellipsis, I think that would be an ultimately effective optimization.

I suspect it would significantly increase compile time and memory usage
in the common case.

>> but IMO this would not be worth the added code complexity and the
>> extra storage needed by the annotations.  I think the best we can
>> reasonably do is to be aware of the efficiency characteristics of the
>> patterns when writing macros.
>
> If Guile is not going to optimize ..., I think it would be good if it
> led by example (its sources _are_ going to teach people how to program)
> and simply avoided using ... altogether,

I disagree that we should avoid the use of ... altogether.  IMO, writing
macros in such a way to produce good error messages is very important,
and that means making patterns as strict as they can be, to detect
errors as soon as possible.  For this reason, I prefer "y ..." to ". y".

> at the very least where recursive expansion rules are concerned.

If the relevant list being matched might conceivably be large, then it's
probably best to have a main macro that is as strict as possible,
e.g. using ellipses, and then to do the actual recursion using internal
helpers that use ". y" and do no error checking.

However, this approach carries a different cost: added code complexity
and reduced readability.  IMO, it does not make sense to design every
macro to efficiently handle huge numbers of operands.  There is such a
thing as "premature optimization", and as I recall Knuth had some strong
words to say about that.

For most macros, and for most code in general, I advocate keeping things
as simple as possible, and only adding complexity where it is needed.
However, I agree that we should make Guile's core forms such as 'and'
and 'or' scalable.

     Regards,
       Mark

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Mon, 12 May 2014 07:28:01 GMT) Full text and rfc822 format available.

Message #34 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: 17147 <at> debbugs.gnu.org
Subject: Another idea
Date: Mon, 12 May 2014 08:08:02 +0200

Maybe the problem is just a choice of bad primitives for going to
tree-il.  Constructs like and/or/cond translate to deeply nested if
statements.  If the primitive retains the nesting level, then syntax
translation can be done in one go instead of recursive matching.
Example: use a primitive %cond that does the same as cond but without
any of its syntax niceties.  If really necessary, we can implement %cond
as an unchecked procedural macro mapping to (if ...) again in a first
iteration and it should still help us get out the complexity trap.

Then
(define-syntax and
  (syntax-rules ()
    ((_) #t)
    ((_ x) x)
    ((_ x y ...) (if x (and y ...) #f))))

(define-syntax or
  (syntax-rules ()
    ((_) #f)
    ((_ x) x)
    ((_ x y ...) (let ((t x)) (if t t (or y ...))))))

translates into

(define-syntax and
  (syntax-rules ()
    ((_) #t)
    ((_ x ... y) (%cond ((not x) #f) ... (#t y)))))

(define-syntax or
  (syntax-rules ()
    ((_) #f)
    ((_ x ... y) (%cond (x) ... (#t y)))))

and we have no inherently quadratic behavior here since the rules are
not applied recursively and the ... pattern is just matched once per
construct.

-- 
David Kastrup

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Tue, 13 May 2014 13:04:02 GMT) Full text and rfc822 format available.

Message #37 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: 17147 <at> debbugs.gnu.org
Subject: Scalability front and back...
Date: Tue, 13 May 2014 15:03:17 +0200

[Message part 1 (text/plain, inline)]

The following code continues exploding in Guile 2.0.9 even after fixing
the VM exploding under the stock "reduce-right" and after chickening out
on syntax-case et al for "and" and using a procedural macro instead.

[zorpo.scm (text/plain, inline)]

(use-modules (srfi srfi-1))

(define (reduce-right f ridentity lst)
  "`reduce-right' is a variant of `fold-right', where the first call to
F is on two elements from LST, rather than one element and a given
initial value.  If LST is empty, RIDENTITY is returned.  If LST
has just one element then that's the return value."
  (reduce f ridentity (reverse lst)))

(define-macro (and . rest)
  (reduce-right
   (lambda (a b) `(if ,a ,b #f))
   #t
   rest))

(primitive-eval (cons 'and (make-list 10000 #f)))

[Message part 3 (text/plain, inline)]

Now I get
Backtrace:
In ice-9/psyntax.scm:
1259: 19 [#<procedure 8f2da28 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 18 [#<procedure 8f2da00 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 17 [#<procedure 8f2d9d8 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 16 [#<procedure 8f2d9b0 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 15 [#<procedure 8f2d988 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 14 [#<procedure 8f2d960 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 13 [#<procedure 8f2d938 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 12 [#<procedure 8f2d910 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 11 [#<procedure 8f2d8e8 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 10 [#<procedure 8f2d8c0 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 9 [#<procedure 8f2d898 (test then else)> #f (if #f (if #f # #f) #f) #f]
1259: 8 [#<procedure 8f2d870 (test then else)> #f (if #f (if #f # #f) #f) #f]
1257: 7 [#<procedure 8f2d848 (test then else)> #f (if #f (if #f # #f) #f) #f]
1186: 6 [syntax-type (if #f (if #f (if #f # ...) ...) ...) () ...]
 579: 5 [syntax-type if () ((top)) #f #f (hygiene guile-user) #t]
 293: 4 [get-global-definition-hook if (hygiene guile-user)]
In ice-9/boot-9.scm:
2697: 3 [#<procedure 8c38450 at ice-9/boot-9.scm:2696:4 (name #:optional autoload version #:key ensure)> # ...]
2512: 2 [nested-ref-module #<module () 8c25ca8> (guile-user)]
2294: 1 [module-ref-submodule #<module () 8c25ca8> guile-user]
1925: 0 [#<procedure 8c31240 at ice-9/boot-9.scm:1925:2 (module)> #]

ice-9/boot-9.scm:1925:2: In procedure #<procedure 8c31240 at ice-9/boot-9.scm:1925:2 (module)>:
ice-9/boot-9.scm:1925:2: Throw to key `vm-error' with args `(vm-run "VM: Stack overflow" ())'.


Using guile --no-auto-compile on the same file, I just get
Backtrace:
In ice-9/psyntax.scm:
1259: 19 Aborted (core dumped)


That's the stable version of GUILE included with Ubuntu 14.04.

-- 
David Kastrup

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Wed, 04 Jun 2014 14:19:01 GMT) Full text and rfc822 format available.

Message #40 received at 17147 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: 17147 <at> debbugs.gnu.org
Cc: David Kastrup <dak <at> gnu.org>
Subject: [PATCH] Add versions of and/or avoiding O(n^2) argument matching
Date: Wed,  4 Jun 2014 16:18:29 +0200

Fixes <http://bugs.gnu.org/17147>

* module/ice-9/boot-9.scm (and, or): Add syntax rules that only do one
  pattern matching operation per and/or rather than per argument.

Signed-off-by: David Kastrup <dak <at> gnu.org>
---
 module/ice-9/boot-9.scm | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/module/ice-9/boot-9.scm b/module/ice-9/boot-9.scm
index 7f38c4b..d2f34d9 100644
--- a/module/ice-9/boot-9.scm
+++ b/module/ice-9/boot-9.scm
@@ -426,6 +426,8 @@ file with the given name already exists, the effect is unspecified."
 ;; The binding for `macroexpand' has now been overridden, making psyntax the
 ;; expander now.
 
+;; quasisyntax requires simple definitions of and/or for bootstrapping.
+
 (define-syntax and
   (syntax-rules ()
     ((_) #t)
@@ -440,6 +442,30 @@ file with the given name already exists, the effect is unspecified."
 
 (include-from-path "ice-9/quasisyntax")
 
+;; Don't use syntactically recursive definitions of and/or for speed
+;; reasons as they need to match the whole argument list for each
+;; iteration.
+
+(define-syntax and
+  (lambda (expr)
+    (syntax-case expr ()
+      ((_) #'#t)
+      ((_ x y ...)
+       (let loop ((x #'x) (y #'(y ...)))
+         (if (null? y)
+             x
+             #`(if #,x #,(loop (car y) (cdr y)) #f)))))))
+
+(define-syntax or
+  (lambda (expr)
+    (syntax-case expr ()
+      ((_) #'#f)
+      ((_ x y ...)
+       (let loop ((x #'x) (y #'(y ...)))
+	 (if (null? y)
+	     x
+	     #`(let ((t #,x)) (if t t #,(loop (car y) (cdr y))))))))))
+
 (define-syntax-rule (when test stmt stmt* ...)
   (if test (begin stmt stmt* ...)))
 
-- 
1.9.1

Reply sent to Mark H Weaver <mhw <at> netris.org>:
You have taken responsibility. (Thu, 05 Jun 2014 01:10:03 GMT) Full text and rfc822 format available.

Notification sent to David Kastrup <dak <at> gnu.org>:
bug acknowledged by developer. (Thu, 05 Jun 2014 01:10:05 GMT) Full text and rfc822 format available.

Message #45 received at 17147-done <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 17147-done <at> debbugs.gnu.org
Subject: Re: bug#17147: [PATCH] Add versions of and/or avoiding O(n^2)
 argument matching
Date: Wed, 04 Jun 2014 21:09:23 -0400

David Kastrup <dak <at> gnu.org> writes:

> Fixes <http://bugs.gnu.org/17147>
>
> * module/ice-9/boot-9.scm (and, or): Add syntax rules that only do one
>   pattern matching operation per and/or rather than per argument.

Thanks, but I ended up simply changing the macros to dotted tail
patterns.  It's commit 1ea8954814d124b995f2296bc6aec92adb566bc1.

I'm closing this bug now.

      Mark

Information forwarded to bug-guile <at> gnu.org:
bug#17147; Package guile. (Thu, 05 Jun 2014 04:08:01 GMT) Full text and rfc822 format available.

Message #48 received at 17147-done <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 17147-done <at> debbugs.gnu.org
Subject: Re: bug#17147: [PATCH] Add versions of and/or avoiding O(n^2)
 argument matching
Date: Thu, 05 Jun 2014 06:06:53 +0200

Mark H Weaver <mhw <at> netris.org> writes:

> David Kastrup <dak <at> gnu.org> writes:
>
>> Fixes <http://bugs.gnu.org/17147>
>>
>> * module/ice-9/boot-9.scm (and, or): Add syntax rules that only do one
>>   pattern matching operation per and/or rather than per argument.
>
> Thanks, but I ended up simply changing the macros to dotted tail
> patterns.  It's commit 1ea8954814d124b995f2296bc6aec92adb566bc1.

Still one matching operation per argument (though an O(1) rather than an
O(n) one) rather than a single match.

Incidentally, most other syntax rules don't degrade on ... like that
because they do not do recursive matching on already matched patterns
either, case in point being the cond rule.  So indeed and/or have been
about the worst culprits for this case.

-- 
David Kastrup

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 03 Jul 2014 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 298 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #17147 Performance regression by 3000000% for evaluating "and" form

GNU bug report logs - #17147
Performance regression by 3000000% for evaluating "and" form