GNU bug report logs - #36597
27.0.50; rehash hash tables eagerly in pdumper

Previous Next

Package: emacs;

Reported by: Pip Cet <pipcet <at> gmail.com>

Date: Thu, 11 Jul 2019 14:07:02 UTC

Severity: normal

Tags: patch

Found in version 27.0.50

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36597 in the body.
You can then email your comments to 36597 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Thu, 11 Jul 2019 14:07:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to Pip Cet <pipcet <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 11 Jul 2019 14:07:04 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org, Daniel Colascione <dancol <at> dancol.org>
Subject: 27.0.50; rehash hash tables eagerly in pdumper
Date: Thu, 11 Jul 2019 14:05:14 +0000
[Message part 1 (text/plain, inline)]
This is a follow-up to bug#36447, which has been fixed.

Lazy rehashing for hash tables should be removed. This patch does that.

Lazy rehashing makes all code that accesses hash tables a little more
complicated; in at least one case, we forgot to do that, resulting in
bug#36596.

The sole benefit of lazy rehashing to be mentioned so far is that
start-up time is reduced a little (by less than a millisecond on this
system), and that for certain usage patterns (many short Emacs
sessions that don't do very much, I think), this might outweigh the
negative consequences of lazy rehashing.

Lazy rehashing means less maintainable code, more code, and, at
run-time, slightly slower code.

In particular, it means that we have to have a comment in lisp.h which
distracts from the code and has to be marked as "IMPORTANT!!!!!!!",
which I think is nearly as annoying as having to jump through special
hoops before accessing a hash table.

I'm posting this as a separate bug so we can have a, hopefully, brief
discussion about it and then either move away from lazy rehashing or
continue using it; if we decide to continue using it, it should be
documented in much more detail.

(We should keep in mind that conditional branches, even ones that are
well-predicted and don't cause cache misses, aren't free: they use
execution units, and preparing their arguments in registers increases
register pressure, and of course they increase code size.)
[0001-Rehash-hash-tables-eagerly-after-loading-a-portable-.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 14 Jul 2019 14:40:02 GMT) Full text and rfc822 format available.

Message #8 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36597 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>
Subject: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sun, 14 Jul 2019 07:39:08 -0700
[Message part 1 (text/plain, inline)]
Although I like the simplicity of eager rehashing, I'm not yet sold on the 
performance implications. On my usual (cd lisp && make compile-always) 
benchmark, the patch hurt user+system CPU time performance by 1.5%. Admittedly 
just one benchmark, but still....

Also, must we expose Vpdumper_hash_tables to Lisp? Surely it'd be better to keep 
it private to pdumper.c.

I'll CC: to Daniel to see whether he has any insights about improvements in this 
area.

PS. I ran that benchmark on my home desktop, an Intel Xeon E3-1225 v2 running 
Ubunto 18.04.2. To run it, I rebased your patch and also removed the 
no-longer-used PDUMPER_CHECK_REHASHING macro that my GCC complained about 
(wonder why that didn't happen for you?), resulting in the attached patch 
against current master 8ff09154a29a1151afb2902267ca35f89ebda73c.
[0001-Rehash-hash-tables-eagerly-after-loading-a-portable-.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 14 Jul 2019 15:03:02 GMT) Full text and rfc822 format available.

Message #11 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sun, 14 Jul 2019 15:01:47 +0000
On Sun, Jul 14, 2019 at 2:40 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Although I like the simplicity of eager rehashing, I'm not yet sold on the
> performance implications. On my usual (cd lisp && make compile-always)
> benchmark, the patch hurt user+system CPU time performance by 1.5%. Admittedly
> just one benchmark, but still....

Indeed, that's plenty of small Emacs processes not doing very much.
It's not the case we ought to be optimizing for, I think, but the
performance concerns should be taken seriously. One way to avoid the
performance problems entirely is preferred-address loading of hash
dumps, but that has security implications...

> Also, must we expose Vpdumper_hash_tables to Lisp? Surely it'd be better to keep
> it private to pdumper.c.

Oops, I agree absolutely. Will remove that.

> I'll CC: to Daniel to see whether he has any insights about improvements in this
> area.

Sure; I sent the original email to Daniel, too, of course.

> PS. I ran that benchmark on my home desktop, an Intel Xeon E3-1225 v2 running
> Ubunto 18.04.2. To run it, I rebased your patch and also removed the
> no-longer-used PDUMPER_CHECK_REHASHING macro that my GCC complained about
> (wonder why that didn't happen for you?), resulting in the attached patch
> against current master 8ff09154a29a1151afb2902267ca35f89ebda73c.

Some GCC versions complain about it, some don't, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 14 Jul 2019 15:50:01 GMT) Full text and rfc822 format available.

Message #14 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36597 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sun, 14 Jul 2019 08:49:16 -0700
Pip Cet wrote:
> Indeed, that's plenty of small Emacs processes not doing very much.
> It's not the case we ought to be optimizing for, I think, but the
> performance concerns should be taken seriously.

What's a good benchmark for what we should be optimizing for? Ideally something 
somewhat-realistic as opposed to a microbenchmark.

It doesn't appear to be as simple as plenty of processes not doing very much. 
This benchmark:

cd leim && time make -B ../lisp/leim/ja-dic/ja-dic.el

is dominated by a single CPU-intensive Emacs process and takes about 19 CPU 
seconds on my home desktop. The proposed patch slows this benchmark down by 
about 0.6%. (I ran the benchmark ten times after a warmup run, and took the 
average of the ten user+system times.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 14 Jul 2019 16:56:01 GMT) Full text and rfc822 format available.

Message #17 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 36597 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sun, 14 Jul 2019 16:54:52 +0000
On Sun, Jul 14, 2019 at 3:49 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Pip Cet wrote:
> > Indeed, that's plenty of small Emacs processes not doing very much.
> > It's not the case we ought to be optimizing for, I think, but the
> > performance concerns should be taken seriously.
>
> What's a good benchmark for what we should be optimizing for? Ideally something
> somewhat-realistic as opposed to a microbenchmark.

I'd suggest something along the lines of:
perf stat -r 10 --table -e
cycles:u,instructions:u,branches:u,branch-misses:u make -B
../lisp/leim/ja-dic/ja-dic.el

With my patch:

    61,136,837,192      cycles:u
               ( +-  0.42% )
    42,313,912,525      instructions:u            #    0.69  insn per
cycle           ( +-  0.00% )
    12,131,893,779      branches:u
               ( +-  0.00% )
        47,602,747      branch-misses:u           #    0.39% of all
branches          ( +-  1.11% )

without my patch:

    61,460,927,899      cycles:u
               ( +-  0.44% )
    42,358,289,131      instructions:u            #    0.69  insn per
cycle           ( +-  0.00% )
    12,134,582,441      branches:u
               ( +-  0.00% )
        48,540,232      branch-misses:u           #    0.40% of all
branches          ( +-  1.09% )

A 0.5% improvement.

By comparison,

perf stat -r 100 --table -e
cycles:u,instructions:u,branches:u,branch-misses:u
~/git/emacs/src/emacs -Q --batch

With my patch:

        80,749,425      cycles:u
               ( +-  0.81% )
       146,770,045      instructions:u            #    1.82  insn per
cycle           ( +-  0.00% )
        29,218,226      branches:u
               ( +-  0.00% )
           450,275      branch-misses:u           #    1.54% of all
branches          ( +-  0.11% )

without my patch:

        78,896,395      cycles:u
               ( +-  0.12% )
       147,059,777      instructions:u            #    1.86  insn per
cycle           ( +-  0.00% )
        29,287,917      branches:u
               ( +-  0.00% )
           450,194      branch-misses:u           #    1.54% of all
branches          ( +-  0.09% )

About a 2% slowdown.
perf stat -r cycles:u,instructions:u,branches:u,missed-branches:u

> is dominated by a single CPU-intensive Emacs process and takes about 19 CPU
> seconds on my home desktop. The proposed patch slows this benchmark down by
> about 0.6%. (I ran the benchmark ten times after a warmup run, and took the
> average of the ten user+system times.)

Hmm. I'd like to know the reason for that, but I suspect it may simply
be thermal throttling. That's the reason I'm running tests in
parallel, though it might be better to compare instruction counts or
scheduled µ-ops rather than cycles...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Mon, 15 Jul 2019 14:41:01 GMT) Full text and rfc822 format available.

Message #20 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 36597 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Mon, 15 Jul 2019 14:39:49 +0000
On Sun, Jul 14, 2019 at 4:54 PM Pip Cet <pipcet <at> gmail.com> wrote:
> > What's a good benchmark for what we should be optimizing for? Ideally something
> > somewhat-realistic as opposed to a microbenchmark.

Here are the things I've tried so far (building a full histogram of
actual clock cycles per run in all cases):

1. ja-dic.el: my patch is slightly faster: (on the order of 0.5%)
2. emacs -Q --batch: my patch is slightly slower (on the order of ~2%)
3. emacs -Q --eval "(run-with-timer 1 nil #'kill-emacs)": my patch is
very slightly faster (on the order of 0.1%)

Test 3 was run using a dedicated Xvnc server; all tests were run in
parallel with and without the patch.

The main advantage of my patch appears to be a reduction in pdumper
image size, which somehow leads to the performance improvement. I
haven't benchmarked a hypothetical patch which reduces the pdumper
image size but continues rehashing lazily.

But I noticed that my patch may affect hashes more than it should,
because it makes the thawed hash have the same size as the number of
hash entries in it. That seems not to hurt performance...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Thu, 18 Jul 2019 05:40:02 GMT) Full text and rfc822 format available.

Message #23 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 36597 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>,
 pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Thu, 18 Jul 2019 08:39:20 +0300
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Sun, 14 Jul 2019 07:39:08 -0700
> Cc: 36597 <at> debbugs.gnu.org
> 
> Although I like the simplicity of eager rehashing, I'm not yet sold on the 
> performance implications. On my usual (cd lisp && make compile-always) 
> benchmark, the patch hurt user+system CPU time performance by 1.5%. Admittedly 
> just one benchmark, but still....
> 
> Also, must we expose Vpdumper_hash_tables to Lisp? Surely it'd be better to keep 
> it private to pdumper.c.
> 
> I'll CC: to Daniel to see whether he has any insights about improvements in this 
> area.

You didn't CC Daniel, AFAICT, so I did that now.

> PS. I ran that benchmark on my home desktop, an Intel Xeon E3-1225 v2 running 
> Ubunto 18.04.2. To run it, I rebased your patch and also removed the 
> no-longer-used PDUMPER_CHECK_REHASHING macro that my GCC complained about 
> (wonder why that didn't happen for you?), resulting in the attached patch 
> against current master 8ff09154a29a1151afb2902267ca35f89ebda73c.
> 
> >From 96a36697b2f472f3a6b3138444659babd0d73f32 Mon Sep 17 00:00:00 2001
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sun, 14 Jul 2019 07:22:01 -0700
> Subject: [PATCH] Rehash hash tables eagerly after loading a portable dump
> 
> * src/lisp.h (hash_rehash_needed_p): Remove.  All uses removed.
> (hash_rehash_if_needed): Remove.  All uses removed.
> (struct Lisp_Hash_Table): Remove comment about rehashing hash tables.
> * src/pdumper.c (thaw_hash_tables): New function.
> (hash_table_thaw): New function.
> (hash_table_freeze): New function.
> (dump_hash_table): Simplify.
> (dump_hash_table_list): New function.
> (hash_table_contents): New function.
> (Fdump_emacs_portable): Handle hash tables by eager rehashing.
> (pdumper_load): Restore hash tables.
> (init_pdumper_once): New function.
> (PDUMPER_CHECK_REHASHING): Remove.
> ---
>  src/bytecode.c  |   1 -
>  src/composite.c |   1 -
>  src/emacs.c     |   1 +
>  src/fns.c       |  54 ++++----------
>  src/lisp.h      |  19 +----
>  src/minibuf.c   |   3 -
>  src/pdumper.c   | 188 ++++++++++++++++++++++++------------------------
>  src/pdumper.h   |   1 +
>  8 files changed, 108 insertions(+), 160 deletions(-)
> 
> diff --git a/src/bytecode.c b/src/bytecode.c
> index 29dff44f00..9c72429e0c 100644
> --- a/src/bytecode.c
> +++ b/src/bytecode.c
> @@ -1402,7 +1402,6 @@ #define DEFINE(name, value) LABEL (name) ,
>              Lisp_Object v1 = POP;
>              ptrdiff_t i;
>              struct Lisp_Hash_Table *h = XHASH_TABLE (jmp_table);
> -            hash_rehash_if_needed (h);
>  
>              /* h->count is a faster approximation for HASH_TABLE_SIZE (h)
>                 here. */
> diff --git a/src/composite.c b/src/composite.c
> index 183062de46..49a285cff0 100644
> --- a/src/composite.c
> +++ b/src/composite.c
> @@ -654,7 +654,6 @@ gstring_lookup_cache (Lisp_Object header)
>  composition_gstring_put_cache (Lisp_Object gstring, ptrdiff_t len)
>  {
>    struct Lisp_Hash_Table *h = XHASH_TABLE (gstring_hash_table);
> -  hash_rehash_if_needed (h);
>    Lisp_Object header = LGSTRING_HEADER (gstring);
>    EMACS_UINT hash = h->test.hashfn (&h->test, header);
>    if (len < 0)
> diff --git a/src/emacs.c b/src/emacs.c
> index ad661a081b..855b2c6715 100644
> --- a/src/emacs.c
> +++ b/src/emacs.c
> @@ -1560,6 +1560,7 @@ main (int argc, char **argv)
>    if (!initialized)
>      {
>        init_alloc_once ();
> +      init_pdumper_once ();
>        init_obarray_once ();
>        init_eval_once ();
>        init_charset_once ();
> diff --git a/src/fns.c b/src/fns.c
> index 0497588689..b6134a314c 100644
> --- a/src/fns.c
> +++ b/src/fns.c
> @@ -4241,43 +4241,24 @@ hash_table_rehash (struct Lisp_Hash_Table *h)
>  {
>    ptrdiff_t size = HASH_TABLE_SIZE (h);
>  
> -  /* These structures may have been purecopied and shared
> -     (bug#36447).  */
> -  h->next = Fcopy_sequence (h->next);
> -  h->index = Fcopy_sequence (h->index);
> -  h->hash = Fcopy_sequence (h->hash);
> -
>    /* Recompute the actual hash codes for each entry in the table.
>       Order is still invalid.  */
>    for (ptrdiff_t i = 0; i < size; ++i)
> -    if (!NILP (HASH_HASH (h, i)))
> -      {
> -        Lisp_Object key = HASH_KEY (h, i);
> -        EMACS_UINT hash_code = h->test.hashfn (&h->test, key);
> -        set_hash_hash_slot (h, i, make_fixnum (hash_code));
> -      }
> -
> -  /* Reset the index so that any slot we don't fill below is marked
> -     invalid.  */
> -  Ffillarray (h->index, make_fixnum (-1));
> +    {
> +      Lisp_Object key = HASH_KEY (h, i);
> +      EMACS_UINT hash_code = h->test.hashfn (&h->test, key);
> +      set_hash_hash_slot (h, i, make_fixnum (hash_code));
> +    }
>  
>    /* Rebuild the collision chains.  */
>    for (ptrdiff_t i = 0; i < size; ++i)
> -    if (!NILP (HASH_HASH (h, i)))
> -      {
> -        EMACS_UINT hash_code = XUFIXNUM (HASH_HASH (h, i));
> -        ptrdiff_t start_of_bucket = hash_code % ASIZE (h->index);
> -        set_hash_next_slot (h, i, HASH_INDEX (h, start_of_bucket));
> -        set_hash_index_slot (h, start_of_bucket, i);
> -        eassert (HASH_NEXT (h, i) != i); /* Stop loops.  */
> -      }
> -
> -  /* Finally, mark the hash table as having a valid hash order.
> -     Do this last so that if we're interrupted, we retry on next
> -     access. */
> -  eassert (h->count < 0);
> -  h->count = -h->count;
> -  eassert (!hash_rehash_needed_p (h));
> +    {
> +      EMACS_UINT hash_code = XUFIXNUM (HASH_HASH (h, i));
> +      ptrdiff_t start_of_bucket = hash_code % ASIZE (h->index);
> +      set_hash_next_slot (h, i, HASH_INDEX (h, start_of_bucket));
> +      set_hash_index_slot (h, start_of_bucket, i);
> +      eassert (HASH_NEXT (h, i) != i); /* Stop loops.  */
> +    }
>  }
>  
>  /* Lookup KEY in hash table H.  If HASH is non-null, return in *HASH
> @@ -4290,8 +4271,6 @@ hash_lookup (struct Lisp_Hash_Table *h, Lisp_Object key, EMACS_UINT *hash)
>    EMACS_UINT hash_code;
>    ptrdiff_t start_of_bucket, i;
>  
> -  hash_rehash_if_needed (h);
> -
>    hash_code = h->test.hashfn (&h->test, key);
>    eassert ((hash_code & ~INTMASK) == 0);
>    if (hash)
> @@ -4320,8 +4299,6 @@ hash_put (struct Lisp_Hash_Table *h, Lisp_Object key, Lisp_Object value,
>  {
>    ptrdiff_t start_of_bucket, i;
>  
> -  hash_rehash_if_needed (h);
> -
>    eassert ((hash & ~INTMASK) == 0);
>  
>    /* Increment count after resizing because resizing may fail.  */
> @@ -4355,8 +4332,6 @@ hash_remove_from_table (struct Lisp_Hash_Table *h, Lisp_Object key)
>    ptrdiff_t start_of_bucket = hash_code % ASIZE (h->index);
>    ptrdiff_t prev = -1;
>  
> -  hash_rehash_if_needed (h);
> -
>    for (ptrdiff_t i = HASH_INDEX (h, start_of_bucket);
>         0 <= i;
>         i = HASH_NEXT (h, i))
> @@ -4434,9 +4409,7 @@ sweep_weak_table (struct Lisp_Hash_Table *h, bool remove_entries_p)
>    for (ptrdiff_t bucket = 0; bucket < n; ++bucket)
>      {
>        /* Follow collision chain, removing entries that don't survive
> -         this garbage collection.  It's okay if hash_rehash_needed_p
> -         (h) is true, since we're operating entirely on the cached
> -         hash values. */
> +         this garbage collection.  */
>        ptrdiff_t prev = -1;
>        ptrdiff_t next;
>        for (ptrdiff_t i = HASH_INDEX (h, bucket); 0 <= i; i = next)
> @@ -4881,7 +4854,6 @@ DEFUN ("hash-table-count", Fhash_table_count, Shash_table_count, 1, 1, 0,
>    (Lisp_Object table)
>  {
>    struct Lisp_Hash_Table *h = check_hash_table (table);
> -  hash_rehash_if_needed (h);
>    return make_fixnum (h->count);
>  }
>  
> diff --git a/src/lisp.h b/src/lisp.h
> index 13014c82dc..d0e5c43c41 100644
> --- a/src/lisp.h
> +++ b/src/lisp.h
> @@ -2245,11 +2245,7 @@ #define DEFSYM(sym, name) /* empty */
>  
>  struct Lisp_Hash_Table
>  {
> -  /* Change pdumper.c if you change the fields here.
> -
> -     IMPORTANT!!!!!!!
> -
> -     Call hash_rehash_if_needed() before accessing.  */
> +  /* Change pdumper.c if you change the fields here.  */
>  
>    /* This is for Lisp; the hash table code does not refer to it.  */
>    union vectorlike_header header;
> @@ -2363,19 +2359,6 @@ HASH_TABLE_SIZE (const struct Lisp_Hash_Table *h)
>  
>  void hash_table_rehash (struct Lisp_Hash_Table *h);
>  
> -INLINE bool
> -hash_rehash_needed_p (const struct Lisp_Hash_Table *h)
> -{
> -  return h->count < 0;
> -}
> -
> -INLINE void
> -hash_rehash_if_needed (struct Lisp_Hash_Table *h)
> -{
> -  if (hash_rehash_needed_p (h))
> -    hash_table_rehash (h);
> -}
> -
>  /* Default size for hash tables if not specified.  */
>  
>  enum DEFAULT_HASH_SIZE { DEFAULT_HASH_SIZE = 65 };
> diff --git a/src/minibuf.c b/src/minibuf.c
> index d9a6e15b05..e923ce2a43 100644
> --- a/src/minibuf.c
> +++ b/src/minibuf.c
> @@ -1203,9 +1203,6 @@ DEFUN ("try-completion", Ftry_completion, Stry_completion, 2, 3, 0,
>        bucket = AREF (collection, idx);
>      }
>  
> -  if (HASH_TABLE_P (collection))
> -    hash_rehash_if_needed (XHASH_TABLE (collection));
> -
>    while (1)
>      {
>        /* Get the next element of the alist, obarray, or hash-table.  */
> diff --git a/src/pdumper.c b/src/pdumper.c
> index 03c00bf27b..d35d296d32 100644
> --- a/src/pdumper.c
> +++ b/src/pdumper.c
> @@ -107,17 +107,6 @@ #define VM_MS_WINDOWS 2
>  
>  #define DANGEROUS 0
>  
> -/* PDUMPER_CHECK_REHASHING being true causes the portable dumper to
> -   check, for each hash table it dumps, that the hash table means the
> -   same thing after rehashing.  */
> -#ifndef PDUMPER_CHECK_REHASHING
> -# if ENABLE_CHECKING
> -#  define PDUMPER_CHECK_REHASHING 1
> -# else
> -#  define PDUMPER_CHECK_REHASHING 0
> -# endif
> -#endif
> -
>  /* We require an architecture in which all pointers are the same size
>     and have the same layout, where pointers are either 32 or 64 bits
>     long, and where bytes have eight bits --- that is, a
> @@ -393,6 +382,8 @@ dump_fingerprint (char const *label,
>       The start of the cold region is always aligned on a page
>       boundary.  */
>    dump_off cold_start;
> +
> +  dump_off hash_list;
>  };
>  
>  /* Double-ended singly linked list.  */
> @@ -550,6 +541,8 @@ dump_fingerprint (char const *label,
>       heap objects.  */
>    Lisp_Object bignum_data;
>  
> +  Lisp_Object hash_tables;
> +
>    unsigned number_hot_relocations;
>    unsigned number_discardable_relocations;
>  };
> @@ -2622,68 +2615,64 @@ dump_vectorlike_generic (struct dump_context *ctx,
>    return offset;
>  }
>  
> -/* Determine whether the hash table's hash order is stable
> -   across dump and load.  If it is, we don't have to trigger
> -   a rehash on access.  */
> -static bool
> -dump_hash_table_stable_p (const struct Lisp_Hash_Table *hash)
> +/* Return a list of (KEY . VALUE) pairs in the given hash table.  */
> +static Lisp_Object
> +hash_table_contents (struct Lisp_Hash_Table *h)
>  {
> -  bool is_eql = hash->test.hashfn == hashfn_eql;
> -  bool is_equal = hash->test.hashfn == hashfn_equal;
> -  ptrdiff_t size = HASH_TABLE_SIZE (hash);
> -  for (ptrdiff_t i = 0; i < size; ++i)
> -    if (!NILP (HASH_HASH (hash, i)))
> +  Lisp_Object contents = Qnil;
> +  /* Make sure key_and_value ends up in the same order, charset.c
> +     relies on it by expecting hash table indices to stay constant
> +     across the dump.  */
> +  for (ptrdiff_t i = HASH_TABLE_SIZE (h) - 1; i >= 0; --i)
> +    if (!NILP (HASH_HASH (h, i)))
>        {
> -        Lisp_Object key =  HASH_KEY (hash, i);
> -	bool key_stable = (dump_builtin_symbol_p (key)
> -			   || FIXNUMP (key)
> -			   || (is_equal && STRINGP (key))
> -			   || ((is_equal || is_eql) && FLOATP (key)));
> -        if (!key_stable)
> -          return false;
> +	dump_push (&contents, HASH_VALUE (h, i));
> +	dump_push (&contents, HASH_KEY (h, i));
>        }
> +  return CALLN (Fapply, Qvector, contents);
> +}
>  
> -  return true;
> +static dump_off
> +dump_hash_table_list (struct dump_context *ctx)
> +{
> +  if (CONSP (ctx->hash_tables))
> +    return dump_object (ctx, CALLN (Fapply, Qvector, ctx->hash_tables));
> +  else
> +    return 0;
>  }
>  
> -/* Return a list of (KEY . VALUE) pairs in the given hash table.  */
> -static Lisp_Object
> -hash_table_contents (Lisp_Object table)
> +static void
> +hash_table_freeze (struct Lisp_Hash_Table *h)
>  {
> -  Lisp_Object contents = Qnil;
> -  struct Lisp_Hash_Table *h = XHASH_TABLE (table);
> -  for (ptrdiff_t i = 0; i < HASH_TABLE_SIZE (h); ++i)
> -    if (!NILP (HASH_HASH (h, i)))
> -      dump_push (&contents, Fcons (HASH_KEY (h, i), HASH_VALUE (h, i)));
> -  return Fnreverse (contents);
> +  h->key_and_value = hash_table_contents (h);
> +  ptrdiff_t nkeys = XFIXNAT (Flength (h->key_and_value)) / 2;
> +  h->count = nkeys;
> +  if (nkeys == 0)
> +    nkeys = 1;
> +  h->index = h->next = h->hash = make_fixnum (nkeys);
>  }
>  
> -/* Copy the given hash table, rehash it, and make sure that we can
> -   look up all the values in the original.  */
>  static void
> -check_hash_table_rehash (Lisp_Object table_orig)
> -{
> -  hash_rehash_if_needed (XHASH_TABLE (table_orig));
> -  Lisp_Object table_rehashed = Fcopy_hash_table (table_orig);
> -  eassert (XHASH_TABLE (table_rehashed)->count >= 0);
> -  XHASH_TABLE (table_rehashed)->count *= -1;
> -  eassert (XHASH_TABLE (table_rehashed)->count <= 0);
> -  hash_rehash_if_needed (XHASH_TABLE (table_rehashed));
> -  eassert (XHASH_TABLE (table_rehashed)->count >= 0);
> -  Lisp_Object expected_contents = hash_table_contents (table_orig);
> -  while (!NILP (expected_contents))
> +hash_table_thaw (struct Lisp_Hash_Table *h)
> +{
> +  Lisp_Object count = h->index;
> +  h->index = Fmake_vector (h->index, make_fixnum (-1));
> +  h->hash = Fmake_vector (h->hash, Qnil);
> +  h->next = Fmake_vector (h->next, make_fixnum (-1));
> +  Lisp_Object key_and_value = h->key_and_value;
> +  h->next_free = -1;
> +  if (XFIXNAT (count) <= 1)
>      {
> -      Lisp_Object key_value_pair = dump_pop (&expected_contents);
> -      Lisp_Object key = XCAR (key_value_pair);
> -      Lisp_Object expected_value = XCDR (key_value_pair);
> -      Lisp_Object arbitrary = Qdump_emacs_portable__sort_predicate_copied;
> -      Lisp_Object found_value = Fgethash (key, table_rehashed, arbitrary);
> -      eassert (EQ (expected_value, found_value));
> -      Fremhash (key, table_rehashed);
> +      h->key_and_value = Fmake_vector (make_fixnum (2 * XFIXNAT (count)), Qnil);
> +      ptrdiff_t i = 0;
> +      while (i < ASIZE (key_and_value))
> +	{
> +	  ASET (h->key_and_value, i, AREF (key_and_value, i));
> +	  i++;
> +	}
>      }
>  
> -  eassert (EQ (Fhash_table_count (table_rehashed),
> -               make_fixnum (0)));
> +  hash_table_rehash (h);
>  }
>  
>  static dump_off
> @@ -2695,45 +2684,11 @@ dump_hash_table (struct dump_context *ctx,
>  # error "Lisp_Hash_Table changed. See CHECK_STRUCTS comment in config.h."
>  #endif
>    const struct Lisp_Hash_Table *hash_in = XHASH_TABLE (object);
> -  bool is_stable = dump_hash_table_stable_p (hash_in);
> -  /* If the hash table is likely to be modified in memory (either
> -     because we need to rehash, and thus toggle hash->count, or
> -     because we need to assemble a list of weak tables) punt the hash
> -     table to the end of the dump, where we can lump all such hash
> -     tables together.  */
> -  if (!(is_stable || !NILP (hash_in->weak))
> -      && ctx->flags.defer_hash_tables)
> -    {
> -      if (offset != DUMP_OBJECT_ON_HASH_TABLE_QUEUE)
> -        {
> -	  eassert (offset == DUMP_OBJECT_ON_NORMAL_QUEUE
> -		   || offset == DUMP_OBJECT_NOT_SEEN);
> -          /* We still want to dump the actual keys and values now.  */
> -          dump_enqueue_object (ctx, hash_in->key_and_value, WEIGHT_NONE);
> -          /* We'll get to the rest later.  */
> -          offset = DUMP_OBJECT_ON_HASH_TABLE_QUEUE;
> -          dump_remember_object (ctx, object, offset);
> -          dump_push (&ctx->deferred_hash_tables, object);
> -        }
> -      return offset;
> -    }
> -
> -  if (PDUMPER_CHECK_REHASHING)
> -    check_hash_table_rehash (make_lisp_ptr ((void *) hash_in, Lisp_Vectorlike));
> -
>    struct Lisp_Hash_Table hash_munged = *hash_in;
>    struct Lisp_Hash_Table *hash = &hash_munged;
>  
> -  /* Remember to rehash this hash table on first access.  After a
> -     dump reload, the hash table values will have changed, so we'll
> -     need to rebuild the index.
> -
> -     TODO: for EQ and EQL hash tables, it should be possible to rehash
> -     here using the preferred load address of the dump, eliminating
> -     the need to rehash-on-access if we can load the dump where we
> -     want.  */
> -  if (hash->count > 0 && !is_stable)
> -    hash->count = -hash->count;
> +  hash_table_freeze (hash);
> +  dump_push (&ctx->hash_tables, object);
>  
>    START_DUMP_PVEC (ctx, &hash->header, struct Lisp_Hash_Table, out);
>    dump_pseudovector_lisp_fields (ctx, &out->header, &hash->header);
> @@ -4140,6 +4095,19 @@ DEFUN ("dump-emacs-portable",
>  	 || !NILP (ctx->deferred_hash_tables)
>  	 || !NILP (ctx->deferred_symbols));
>  
> +  ctx->header.hash_list = ctx->offset;
> +  dump_hash_table_list (ctx);
> +
> +  do
> +    {
> +      dump_drain_deferred_hash_tables (ctx);
> +      dump_drain_deferred_symbols (ctx);
> +      dump_drain_normal_queue (ctx);
> +    }
> +  while (!dump_queue_empty_p (&ctx->dump_queue)
> +	 || !NILP (ctx->deferred_hash_tables)
> +	 || !NILP (ctx->deferred_symbols));
> +
>    dump_sort_copied_objects (ctx);
>  
>    /* While we copy built-in symbols into the Emacs image, these
> @@ -5431,6 +5399,13 @@ pdumper_load (const char *dump_filename)
>    for (int i = 0; i < ARRAYELTS (sections); ++i)
>      dump_mmap_reset (&sections[i]);
>  
> +  if (header->hash_list)
> +    {
> +      struct Lisp_Vector *hash_tables =
> +	((struct Lisp_Vector *)(dump_base + header->hash_list));
> +      XSETVECTOR (Vpdumper_hash_tables, hash_tables);
> +    }
> +
>    /* Run the functions Emacs registered for doing post-dump-load
>       initialization.  */
>    for (int i = 0; i < nr_dump_hooks; ++i)
> @@ -5502,10 +5477,31 @@ DEFUN ("pdumper-stats", Fpdumper_stats, Spdumper_stats, 0, 0, 0,
>  
>  
>  
> +static void thaw_hash_tables (void)
> +{
> +  Lisp_Object hash_tables = Vpdumper_hash_tables;
> +  ptrdiff_t i = 0;
> +  while (i < ASIZE (hash_tables))
> +    {
> +      hash_table_thaw (XHASH_TABLE (AREF (hash_tables, i)));
> +      i++;
> +    }
> +  Vpdumper_hash_tables = zero_vector;
> +}
> +
> +void
> +init_pdumper_once (void)
> +{
> +  Vpdumper_hash_tables = zero_vector;
> +  pdumper_do_now_and_after_load (thaw_hash_tables);
> +}
> +
>  void
>  syms_of_pdumper (void)
>  {
>  #ifdef HAVE_PDUMPER
> +  DEFVAR_LISP ("pdumper-hash-tables", Vpdumper_hash_tables,
> +	       doc: /* A list of hash tables that need to be thawed after loading the pdump.  */);
>    defsubr (&Sdump_emacs_portable);
>    defsubr (&Sdump_emacs_portable__sort_predicate);
>    defsubr (&Sdump_emacs_portable__sort_predicate_copied);
> diff --git a/src/pdumper.h b/src/pdumper.h
> index ab2f426c1e..cfea06d33d 100644
> --- a/src/pdumper.h
> +++ b/src/pdumper.h
> @@ -248,6 +248,7 @@ pdumper_clear_marks (void)
>     file was loaded.  */
>  extern void pdumper_record_wd (const char *);
>  
> +void init_pdumper_once (void);
>  void syms_of_pdumper (void);
>  
>  INLINE_HEADER_END
> -- 
> 2.17.1
> 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Fri, 19 Jul 2019 07:25:01 GMT) Full text and rfc822 format available.

Message #26 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 36597 <at> debbugs.gnu.org, Daniel Colascione <dancol <at> dancol.org>
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Fri, 19 Jul 2019 07:23:50 +0000
[Message part 1 (text/plain, inline)]
On Mon, Jul 15, 2019 at 2:39 PM Pip Cet <pipcet <at> gmail.com> wrote:
> But I noticed that my patch may affect hashes more than it should,
> because it makes the thawed hash have the same size as the number of
> hash entries in it. That seems not to hurt performance...

But it should be fixed, anyway. The attached patch has no unexpected
performance benefits, as far as I can tell.
[0001-Rehash-hash-tables-eagerly-after-loading-a-dump.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Fri, 19 Jul 2019 07:47:02 GMT) Full text and rfc822 format available.

Message #29 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: eggert <at> cs.ucla.edu, 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Fri, 19 Jul 2019 10:46:40 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 19 Jul 2019 07:23:50 +0000
> Cc: 36597 <at> debbugs.gnu.org
> 
> But it should be fixed, anyway. The attached patch has no unexpected
> performance benefits, as far as I can tell.

Thanks.

> +static void thaw_hash_tables (void)
> +{

This is not our style of defining a function.  The name should begin
on a new line.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sat, 20 Jul 2019 12:40:02 GMT) Full text and rfc822 format available.

Message #32 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sat, 20 Jul 2019 12:38:20 +0000
[Message part 1 (text/plain, inline)]
On Fri, Jul 19, 2019 at 7:46 AM Eli Zaretskii <eliz <at> gnu.org> wrote:

> > +static void thaw_hash_tables (void)
> > +{
>
> This is not our style of defining a function.  The name should begin
> on a new line.

Thanks for pointing that out, revised patch attached.

I'm currently playing around with redefining hash tables not to have
internal freelists. That makes the hash table code a lot simpler
overall, but some of that simplicity would be lost trying to support
lazy hash table rehashing.
[0001-Rehash-hash-tables-eagerly-after-loading-a-dump.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 21 Jul 2019 03:19:01 GMT) Full text and rfc822 format available.

Message #35 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Cc: 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sat, 20 Jul 2019 20:18:04 -0700
[Message part 1 (text/plain, inline)]
Pip Cet wrote:
> I'm currently playing around with redefining hash tables not to have
> internal freelists. That makes the hash table code a lot simpler
> overall, but some of that simplicity would be lost trying to support
> lazy hash table rehashing.

While looking into this I discovered unlikely bugs in Emacs's hash table code 
and GC that can make Emacs dump core, along with some other unlikely hash-table 
bugs that can cause Emacs to report memory exhaustion when there should be 
plenty of memory. I installed the attached patches to fix these problems and to 
refactor to make this code easier to understand (at least for me :-). These 
patches will probably affect performance analysis.
[0001-Fix-hash-table-overallocation-etc.patch (text/x-patch, attachment)]
[0002-Rename-pure-to-purecopy.patch (text/x-patch, attachment)]
[0003-Simplify-maybe_gc-implementation.patch (text/x-patch, attachment)]
[0004-Inhibit-GC-after-inhibit_garbage_collection.patch (text/x-patch, attachment)]
[0005-Simplify-hashfn-cmpfn-calling-convention.patch (text/x-patch, attachment)]
[0006-Fix-crash-if-user-test-munges-hash-table.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 21 Jul 2019 05:36:02 GMT) Full text and rfc822 format available.

Message #38 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sun, 21 Jul 2019 05:34:50 +0000
On Sun, Jul 21, 2019 at 3:18 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Pip Cet wrote:
> > I'm currently playing around with redefining hash tables not to have
> > internal freelists. That makes the hash table code a lot simpler
> > overall, but some of that simplicity would be lost trying to support
> > lazy hash table rehashing.
>
> While looking into this I discovered unlikely bugs in Emacs's hash table code
> and GC that can make Emacs dump core, along with some other unlikely hash-table
> bugs that can cause Emacs to report memory exhaustion when there should be
> plenty of memory. I installed the attached patches to fix these problems and to
> refactor to make this code easier to understand (at least for me :-). These
> patches will probably affect performance analysis.

Well, at least they'll require rebasing, particularly of the
no-internal-freelists patch :-)

While your changes are extensive, I don't see anything in there that
would drastically affect performance or memory footprint. Maybe I'm
missing something, though.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 21 Jul 2019 06:33:02 GMT) Full text and rfc822 format available.

Message #41 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sat, 20 Jul 2019 23:32:03 -0700
Pip Cet wrote:
> I don't see anything in there that
> would drastically affect performance or memory footprint.

The drastic change to the memory footprint occurs because the old code 
incorrectly computed the length of the hash table's vectors, and over-allocated 
them in some cases. The over-allocation factor could get worse with each hash 
table resize, following a Fibonacci-like sequence. I think this was a bug I 
introduced in 2011-07-21T17:41:20!eggert <at> cs.ucla.edu.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 21 Jul 2019 06:34:01 GMT) Full text and rfc822 format available.

Message #44 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sun, 21 Jul 2019 06:32:26 +0000
[Message part 1 (text/plain, inline)]
On Sun, Jul 21, 2019 at 5:34 AM Pip Cet <pipcet <at> gmail.com> wrote:
> On Sun, Jul 21, 2019 at 3:18 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> > Pip Cet wrote:
> > > I'm currently playing around with redefining hash tables not to have
> > > internal freelists. That makes the hash table code a lot simpler
> > > overall, but some of that simplicity would be lost trying to support
> > > lazy hash table rehashing.
> >
> > While looking into this I discovered unlikely bugs in Emacs's hash table code
> > and GC that can make Emacs dump core, along with some other unlikely hash-table
> > bugs that can cause Emacs to report memory exhaustion when there should be
> > plenty of memory. I installed the attached patches to fix these problems and to
> > refactor to make this code easier to understand (at least for me :-). These
> > patches will probably affect performance analysis.
>
> Well, at least they'll require rebasing, particularly of the
> no-internal-freelists patch :-)

Rebased patches attached. The performance measurements don't seem to
change significantly.

> While your changes are extensive, I don't see anything in there that
> would drastically affect performance or memory footprint. Maybe I'm
> missing something, though.
[0001-Rehash-hash-tables-eagerly-after-loading-a-dump.patch (text/x-patch, attachment)]
[0001-snapshot.patch.gz (application/gzip, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Sun, 09 Aug 2020 19:29:02 GMT) Full text and rfc822 format available.

Message #47 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Sun, 09 Aug 2020 21:27:58 +0200
Pip Cet <pipcet <at> gmail.com> writes:

>> Well, at least they'll require rebasing, particularly of the
>> no-internal-freelists patch :-)
>
> Rebased patches attached. The performance measurements don't seem to
> change significantly.

And this was the final message in this thread.  I think the general
consensus was that Pip's patches were a good idea...  unless they had
any negative performance impact?

So I tried applying the patch now to Emacs 28 to do some benchmarking,
but it didn't apply cleanly, so I gave up.

Is this still something worth pursuing?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Mon, 10 Aug 2020 11:53:01 GMT) Full text and rfc822 format available.

Message #50 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Mon, 10 Aug 2020 11:51:42 +0000
[Message part 1 (text/plain, inline)]
On Sun, Aug 9, 2020 at 7:28 PM Lars Ingebrigtsen <larsi <at> gnus.org> wrote:
> >> Well, at least they'll require rebasing, particularly of the
> >> no-internal-freelists patch :-)
> >
> > Rebased patches attached. The performance measurements don't seem to
> > change significantly.
>
> And this was the final message in this thread.  I think the general
> consensus was that Pip's patches were a good idea...  unless they had
> any negative performance impact?

I'm not aware of any remaining negative performance impact, but I do
seem to recall Daniel was somewhat opposed to the patch.

> So I tried applying the patch now to Emacs 28 to do some benchmarking,
> but it didn't apply cleanly, so I gave up.

I'll try to rebase them again. Does the attached work for you?

> Is this still something worth pursuing?

I think it is, at least in the case of the "rehash eagerly" patch.

As for the more general rewrite of hash tables, it might be a good
idea to summarize and discuss this idea some more. Here are some
initial notes, but I'll make a proper proposal once things are ready:

- there's a new lisp.h type for hash cells, which is four words long
and holds a hash, key, value, and a next link (either another hash
cell, or the hash table this cell belongs to, or nil if it no longer
belongs to any hash table)
- hash tables are essentially vectors of such hash cells
- hash cells are invisible to Lisp code, but C code can hold on to
hash cell Lisp Objects, guaranteeing they won't be collected (they can
still be removed from the hash table)
- in my current implementation, hash cells aren't pseudovectors,
they're cons-like objects with four cells rather than two.

The implications are:

- more work for the garbage collector
- a somewhat ambitious rewrite of the profiler to keep hash cells pre-allocated
- shrinkable hash tables
- genuinely unordered hash tables, since there's no more HASH_INDEX
- rehash thresholds would be interpreted differently

There's also a slight reduction in memory usage for hash tables.
[0001-Rehash-hash-tables-eagerly-after-loading-a-dump.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Mon, 10 Aug 2020 13:05:02 GMT) Full text and rfc822 format available.

Message #53 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 36597 <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Mon, 10 Aug 2020 15:04:18 +0200
Pip Cet <pipcet <at> gmail.com> writes:

>> So I tried applying the patch now to Emacs 28 to do some benchmarking,
>> but it didn't apply cleanly, so I gave up.
>
> I'll try to rebase them again. Does the attached work for you?

Yup.  I've now done some benchmarking with

time make -j32 compile-always

Without patch:

real	0m38.855s
real	0m40.295s
real	0m39.299s
real	0m39.864s
real	0m40.428s
real	0m40.012s
real	0m38.988s
real	0m39.807s
real	0m40.455s
real	0m37.341s
real	0m33.349s
real	0m34.379s
real	0m34.339s
real	0m33.139s
real	0m32.902s
real	0m33.755s
real	0m34.143s
real	0m34.598s
real	0m34.484s
real	0m34.342s


With patch:

real	0m36.064s
real	0m36.617s
real	0m34.502s
real	0m36.817s
real	0m31.782s
real	0m32.859s
real	0m29.779s
real	0m29.703s
real	0m30.313s
real	0m29.496s
real	0m29.585s
real	0m29.807s
real	0m30.235s
real	0m30.142s
real	0m29.960s
real	0m30.067s
real	0m30.114s
real	0m29.975s
real	0m30.388s
real	0m30.112s

Er...  It's weird that there's so much difference in time between
runs -- this is running on a machine that does nothing else and has a
load of 0.0 if I'm not compiling Emacs.

So I don't know what can be concluded here...  if we just take the mean
from these numbers, it seems that your patch is making compilation
faster.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 11 Aug 2020 09:34:02 GMT) Full text and rfc822 format available.

Notification sent to Pip Cet <pipcet <at> gmail.com>:
bug acknowledged by developer. (Tue, 11 Aug 2020 09:34:02 GMT) Full text and rfc822 format available.

Message #58 received at 36597-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Lars Ingebrigtsen <larsi <at> gnus.org>, Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 36597-done <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 02:33:34 -0700
[Message part 1 (text/plain, inline)]
On 8/10/20 6:04 AM, Lars Ingebrigtsen wrote:

> time make -j32 compile-always
> ...
> Er...  It's weird that there's so much difference in time between
> runs

I  think I get less variance if I do a sequential 'make' (i.e., without -j). Of 
course this takes longer.

> if we just take the mean
> from these numbers, it seems that your patch is making compilation
> faster.  :-)

It also simplifies the code a bit, so I took the liberty of installing it, after 
updating its commit message a bit and changing it to keep a comment that was 
updated recently (I figure this was a merge error). The patch I installed is the 
first patch attached. While reviewing the patch I noticed some relatively minor 
things in the neighborhood that could easily be fixed, so I did so by installing 
some followup patches, also attached (the last patch is the only one that's at 
all nontrivial). I'll mark the bug as done. Thanks to both of you for following 
up on this.
[0001-Rehash-hash-tables-eagerly-after-loading-a-dump.patch (text/x-patch, attachment)]
[0002-src-fns.c-hash_table_rehash-Help-the-compiler-a-bit.patch (text/x-patch, attachment)]
[0003-src-pdumper.c-pdumper_load-XSETVECTOR-make_lisp_ptr.patch (text/x-patch, attachment)]
[0004-Don-t-needlessly-convert-to-unsigned-in-pdumper.patch (text/x-patch, attachment)]
[0005-In-pdumper-simplify-INT_MAX-computation.patch (text/x-patch, attachment)]
[0006-pdumper-speed-tweeks-for-hash-tables.patch (text/x-patch, attachment)]
[0007-pdumper-avoid-listing-hash-table-contents.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 09:42:01 GMT) Full text and rfc822 format available.

Message #61 received at 36597-done <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, Eli Zaretskii <eliz <at> gnu.org>,
 36597-done <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 09:40:27 +0000
On Tue, Aug 11, 2020 at 9:33 AM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> On 8/10/20 6:04 AM, Lars Ingebrigtsen wrote:
> > time make -j32 compile-always
> > ...
> > Er...  It's weird that there's so much difference in time between
> > runs
>
> I  think I get less variance if I do a sequential 'make' (i.e., without -j). Of
> course this takes longer.

It's worse than just execution time variance: at least in the past,
byte code used to differ between runs if -jN was used. I'm not sure
whether that's still true.

> It also simplifies the code a bit, so I took the liberty of installing it, after
> updating its commit message a bit and changing it to keep a comment that was
> updated recently (I figure this was a merge error).

It was, thank you!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 11:51:01 GMT) Full text and rfc822 format available.

Message #64 received at 36597-done <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 36597-done <at> debbugs.gnu.org,
 Pip Cet <pipcet <at> gmail.com>
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 13:50:29 +0200
Paul Eggert <eggert <at> cs.ucla.edu> writes:

> It also simplifies the code a bit, so I took the liberty of installing
> it, after updating its commit message a bit and changing it to keep a
> comment that was updated recently (I figure this was a merge
> error).

Just for fun, I re-ran the "make compile-always" test 20 times
before/after pulling the new version, and the mean compilation speed is
slightly smaller after.  But as before, the speeds vary ~10% between
runs.  But at least there's no obvious negative performance impact here.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 14:54:02 GMT) Full text and rfc822 format available.

Message #67 received at 36597-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: larsi <at> gnus.org, 36597-done <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 17:52:44 +0300
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 36597-done <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 11 Aug 2020 02:33:34 -0700
> 
> It also simplifies the code a bit, so I took the liberty of installing it

It doesn't compile here:

  pdumper.c: In function 'dump_queue_enqueue':
  pdumper.c:1012:19: warning: unknown conversion type character 'l' in format [-Wformat=]
   1012 |       dump_trace ("new object %0*"pI"x weight=%d\n", EMACS_INT_XDIGITS, uobj,
	|                   ^~~~~~~~~~~~~~~~
  In file included from character.h:27,
		   from buffer.h:27,
		   from pdumper.c:34:
  lisp.h:108:17: note: format string is defined here
    108 | #   define pI "ll"
	|                 ^
  pdumper.c:1012:19: warning: format '%d' expects argument of type 'int', but argument 3 has type 'EMACS_UINT' {aka 'long long unsigned int'} [-Wformat=]
   1012 |       dump_trace ("new object %0*"pI"x weight=%d\n", EMACS_INT_XDIGITS, uobj,
	|                   ^~~~~~~~~~~~~~~~                                      ~~~~
	|                                                                         |
	|
  EMACS_UINT {aka long long unsigned int}
  pdumper.c:1012:48: note: format string is defined here
   1012 |       dump_trace ("new object %0*"pI"x weight=%d\n", EMACS_INT_XDIGITS, uobj,
	|                                               ~^
	|                                                |
	|                                                int
	|                                               %I64d
  pdumper.c:1012:19: warning: too many arguments for format [-Wformat-extra-args]
   1012 |       dump_trace ("new object %0*"pI"x weight=%d\n", EMACS_INT_XDIGITS, uobj,
	|                   ^~~~~~~~~~~~~~~~
  pdumper.c: In function 'dump_queue_dequeue':
  pdumper.c:1229:6: warning: format '%d' expects argument of type 'int', but argument 3 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]
   1229 |     (("dump_queue_dequeue basis=%"PRIdDUMP_OFF" fancy=%"PRIdPTR
	|      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1230 |       " zero=%"PRIdPTR" normal=%"PRIdPTR" strong=%"PRIdPTR" hash=%td\n"),
	|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   1231 |      basis,
   1232 |      dump_tailq_length (&dump_queue->fancy_weight_objects),
	|      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	|      |
	|      gl_intptr_t {aka long int}
  pdumper.c:1229:6: warning: format '%d' expects argument of type 'int', but argument 4 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]
  pdumper.c:1229:6: warning: format '%d' expects argument of type 'int', but argument 5 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]
  pdumper.c:1229:6: warning: format '%d' expects argument of type 'int', but argument 6 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]
  pdumper.c:1229:6: warning: unknown conversion type character 't' in format [-Wformat=]
  pdumper.c:1229:6: warning: too many arguments for format [-Wformat-extra-args]
  pdumper.c:1310:15: warning: unknown conversion type character 'l' in format [-Wformat=]
   1310 |   dump_trace ("  result score=%f src=%s object=%0*"pI"x\n",
	|               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from character.h:27,
		   from buffer.h:27,
		   from pdumper.c:34:
  lisp.h:108:17: note: format string is defined here
    108 | #   define pI "ll"
	|                 ^
  pdumper.c:1310:15: warning: too many arguments for format [-Wformat-extra-args]
   1310 |   dump_trace ("  result score=%f src=%s object=%0*"pI"x\n",
	|               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  pdumper.c: In function 'hash_table_thaw':
  pdumper.c:2667:30: error: conversion from 'EMACS_INT' {aka 'long long int'} to 'ptrdiff_t' {aka 'int'} may change value [-Werror=conversion]
   2667 |   h->hash = make_nil_vector (XFIXNUM (h->hash));
	|                              ^~~~~~~~~~~~~~~~~
  cc1.exe: some warnings being treated as errors
  Makefile:401: recipe for target `pdumper.o' failed
  make[1]: *** [pdumper.o] Error 1




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 15:31:01 GMT) Full text and rfc822 format available.

Message #70 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 08:30:23 -0700
On 8/11/20 7:52 AM, Eli Zaretskii wrote:

> It doesn't compile here:
> 
>   pdumper.c: In function 'dump_queue_enqueue':
>   pdumper.c:1012:19: warning: unknown conversion type character 'l' in format [-Wformat=]
>    1012 |       dump_trace ("new object %0*"pI"x weight=%d\n", EMACS_INT_XDIGITS, uobj,
> 	|                   ^~~~~~~~~~~~~~~~
>   In file included from character.h:27,
> 		   from buffer.h:27,
> 		   from pdumper.c:34:
>   lisp.h:108:17: note: format string is defined here
>     108 | #   define pI "ll"

<https://stackoverflow.com/questions/23718110/error-unknown-conversion-type-character-l-in-format-scanning-long-long> 
suggests that this is a problem on MinGW, but pI is supposed to be "I64" on that 
platform, not "ll".

What warnings does your compiler generate for the following?

#include <stdio.h>
int a;
long long b;
int main (void) {
  printf ("x=%0*llx\n", a, b);
  printf ("x=%0*I64x\n", a, b);
  return 0;
}

and what are __MINGW32__, __USE_MINGW_ANSI_STDIO, MINGW_W64, 
__MINGW32_MAJOR_VERSION, __GNUC__, and __GNUC_MINOR__ on your platform?

On my Fedora 31 platform, the above program causes 'gcc -Wall' to say:

t.c: In function ‘main’:
t.c:6:17: warning: unknown conversion type character ‘I’ in format [-Wformat=]
    6 |   printf ("x=%0*I64x\n", a, b);
      |                 ^
t.c:6:11: warning: too many arguments for format [-Wformat-extra-args]
    6 |   printf ("x=%0*I64x\n", a, b);
      |           ^~~~~~~~~~~~~

which is what I'd expect on Fedora.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 16:00:02 GMT) Full text and rfc822 format available.

Message #73 received at 36597-done <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, Paul Eggert <eggert <at> cs.ucla.edu>,
 36597-done <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 15:59:12 +0000
[Message part 1 (text/plain, inline)]
On Tue, Aug 11, 2020 at 2:52 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
>   pdumper.c: In function 'hash_table_thaw':
>   pdumper.c:2667:30: error: conversion from 'EMACS_INT' {aka 'long long int'} to 'ptrdiff_t' {aka 'int'} may change value [-Werror=conversion]
>    2667 |   h->hash = make_nil_vector (XFIXNUM (h->hash));
>         |                              ^~~~~~~~~~~~~~~~~
>   cc1.exe: some warnings being treated as errors

I suggest going back to Fmake_vector (h->hash, Qnil), as in the
attached patch. It's shorter, and it actually compiles.
[0001-Fix-wide-int-compilation-issue-in-pdumper.c.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 17:01:02 GMT) Full text and rfc822 format available.

Message #76 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 20:00:10 +0300
> Cc: larsi <at> gnus.org, pipcet <at> gmail.com, 36597 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 11 Aug 2020 08:30:23 -0700
> 
> >   lisp.h:108:17: note: format string is defined here
> >     108 | #   define pI "ll"
> 
> <https://stackoverflow.com/questions/23718110/error-unknown-conversion-type-character-l-in-format-scanning-long-long> 
> suggests that this is a problem on MinGW, but pI is supposed to be "I64" on that 
> platform, not "ll".

No, it's supposed to be "ll".  The problem is not in lisp.h, it's in
pdumper.c: its declaration of attributes of dump_trace was incorrect
for MinGW.  I fixed that.

The warnings about %d vs gl_intptr_t should be fixed in Gnulib, I
think: why does it use 'long int' instead of 'int' on 32-bit
platforms?  Or maybe the format in pdumper.c should use %ld instead, I
don't know.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 17:02:01 GMT) Full text and rfc822 format available.

Message #79 received at 36597-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: larsi <at> gnus.org, eggert <at> cs.ucla.edu, 36597-done <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 20:00:38 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Tue, 11 Aug 2020 15:59:12 +0000
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>, larsi <at> gnus.org, 36597-done <at> debbugs.gnu.org
> 
> I suggest going back to Fmake_vector (h->hash, Qnil), as in the
> attached patch. It's shorter, and it actually compiles.

Yes, I did that, thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 17:32:01 GMT) Full text and rfc822 format available.

Message #82 received at 36597-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>, Pip Cet <pipcet <at> gmail.com>
Cc: larsi <at> gnus.org, 36597-done <at> debbugs.gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 10:31:24 -0700
[Message part 1 (text/plain, inline)]
On 8/11/20 10:00 AM, Eli Zaretskii wrote:
>> I suggest going back to Fmake_vector (h->hash, Qnil), as in the
>> attached patch. It's shorter, and it actually compiles.
> Yes, I did that, thanks.

The compilation issue was due to pdumper enabling -Wconversion, which causes 
more trouble than it cures (but that is a different topic). I worked around the 
glitch by installing the attached further patch, which should also help explain 
the motivation for make_nil_vector.
[0001-Prefer-make_nil_vector-to-make-vector-with-nil.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 18:12:01 GMT) Full text and rfc822 format available.

Message #85 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 11:11:20 -0700
[Message part 1 (text/plain, inline)]
On 8/11/20 10:00 AM, Eli Zaretskii wrote:
> The warnings about %d vs gl_intptr_t should be fixed in Gnulib, I
> think: why does it use 'long int' instead of 'int' on 32-bit
> platforms?  Or maybe the format in pdumper.c should use %ld instead, I
> don't know.

Ah, it's because Emacs uses C99 inttypes.h macros like PRIdPTR without also 
using the Gnulib inttypes module which implements these macros on platforms like 
MinGW where the macros don't work. This problem occurs elsewhere in Emacs in a 
couple of places, we just never noticed it. I installed the attached patch, 
which I hope fixes the glitch.
[0001-Use-Gnulib-inttypes-module.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 18:28:01 GMT) Full text and rfc822 format available.

Message #88 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 19:27:00 +0100
On Tue 11 Aug 2020, Paul Eggert wrote:

> On 8/11/20 10:00 AM, Eli Zaretskii wrote:
>>> I suggest going back to Fmake_vector (h->hash, Qnil), as in the
>>> attached patch. It's shorter, and it actually compiles.
>> Yes, I did that, thanks.
>
> The compilation issue was due to pdumper enabling -Wconversion, which causes
> more trouble than it cures (but that is a different topic). I worked around
> the glitch by installing the attached further patch, which should also help
> explain the motivation for make_nil_vector.

Eli's fixes have addressed issues with the MinGW toolchains, but Mingw64
64bit builds are still broken:

C:/emacs/git/emacs/master/src/pdumper.c: In function 'dump_read_all':
C:/emacs/git/emacs/master/src/pdumper.c:5078:60: error: conversion from 'size_t' {aka 'long long unsigned int'} to 'unsigned int' may change value [-Werror=conversion]
 5078 |       ssize_t chunk = read (fd, (char *) buf + bytes_read, chunk_to_read);
      |                                                            ^~~~~~~~~~~~~





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 18:33:01 GMT) Full text and rfc822 format available.

Message #91 received at 36597-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: larsi <at> gnus.org, 36597-done <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 21:32:16 +0300
> Cc: larsi <at> gnus.org, 36597-done <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 11 Aug 2020 10:31:24 -0700
> 
> The compilation issue was due to pdumper enabling -Wconversion, which causes 
> more trouble than it cures (but that is a different topic). I worked around the 
> glitch by installing the attached further patch, which should also help explain 
> the motivation for make_nil_vector.

Is it really worth it?  The code is now a kind of puzzle (what doe
those ALLOW/DISALLOW_IMPLICIT_CONVERSION macros do?), and Fmake_vector
is a very thin wrapper around the C function it calls.  So I think
this change is for the worse, sorry.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 18:37:02 GMT) Full text and rfc822 format available.

Message #94 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 21:35:52 +0300
> Cc: larsi <at> gnus.org, pipcet <at> gmail.com, 36597 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 11 Aug 2020 11:11:20 -0700
> 
> Ah, it's because Emacs uses C99 inttypes.h macros like PRIdPTR without also 
> using the Gnulib inttypes module which implements these macros on platforms like 
> MinGW where the macros don't work. This problem occurs elsewhere in Emacs in a 
> couple of places, we just never noticed it. I installed the attached patch, 
> which I hope fixes the glitch.

It doesn't, because we avoid the Gnulib inttypes module on MinGW.

I don't understand why it's needed; there's nothing wrong with MinGW's
inttypes.h header.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 18:56:02 GMT) Full text and rfc822 format available.

Message #97 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: eggert <at> cs.ucla.edu
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 21:55:03 +0300
> Date: Tue, 11 Aug 2020 21:35:52 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
> 
> I don't understand why it's needed; there's nothing wrong with MinGW's
> inttypes.h header.

And, btw, Gnulib's inttypes.h does this:

  #if !defined PRIdPTR
  # ifdef INTPTR_MAX
  #  define PRIdPTR @PRIPTR_PREFIX@ "d"
  # endif
  #endif

But since MinGW's inttypes.h does provide PRIdPTR, this will do
nothing, so it cannot possibly help here.  Am I missing something?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Tue, 11 Aug 2020 23:44:02 GMT) Full text and rfc822 format available.

Message #100 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Tue, 11 Aug 2020 16:43:16 -0700
[Message part 1 (text/plain, inline)]
On 8/11/20 11:35 AM, Eli Zaretskii wrote:
> It doesn't, because we avoid the Gnulib inttypes module on MinGW.

In that case perhaps I should revert the change that added the Gnulib inttypes 
module, as MS-Windows is the only currently-active platform with PRIdPTR etc. 
problems that I've heard of.

> I don't understand why it's needed; there's nothing wrong with MinGW's
> inttypes.h header.

I don't know what the problems with MS-Windows are or were. Perhaps they're 
fixed on all development environments we know about. That would suggest 
reverting the inttypes change too.

Does the attached simplification pacify GCC on MinGW? If so, that could be 
combined with reverting the inttypes change.

Does the following standalone program compile OK with 'gcc -Wall' on MinGW? If 
so, why does the same thing not work when compiling Emacs? The error message you 
quoted in Bug#36597#67 suggests that PRIdPTR is "d" whereas intptr_t is 'long' 
which means the following program should run afoul of MinGW.

#include <inttypes.h>
#include <stdio.h>
char buf[1000];
intptr_t ip;
int main (void) {
  return sprintf (buf, "%"PRIdPTR, ip);
}
[pdumper.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Wed, 12 Aug 2020 14:11:01 GMT) Full text and rfc822 format available.

Message #103 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Wed, 12 Aug 2020 17:10:34 +0300
> Cc: larsi <at> gnus.org, pipcet <at> gmail.com, 36597 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Tue, 11 Aug 2020 16:43:16 -0700
> 
> [1:text/plain Hide]
> 
> On 8/11/20 11:35 AM, Eli Zaretskii wrote:
> > It doesn't, because we avoid the Gnulib inttypes module on MinGW.
> 
> In that case perhaps I should revert the change that added the Gnulib inttypes 
> module, as MS-Windows is the only currently-active platform with PRIdPTR etc. 
> problems that I've heard of.

If that module is in our repository only because of MS-Windows, then
it indeed isn't needed.

> > I don't understand why it's needed; there's nothing wrong with MinGW's
> > inttypes.h header.
> 
> I don't know what the problems with MS-Windows are or were. Perhaps they're 
> fixed on all development environments we know about. That would suggest 
> reverting the inttypes change too.

I see no problems in MinGW headers with PRIdPTR nor with intptr_t.

> Does the attached simplification pacify GCC on MinGW? If so, that could be 
> combined with reverting the inttypes change.

The warnings disappeared because you installed a change that  no
longer uses the GCC warning options which triggered them.  So I'm
unsure how you'd like me to test the patch, please elaborate.

> Does the following standalone program compile OK with 'gcc -Wall' on MinGW? If 
> so, why does the same thing not work when compiling Emacs? The error message you 
> quoted in Bug#36597#67 suggests that PRIdPTR is "d" whereas intptr_t is 'long' 
> which means the following program should run afoul of MinGW.

The problem is not with MinGW's definition of intptr_t: it is
typedef'ed as 'int' in 32-bit builds in the MinGW headers.  The
problem is not with intptr_t, it's with its Gnulib equivalent:

  pdumper.c:1229:6: warning: format '%d' expects argument of type 'int', but argument 4 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]

It complains about gl_intptr_t, not intptr_t.  That's because Gnulib's
stdint.h does this:

  #  ifdef _WIN64
  typedef long long int gl_intptr_t;
  typedef unsigned long long int gl_uintptr_t;
  #  else
  typedef long int gl_intptr_t;
  typedef unsigned long int gl_uintptr_t;
  #  endif
  #  define intptr_t gl_intptr_t
  #  define uintptr_t gl_uintptr_t

I don't understand why it uses 'long int' 32-bit platforms, it looks
gratuitous, especially since MinGW itself uses just 'int'.  (Another
question is why Gnulib thinks it needs to redefine intptr_t, but if
the redefinition was correct, this would not be especially important.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Wed, 12 Aug 2020 14:47:02 GMT) Full text and rfc822 format available.

Message #106 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: eggert <at> cs.ucla.edu
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Wed, 12 Aug 2020 17:46:17 +0300
> Date: Wed, 12 Aug 2020 17:10:34 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
> 
> > Does the attached simplification pacify GCC on MinGW? If so, that could be 
> > combined with reverting the inttypes change.
> 
> The warnings disappeared because you installed a change that  no
> longer uses the GCC warning options which triggered them.  So I'm
> unsure how you'd like me to test the patch, please elaborate.

Sorry, I wasn't paying attention: the warnings did not disappear.

I've now tried the proposed changes, but they don't affect the code
which is cited in the warning messages.  The offending code is here:

  dump_trace
    (("dump_queue_dequeue basis=%"PRIdDUMP_OFF" fancy=%"PRIdPTR
      " zero=%"PRIdPTR" normal=%"PRIdPTR" strong=%"PRIdPTR" hash=%td\n"),
     basis,
     dump_tailq_length (&dump_queue->fancy_weight_objects),
     dump_tailq_length (&dump_queue->zero_weight_objects),
     dump_tailq_length (&dump_queue->one_weight_normal_objects),
     dump_tailq_length (&dump_queue->one_weight_strong_objects),
     XHASH_TABLE (dump_queue->link_weights)->count);

And it triggers warnings because intptr_t is redefined to be 'long
int', whereas PRIdPTR is "d".  Here are the warnings again:

  pdumper.c: In function 'dump_queue_dequeue':
  pdumper.c:1214:6: warning: format '%d' expects argument of type 'int', but argument 3 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]
   1214 |     (("dump_queue_dequeue basis=%"PRIdDUMP_OFF" fancy=%"PRIdPTR
	|      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1215 |       " zero=%"PRIdPTR" normal=%"PRIdPTR" strong=%"PRIdPTR" hash=%td\n"),
	|       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   1216 |      basis,
   1217 |      dump_tailq_length (&dump_queue->fancy_weight_objects),
	|      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	|      |
	|      gl_intptr_t {aka long int}
  pdumper.c:1214:6: warning: format '%d' expects argument of type 'int', but argument 4 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]
  pdumper.c:1214:6: warning: format '%d' expects argument of type 'int', but argument 5 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]
  pdumper.c:1214:6: warning: format '%d' expects argument of type 'int', but argument 6 has type 'gl_intptr_t' {aka 'long int'} [-Wformat=]




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Wed, 12 Aug 2020 19:12:02 GMT) Full text and rfc822 format available.

Message #109 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Wed, 12 Aug 2020 12:11:09 -0700
On 8/12/20 7:10 AM, Eli Zaretskii wrote:

> If that module is in our repository only because of MS-Windows, then
> it indeed isn't needed.

OK, I removed it.

> I don't understand why it uses 'long int' 32-bit platforms, it looks
> gratuitous, especially since MinGW itself uses just 'int'.  (Another
> question is why Gnulib thinks it needs to redefine intptr_t, but if
> the redefinition was correct, this would not be especially important.)

As I recall the idea was to not worry about the plethora of buggy intptr_t 
implementations at the time, and just substitute Gnulib's own. Nowadays perhaps 
that decision should be revisited.

I looked into the MinGW situation and the problem seems to be that MinGW defined 
a macro _INTPTR_T_DEFINED that it no longer defines, and Gnulib was keying off 
that no-longer-present macro. I installed a patch for that in Gnulib here:

https://lists.gnu.org/r/bug-gnulib/2020-08/msg00088.html

and migrated the patch into Emacs. Hope it fixes things.


As an aside, we're spending too much time on pdumper.c code that has no effect 
because dump_trace never outputs anything. How about if I remove dump_trace and 
its callers? Although dump_trace may have been useful when the portable dumper 
got developed, it's just a developer time sink now.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Wed, 12 Aug 2020 19:29:01 GMT) Full text and rfc822 format available.

Message #112 received at 36597 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Daniel Colascione <dancol <at> dancol.org>
Cc: larsi <at> gnus.org, 36597 <at> debbugs.gnu.org, pipcet <at> gmail.com
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Wed, 12 Aug 2020 22:28:28 +0300
> Cc: larsi <at> gnus.org, pipcet <at> gmail.com, 36597 <at> debbugs.gnu.org
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Date: Wed, 12 Aug 2020 12:11:09 -0700
> 
> I looked into the MinGW situation and the problem seems to be that MinGW defined 
> a macro _INTPTR_T_DEFINED that it no longer defines, and Gnulib was keying off 
> that no-longer-present macro.

I think _INTPTR_T_DEFINED is still being used, but only by MinGW64.  I
use mingw.org's MinGW, where that macro was never used.

However, both MinGW flavors typedef intptr_t as 'int', not 'long int',
on 32-bit platforms.

> I installed a patch for that in Gnulib here:
> 
> https://lists.gnu.org/r/bug-gnulib/2020-08/msg00088.html
> 
> and migrated the patch into Emacs. Hope it fixes things.

It does here, thanks.  I hope someone will be able to make sure
MinGW64 builds are not adversely affected (I don't think they should
be).

> As an aside, we're spending too much time on pdumper.c code that has no effect 
> because dump_trace never outputs anything. How about if I remove dump_trace and 
> its callers? Although dump_trace may have been useful when the portable dumper 
> got developed, it's just a developer time sink now.

I have no opinion on this, but I'd like to hear from Daniel (CC'ed)
what he thinks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36597; Package emacs. (Wed, 12 Aug 2020 20:42:02 GMT) Full text and rfc822 format available.

Message #115 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#36597: 27.0.50; rehash hash tables eagerly in pdumper
Date: Wed, 12 Aug 2020 21:41:02 +0100
On Wed 12 Aug 2020, Eli Zaretskii wrote:

>> Cc: larsi <at> gnus.org, pipcet <at> gmail.com, 36597 <at> debbugs.gnu.org
>> From: Paul Eggert <eggert <at> cs.ucla.edu>
>> Date: Wed, 12 Aug 2020 12:11:09 -0700
>> 
>> I looked into the MinGW situation and the problem seems to be that MinGW defined 
>> a macro _INTPTR_T_DEFINED that it no longer defines, and Gnulib was keying off 
>> that no-longer-present macro.
>
> I think _INTPTR_T_DEFINED is still being used, but only by MinGW64.  I
> use mingw.org's MinGW, where that macro was never used.

Yes, MinGW64 still defines this in corecrt.h, and in _cygwin.h (which I
think is to support cross conpiling to cygwin).

> However, both MinGW flavors typedef intptr_t as 'int', not 'long int',
> on 32-bit platforms.

Agreed.

>> I installed a patch for that in Gnulib here:
>> 
>> https://lists.gnu.org/r/bug-gnulib/2020-08/msg00088.html
>> 
>> and migrated the patch into Emacs. Hope it fixes things.
>
> It does here, thanks.  I hope someone will be able to make sure
> MinGW64 builds are not adversely affected (I don't think they should
> be).

On msys2, 32bit mingw64 and 64bit mingw64 builds are both ok (using
commit fd6058b8).

    AndyM





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 10 Sep 2020 11:24:16 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 221 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.