GNU bug report logs - #20109
Incompatible API change in 2.0 series for string port encoding


Package: guile;

Reported by: David Kastrup <dak <at> gnu.org>

Date: Sun, 15 Mar 2015 13:17:01 UTC

Severity: normal

Done: Andy Wingo <wingo <at> pobox.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 20109 in the body.
You can then email your comments to 20109 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Sun, 15 Mar 2015 13:17:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to David Kastrup <dak <at> gnu.org>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 15 Mar 2015 13:17:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: bug-guile <at> gnu.org
Subject: Incompatible API change in 2.0 series for string port encoding
Date: Sun, 15 Mar 2015 14:15:56 +0100

In 2.0.9, the following patch/code for getting what amounts to a binary
string port worked.

commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
Author: David Kastrup <dak <at> gnu.org>
Date:   Sun Sep 21 18:40:06 2014 +0200

    Source_file::init_port: Keep GUILEv2 from redecoding string input

diff --git a/lily/source-file.cc b/lily/source-file.cc
index 1118b9d..75ed0d9 100644
--- a/lily/source-file.cc
+++ b/lily/source-file.cc
@@ -152,7 +152,11 @@ Source_file::init_port ()
   // we do our own utf8 encoding and verification in the parser, so we
   // use the no-conversion equivalent of latin1
   SCM str = scm_from_latin1_string (c_str ());
-  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
+  scm_dynwind_begin ((scm_t_dynwind_flags)0);
+  // Why doesn't scm_set_port_encoding_x work here?
+  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
+  str_port_ = scm_open_input_string (str);
+  scm_dynwind_end ();
   scm_set_port_filename_x (str_port_, ly_string2scm (name_));
 }
 

In 2.0.11, it doesn't.  This is an incompatible API change within the
"stable" 2.0 series.  Since we are ping-ponging between GUILE and a
native LilyPond interpreter and need to work with file offsets for
keeping them in synch, it isn't an option to have scm_open_input_string
convert to a different encoding.

It also does not make sense from an efficiency point of view since
strings are either encoded as latin-1 or UTF-32, so encoding string
ports as UTF-8 without alternative means that it is _impossible_ to
employ string ports efficiently and without conversion.

-- 
David Kastrup




Information forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Mon, 16 Mar 2015 20:43:02 GMT) Full text and rfc822 format available.

Message #8 received at 20109 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 20109 <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Mon, 16 Mar 2015 16:42:38 -0400

David Kastrup <dak <at> gnu.org> writes:

> In 2.0.9, the following patch/code for getting what amounts to a binary
> string port worked.
>
> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
> Author: David Kastrup <dak <at> gnu.org>
> Date:   Sun Sep 21 18:40:06 2014 +0200
>
>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>
> diff --git a/lily/source-file.cc b/lily/source-file.cc
> index 1118b9d..75ed0d9 100644
> --- a/lily/source-file.cc
> +++ b/lily/source-file.cc
> @@ -152,7 +152,11 @@ Source_file::init_port ()
>    // we do our own utf8 encoding and verification in the parser, so we
>    // use the no-conversion equivalent of latin1
>    SCM str = scm_from_latin1_string (c_str ());
> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
> +  // Why doesn't scm_set_port_encoding_x work here?
> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
> +  str_port_ = scm_open_input_string (str);
> +  scm_dynwind_end ();
>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>  }

This hack of giving Guile a buffer containing UTF-8, but claiming that
it is Latin-1, is not good.  It will cause Guile to see non-ASCII
characters as garbage.  However, if you insist on doing this, I would
suggest using a bytevector input port instead, like this: (untested)

  char *buf = c_str ();
  SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
  strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
  str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

       Mark




Information forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Mon, 16 Mar 2015 20:47:02 GMT) Full text and rfc822 format available.

Message #11 received at 20109 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 20109 <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Mon, 16 Mar 2015 16:46:47 -0400

Mark H Weaver <mhw <at> netris.org> writes:

> David Kastrup <dak <at> gnu.org> writes:
>
>> In 2.0.9, the following patch/code for getting what amounts to a binary
>> string port worked.
>>
>> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
>> Author: David Kastrup <dak <at> gnu.org>
>> Date:   Sun Sep 21 18:40:06 2014 +0200
>>
>>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>>
>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>> index 1118b9d..75ed0d9 100644
>> --- a/lily/source-file.cc
>> +++ b/lily/source-file.cc
>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>>    // we do our own utf8 encoding and verification in the parser, so we
>>    // use the no-conversion equivalent of latin1
>>    SCM str = scm_from_latin1_string (c_str ());
>> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
>> +  // Why doesn't scm_set_port_encoding_x work here?
>> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>> +  str_port_ = scm_open_input_string (str);
>> +  scm_dynwind_end ();
>>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>>  }
>
> This hack of giving Guile a buffer containing UTF-8, but claiming that
> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
> characters as garbage.  However, if you insist on doing this, I would
> suggest using a bytevector input port instead, like this: (untested)
>
>   char *buf = c_str ();
>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

Sorry, the NUL terminator should not be included:

   char *buf = c_str ();
   size_t len = strlen (buf);
   SCM bv = scm_c_make_bytevector (len);
   memcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf, len);
   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

      Mark




Information forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Tue, 17 Mar 2015 08:40:05 GMT) Full text and rfc822 format available.

Message #14 received at 20109 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 20109 <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Tue, 17 Mar 2015 09:39:46 +0100

Mark H Weaver <mhw <at> netris.org> writes:

> David Kastrup <dak <at> gnu.org> writes:
>
>> In 2.0.9, the following patch/code for getting what amounts to a binary
>> string port worked.
>>
>> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
>> Author: David Kastrup <dak <at> gnu.org>
>> Date:   Sun Sep 21 18:40:06 2014 +0200
>>
>>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>>
>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>> index 1118b9d..75ed0d9 100644
>> --- a/lily/source-file.cc
>> +++ b/lily/source-file.cc
>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>>    // we do our own utf8 encoding and verification in the parser, so we
>>    // use the no-conversion equivalent of latin1
>>    SCM str = scm_from_latin1_string (c_str ());
>> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
>> +  // Why doesn't scm_set_port_encoding_x work here?
>> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>> +  str_port_ = scm_open_input_string (str);
>> +  scm_dynwind_end ();
>>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>>  }
>
> This hack of giving Guile a buffer containing UTF-8, but claiming that
> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
> characters as garbage.

For one thing we are talking about an external file here that is mainly
parsed by LilyPond.  LilyPond provides sensible pinpointing of UTF-8
encoding errors, something which GUILE cannot do with its UTF-8
representation since it has no transparent or reproducible
representation of bad bytes.  Emacs uses overlong encodings for 0-127 to
represent badly encoded bytes (which includes any overlong sequences) in
the range 128-255, making 128-255 encode as patterns 0xc0 0x80 to 0xc1
0xbf.  Since this leads to a reproducible encoding, one always has the
information required for resynchronization even in the case of encoding
errors.
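
As an illustration of the byte mapping described above, here is a minimal
sketch in C.  It only reproduces the arithmetic stated in this message
(raw bytes 128-255 mapped onto the patterns 0xc0 0x80 to 0xc1 0xbf); it is
not taken from Emacs, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a raw byte in 128..255 to the two-byte overlong pattern
   0xC0 0x80 .. 0xC1 0xBF.  Returns 0 on success, -1 if the byte is
   below 128 and therefore needs no such representation.
   (Hypothetical helper, for illustration only.) */
int raw_byte_to_overlong(uint8_t byte, uint8_t out[2])
{
    assert(out != NULL);
    if (byte < 0x80)
        return -1;
    uint8_t low7 = (uint8_t)(byte - 0x80);      /* 0..127 */
    out[0] = (uint8_t)(0xC0 | (low7 >> 6));     /* 0xC0 or 0xC1 */
    out[1] = (uint8_t)(0x80 | (low7 & 0x3F));   /* 0x80..0xBF */
    return 0;
}

/* Inverse mapping: recover the raw byte from an overlong pair.
   Returns 0 on success, -1 if the pair is not in the reserved range. */
int overlong_to_raw_byte(const uint8_t in[2], uint8_t *byte)
{
    assert(in != NULL && byte != NULL);
    if ((in[0] != 0xC0 && in[0] != 0xC1) || (in[1] & 0xC0) != 0x80)
        return -1;
    *byte = (uint8_t)(0x80 + (((in[0] & 0x01) << 6) | (in[1] & 0x3F)));
    return 0;
}
```

Because the mapping is a bijection between the 128 raw byte values and the
128 reserved patterns, the original byte sequence can always be recovered.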

For another, synchronization of GUILE and LilyPond parsers requires that
both can make use of byte offsets for positioning.  GUILE's mandatory
recoding on opening the port does not provide that.
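
To make the offset problem concrete, here is a small sketch in C.  It
assumes the situation from the patch above: the buffer holds UTF-8 bytes
but is treated as Latin-1, and the port then re-encodes every character
as UTF-8.  The helper name is hypothetical; it merely computes where an
original byte offset would land after such a conversion.

```c
#include <assert.h>
#include <stddef.h>

/* Position that byte `offset` of the original buffer moves to when each
   byte is interpreted as one Latin-1 character and re-encoded as UTF-8:
   bytes below 0x80 stay one byte, bytes 0x80..0xFF become two bytes.
   (Hypothetical helper, for illustration only.) */
size_t offset_after_latin1_to_utf8(const unsigned char *buf, size_t offset)
{
    assert(buf != NULL);
    size_t shifted = 0;
    for (size_t i = 0; i < offset; i++)
        shifted += (buf[i] < 0x80) ? 1 : 2;
    return shifted;
}
```

For pure ASCII input the offsets agree, but every non-ASCII byte before a
given position moves that position by one, so the two parsers no longer
agree on where they are in the file.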

> However, if you insist on doing this, I would
> suggest using a bytevector input port instead, like this: (untested)
>
>   char *buf = c_str ();
>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

dak <at> lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port v2.0.11
dak <at> lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port origin/stable-2.0 
dak <at> lola:/usr/local/tmp/guile$ 

The idea would seem nice, but we are still talking about GUILE 2.0.11
here.  Responding with "It is not good" about a facility that, unpretty
as it may seem, was changed _within_ a stable version series without a
functionally equivalent replacement is not helpful.

The whole point of a stable release series is to provide dependable
functionality.  Any changes based on the "we don't want people to use
that since it is not nice" rationale should happen between stable
release series.

The way it looks, we'll have to use one mechanism for version 2.0.5 to
2.0.9, have to find out whether to reject 2.0.10, have to reject 2.0.11
and pray for 2.0.12 to provide scm_open_byte_vector_input_port.

And depending on whether the dynamic library versions have been bumped,
we might have to do this at runtime.

-- 
David Kastrup




Information forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Tue, 17 Mar 2015 22:44:02 GMT) Full text and rfc822 format available.

Message #17 received at 20109 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 20109 <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Tue, 17 Mar 2015 18:44:17 -0400

David Kastrup <dak <at> gnu.org> writes:

> Mark H Weaver <mhw <at> netris.org> writes:
>
>> This hack of giving Guile a buffer containing UTF-8, but claiming that
>> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
>> characters as garbage.
>
> For one thing we are talking about an external file here that is mainly
> parsed by LilyPond.  LilyPond provides sensible pinpointing of UTF-8
> encoding errors, something which GUILE cannot do with its UTF-8
> representation since it has no transparent or reproducible
> representation of bad bytes.  Emacs uses overlong encodings for 0-127 to
> represent badly encoded bytes (which includes any overlong sequences) in
> the range 128-255, making 128-255 encode as patterns 0xc0 0x80 to 0xc1
> 0xbf.

I intend to add a similar mechanism to Guile, but it is not yet done.

>> However, if you insist on doing this, I would
>> suggest using a bytevector input port instead, like this: (untested)
>>
>>   char *buf = c_str ();
>>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);
>
> dak <at> lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port v2.0.11
> dak <at> lola:/usr/local/tmp/guile$ git grep scm_open_byte_vector_input_port origin/stable-2.0 
> dak <at> lola:/usr/local/tmp/guile$ 

You have misspelled the name of the function.  The following (untested)
code should work on Guile 2.0.5 or later:

   char *buf = c_str ();
   size_t len = strlen (buf);
   SCM bv = scm_c_make_bytevector (len);
   memcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf, len);
   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

      Mark




Information forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Wed, 18 Mar 2015 12:33:02 GMT) Full text and rfc822 format available.

Message #20 received at 20109 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: Mark H Weaver <mhw <at> netris.org>
Cc: 20109 <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Wed, 18 Mar 2015 13:32:55 +0100

Mark H Weaver <mhw <at> netris.org> writes:

> David Kastrup <dak <at> gnu.org> writes:
>
>> Mark H Weaver <mhw <at> netris.org> writes:
>>
>>> This hack of giving Guile a buffer containing UTF-8, but claiming that
>>> it is Latin-1, is not good.  It will cause Guile to see non-ASCII
>>> characters as garbage.
>>
>> For one thing we are talking about an external file here that is
>> mainly parsed by LilyPond.  LilyPond provides sensible pinpointing of
>> UTF-8 encoding errors, something which GUILE cannot do with its UTF-8
>> representation since it has no transparent or reproducible
>> representation of bad bytes.  Emacs uses overlong encodings for 0-127
>> to represent badly encoded bytes (which includes any overlong
>> sequences) in the range 128-255, making 128-255 encode as patterns
>> 0xc0 0x80 to 0xc1 0xbf.
>
> I intend to add a similar mechanism to Guile, but it is not yet done.

I think it would be pretty important since it makes it possible to treat
problems at those points in processing where it makes most sense.

However, it would also seem important to have GUILE handle utf-8
strings.  At the current point of time, its only native types are what
it calls "latin-1" and likely "UTF-32".  Which does not make much sense
in connection with its string ports being unconditionally UTF-8 instead.

Concatenating a string from smaller pieces sequentially via string
operations is O(n^2), so string ports are a natural way to assemble
large strings.  They are also nice for reading from strings.  Not
requiring conversions for most of that would be nice.
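
The complexity argument can be sketched in plain C, independent of Guile.
Repeatedly building a new string from the old one plus a piece copies the
whole accumulated text each time, which is quadratic overall; an
accumulator that grows its buffer geometrically, which is roughly the role
a string output port plays, keeps the total copying amortized linear.  The
`strbuf` type and function below are hypothetical illustrations, not part
of any Guile API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer; initialize with `struct strbuf sb = {0};`. */
struct strbuf {
    char *data;
    size_t len;
    size_t cap;
};

/* Append n bytes from piece, doubling the capacity when needed so that
   the total amount of copying stays amortized linear.  Keeps the buffer
   NUL-terminated.  Returns 0 on success, -1 on allocation failure. */
int strbuf_append(struct strbuf *sb, const char *piece, size_t n)
{
    assert(sb != NULL && piece != NULL);
    if (sb->len + n + 1 > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 16;
        while (cap < sb->len + n + 1)
            cap *= 2;
        char *grown = realloc(sb->data, cap);
        if (grown == NULL)
            return -1;
        sb->data = grown;
        sb->cap = cap;
    }
    memcpy(sb->data + sb->len, piece, n);
    sb->len += n;
    sb->data[sb->len] = '\0';
    return 0;
}
```

The caller owns `sb.data` and releases it with `free` once the assembled
string is no longer needed.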

>>> However, if you insist on doing this, I would
>>> suggest using a bytevector input port instead, like this: (untested)
>>>
>>>   char *buf = c_str ();
>>>   SCM bv = scm_c_make_bytevector (strlen (buf) + 1);
>>>   strcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf);
>>>   str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);
>>
>> dak <at> lola:/usr/local/tmp/guile$ git grep
>> scm_open_byte_vector_input_port v2.0.11
>> dak <at> lola:/usr/local/tmp/guile$ git grep
>> scm_open_byte_vector_input_port origin/stable-2.0
>> dak <at> lola:/usr/local/tmp/guile$ 
>
> You have misspelled the name of the function.  The following (untested)
> code should work on Guile 2.0.5 or later:
>
>    char *buf = c_str ();
>    size_t len = strlen (buf);
>    SCM bv = scm_c_make_bytevector (len);
>    memcpy (SCM_BYTEVECTOR_CONTENTS (bv), buf, len);
>    str_port_ = scm_open_bytevector_input_port (bv, SCM_UNDEFINED);

One would expect that I'd be able to do a simple copy&paste of a
function name.  Sorry for messing this up.

Yes, this looks like it should indeed provide a better match of
"encoding intentions" to our original code.  I'll have to see whether
I can make this approach work with the rest of our code.

I somehow missed that r6rs ports were more than just a compatibility
wrapper written in Scheme.

-- 
David Kastrup




Information forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Fri, 17 Apr 2015 05:18:02 GMT) Full text and rfc822 format available.

Message #23 received at 20109 <at> debbugs.gnu.org (full text, mbox):

From: Mark H Weaver <mhw <at> netris.org>
To: David Kastrup <dak <at> gnu.org>
Cc: 20109 <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Fri, 17 Apr 2015 01:17:50 -0400

David Kastrup <dak <at> gnu.org> writes:

> In 2.0.9, the following patch/code for getting what amounts to a binary
> string port worked.
>
> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
> Author: David Kastrup <dak <at> gnu.org>
> Date:   Sun Sep 21 18:40:06 2014 +0200
>
>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>
> diff --git a/lily/source-file.cc b/lily/source-file.cc
> index 1118b9d..75ed0d9 100644
> --- a/lily/source-file.cc
> +++ b/lily/source-file.cc
> @@ -152,7 +152,11 @@ Source_file::init_port ()
>    // we do our own utf8 encoding and verification in the parser, so we
>    // use the no-conversion equivalent of latin1
>    SCM str = scm_from_latin1_string (c_str ());
> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
> +  // Why doesn't scm_set_port_encoding_x work here?
> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
> +  str_port_ = scm_open_input_string (str);
> +  scm_dynwind_end ();
>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>  }
>  
>
> In 2.0.11, it doesn't.  This is an incompatible API change within the
> "stable" 2.0 series.

Are you sure that you weren't using Guile from our 'master' branch?  I'm
not aware of any change made on our stable-2.0 branch that would break
the above approach.

We _did_ make an incompatible change that would break this approach on
our master branch, which will become Guile 2.2.  On that branch, string
ports always use UTF-8 to encode the initial string, and UTF-8 is always
used as the initial port encoding.  However, stable-2.0 still uses
%default-port-encoding.

      Mark




Information forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Thu, 23 Jun 2016 16:24:02 GMT) Full text and rfc822 format available.

Message #26 received at 20109 <at> debbugs.gnu.org (full text, mbox):

From: Andy Wingo <wingo <at> pobox.com>
To: Mark H Weaver <mhw <at> netris.org>
Cc: David Kastrup <dak <at> gnu.org>, 20109 <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Thu, 23 Jun 2016 18:23:05 +0200

On Fri 17 Apr 2015 07:17, Mark H Weaver <mhw <at> netris.org> writes:

> David Kastrup <dak <at> gnu.org> writes:
>
>> In 2.0.9, the following patch/code for getting what amounts to a binary
>> string port worked.
>>
>> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
>> Author: David Kastrup <dak <at> gnu.org>
>> Date:   Sun Sep 21 18:40:06 2014 +0200
>>
>>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>>
>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>> index 1118b9d..75ed0d9 100644
>> --- a/lily/source-file.cc
>> +++ b/lily/source-file.cc
>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>>    // we do our own utf8 encoding and verification in the parser, so we
>>    // use the no-conversion equivalent of latin1
>>    SCM str = scm_from_latin1_string (c_str ());
>> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
>> +  // Why doesn't scm_set_port_encoding_x work here?
>> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>> +  str_port_ = scm_open_input_string (str);
>> +  scm_dynwind_end ();
>>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>>  }
>>  
>>
>> In 2.0.11, it doesn't.  This is an incompatible API change within the
>> "stable" 2.0 series.
>
> Are you sure that you weren't using Guile from our 'master' branch?  I'm
> not aware of any change made on our stable-2.0 branch that would break
> the above approach.
>
> We _did_ make an incompatible change that would break this approach on
> our master branch, which will become Guile 2.2.  On that branch, string
> ports always use UTF-8 to encode the initial string, and UTF-8 is always
> used as the initial port encoding.  However, stable-2.0 still uses
> %default-port-encoding.

I believe Mark is right -- the change to string ports is only on
`master'.  Given that, I think the bug can be closed.  David, does this
match your perception?

Andy




Information forwarded to bug-guile <at> gnu.org:
bug#20109; Package guile. (Thu, 23 Jun 2016 16:47:02 GMT) Full text and rfc822 format available.

Message #29 received at 20109 <at> debbugs.gnu.org (full text, mbox):

From: David Kastrup <dak <at> gnu.org>
To: Andy Wingo <wingo <at> pobox.com>
Cc: Mark H Weaver <mhw <at> netris.org>, 20109 <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Thu, 23 Jun 2016 18:46:25 +0200

Andy Wingo <wingo <at> pobox.com> writes:

> On Fri 17 Apr 2015 07:17, Mark H Weaver <mhw <at> netris.org> writes:
>
>> David Kastrup <dak <at> gnu.org> writes:
>>
>>> In 2.0.9, the following patch/code for getting what amounts to a binary
>>> string port worked.
>>>
>>> commit 7f7a124d3470b0d566f796e88f4e2ad5aa043f16
>>> Author: David Kastrup <dak <at> gnu.org>
>>> Date:   Sun Sep 21 18:40:06 2014 +0200
>>>
>>>     Source_file::init_port: Keep GUILEv2 from redecoding string input
>>>
>>> diff --git a/lily/source-file.cc b/lily/source-file.cc
>>> index 1118b9d..75ed0d9 100644
>>> --- a/lily/source-file.cc
>>> +++ b/lily/source-file.cc
>>> @@ -152,7 +152,11 @@ Source_file::init_port ()
>>>    // we do our own utf8 encoding and verification in the parser, so we
>>>    // use the no-conversion equivalent of latin1
>>>    SCM str = scm_from_latin1_string (c_str ());
>>> -  str_port_ = scm_mkstrport (SCM_INUM0, str, SCM_OPN | SCM_RDNG, __FUNCTION__);
>>> +  scm_dynwind_begin ((scm_t_dynwind_flags)0);
>>> +  // Why doesn't scm_set_port_encoding_x work here?
>>> +  scm_dynwind_fluid (ly_lily_module_constant ("%default-port-encoding"), SCM_BOOL_F);
>>> +  str_port_ = scm_open_input_string (str);
>>> +  scm_dynwind_end ();
>>>    scm_set_port_filename_x (str_port_, ly_string2scm (name_));
>>>  }
>>>  
>>>
>>> In 2.0.11, it doesn't.  This is an incompatible API change within the
>>> "stable" 2.0 series.
>>
>> Are you sure that you weren't using Guile from our 'master' branch?  I'm
>> not aware of any change made on our stable-2.0 branch that would break
>> the above approach.
>>
>> We _did_ make an incompatible change that would break this approach on
>> our master branch, which will become Guile 2.2.  On that branch, string
>> ports always use UTF-8 to encode the initial string, and UTF-8 is always
>> used as the initial port encoding.  However, stable-2.0 still uses
>> %default-port-encoding.
>
> I believe Mark is right -- the change to string ports is only on
> `master'.  Given that, I think the bug can be closed.  David, does this
> match your perception?

My recollection is that I had a branch working in this area and it
stopped doing so.  I haven't kept written notes, and I have not
pinpointed a commit in the respective Guile version range that looks
like it could be responsible.  As this occurred in the context of an
Ubuntu update, changes in other libraries (like the locale parts) and/or
settings might have been at play.  I think I downgraded the
guile-1.8-dev package (and dependencies) for a test and was not able to
get the stuff working again either.  I noticed this problem months after
the change had likely happened.

All in all, I cannot provide anything useful for tracking down the
purported regression, nor dependable evidence of it, nor even
circumstantial evidence.  It's purely anecdotal and I have not been able
to recover the purportedly better working state with downgrading.

Whatever may or may not have been involved here, with the fixes to R6RS
binary streams to be released in 2.0.12 we'll have another chance to
steer clear entirely of this area in LilyPond (which has the added
advantage that the changes in 2.1 should no longer affect us).

With regard to Guile 2.0, I cannot provide anything that would warrant
keeping this report open.

-- 
David Kastrup




Reply sent to Andy Wingo <wingo <at> pobox.com>:
You have taken responsibility. (Thu, 23 Jun 2016 17:59:01 GMT) Full text and rfc822 format available.

Notification sent to David Kastrup <dak <at> gnu.org>:
bug acknowledged by developer. (Thu, 23 Jun 2016 17:59:01 GMT) Full text and rfc822 format available.

Message #34 received at 20109-done <at> debbugs.gnu.org (full text, mbox):

From: Andy Wingo <wingo <at> pobox.com>
To: David Kastrup <dak <at> gnu.org>
Cc: Mark H Weaver <mhw <at> netris.org>, 20109-done <at> debbugs.gnu.org
Subject: Re: bug#20109: Incompatible API change in 2.0 series for string port
 encoding
Date: Thu, 23 Jun 2016 19:58:12 +0200

On Thu 23 Jun 2016 18:46, David Kastrup <dak <at> gnu.org> writes:

> With regard to Guile 2.0, I cannot provide anything that would warrant
> keeping this report open.

Okeydoke, will close.  Thanks :)

Andy




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 22 Jul 2016 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 7 years and 278 days ago.
