GNU bug report logs - #38235: string-foldcase bug for trailing sigma
To reply to this bug, email your comments to 38235 AT debbugs.gnu.org.
Report forwarded to bug-guile <at> gnu.org: bug#38235; Package guile. (Sat, 16 Nov 2019 20:42:02 GMT)
Acknowledgement sent to Andy Wingo <wingo <at> pobox.com>: New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sat, 16 Nov 2019 20:42:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Given the following example, using (rnrs unicode):
(string-foldcase "ΜΈΛΟΣ")
The expected result is "μέλοσ"; see R6RS libraries section 1.2. However
instead Guile's result is "μέλος". Note that although Σ usually
downcases to σ, at the end of a string it's ς. This test shows a
limitation of defining string-foldcase as simply (string-downcase
(string-upcase str)).
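The gap between lowercasing and case folding can be reproduced outside Guile. What follows is a minimal sketch in Python rather than Scheme, chosen only because its built-in str.lower and str.casefold follow the Unicode default algorithms. It is an illustration of the expected behaviour and says nothing about how Guile implements string-foldcase.

```python
# Illustration only: Python's str methods, not Guile's (rnrs unicode) procedures.
word = "ΜΈΛΟΣ"

FINAL_SIGMA = "\u03c2"  # ς
SMALL_SIGMA = "\u03c3"  # σ

lowered = word.lower()     # lowercasing applies the context-sensitive final-sigma rule
folded = word.casefold()   # case folding is context-free

print(lowered, lowered.endswith(FINAL_SIGMA))
print(folded, folded.endswith(SMALL_SIGMA))

# Going through upper case and back to lower case reintroduces ς,
# so that round trip is not a substitute for folding.
round_trip = word.upper().lower()
print(round_trip == lowered)
```

The folded form is the one meant for caseless comparison; the lowercased form is the one meant for display.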
Information forwarded to bug-guile <at> gnu.org: bug#38235; Package guile. (Sun, 17 Nov 2019 11:20:02 GMT)
Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):
On Sat, Nov 16, 2019 at 09:41:05PM +0100, Andy Wingo wrote:
> Given the following example, using (rnrs unicode):
>
> (string-foldcase "ΜΈΛΟΣ")
Good catch. I think there's even a worse example: dotless
and dotted I [1]. Here it seems even impossible to do
up- and downcase correctly without knowing the language
context.
Cheers
[1] https://en.wikipedia.org/wiki/%C4%B0
-- tomás
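To make the dotted and dotless I problem concrete, here is a small sketch using Python's language-neutral default case mappings (again Python, not Guile; the constant names are made up for the example). It only shows what a context-free mapping does with these characters.

```python
# Illustration only: Python's default, language-neutral case mappings.
DOTTED_CAPITAL_I = "\u0130"  # İ
DOTLESS_SMALL_I = "\u0131"   # ı

# The default mappings pair I with i. Turkish text instead pairs
# I with ı and İ with i, which a context-free mapping cannot know.
print("I".lower() == "i")
print("i".upper() == "I")

# ı uppercases to plain I, so upper-then-lower does not come back to ı.
round_trip = DOTLESS_SMALL_I.upper().lower()
print(round_trip == DOTLESS_SMALL_I)

# İ lowercases to i followed by U+0307 COMBINING DOT ABOVE.
lowered_dotted = DOTTED_CAPITAL_I.lower()
print([hex(ord(c)) for c in lowered_dotted])
```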
Information forwarded to bug-guile <at> gnu.org: bug#38235; Package guile. (Sun, 17 Nov 2019 18:14:02 GMT)
Message #11 received at 38235 <at> debbugs.gnu.org (full text, mbox):
On Sat, Nov 16, 2019 at 3:42 PM Andy Wingo <wingo <at> pobox.com> wrote:
> The expected result is "μέλοσ"; see R6RS libraries section 1.2. However
> instead Guile's result is "μέλος". Note that although Σ usually
> downcases to σ, at the end of a string it's ς.
More precisely, it downcases to σ if a letter follows and to ς if not
(being at the end of a string is a particular case). However, this is not
actually always Greekly correct: the string "ΦΙΛΟΣ." with a period at the
end downcases to "φιλος." if it is the word φίλος 'friend' (without its
proper accent) at the end of a sentence, but to "φιλοσ." if it is an
abbreviation for φιλοσοφία 'philosophy'. For this reason, R7RS does not
require mapping to ς in this situation as R6RS does.
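The context rule can be observed with a short sketch. It uses Python's str.lower, which applies the Unicode final-sigma condition; this is an illustration in Python, not Guile, and the sample list is chosen for the example.

```python
# Illustration only: Python's str.lower applies the Unicode final-sigma condition.
FINAL_SIGMA = "\u03c2"  # ς
SMALL_SIGMA = "\u03c3"  # σ

samples = ["ΦΙΛΟΣΟΦΙΑ", "ΦΙΛΟΣ", "ΦΙΛΟΣ.", "Σ"]

# Map each sample to its lowercased form.
results = {s: s.lower() for s in samples}

for original, lowered in results.items():
    # Show which sigma variants appear in the lowercased form.
    sigmas = [hex(ord(c)) for c in lowered if c in (FINAL_SIGMA, SMALL_SIGMA)]
    print(original, "->", lowered, sigmas)
```

A mechanical rule of this kind treats "ΦΙΛΟΣ." the same way every time. Telling the word apart from the abbreviation needs information the algorithm does not have.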
> This test shows a
> limitation of defining string-foldcase as simply (string-downcase
> (string-upcase str)).
>
As explained in Unicode section 5.18, the foldcase mappings (in
<https://www.unicode.org/Public/UNIDATA/CaseFolding.txt>, the lines with
status C and F) actually create a set of equivalence classes that are
closed under {upper,lower,title}case mapping, and then choose a single
character to represent each class. This is usually the unique lowercase
character, but not always: in Cherokee it is the uppercase character, and
in the set {Σ, σ, ς} it is σ.
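The idea of one representative per equivalence class can be checked with a sketch like the following. It uses Python's str.casefold (not Guile) and assumes a Python whose Unicode database includes the Cherokee small letters, that is, Unicode 8.0 or later. The helper name representatives is made up for the example.

```python
# Illustration only: Python's str.casefold applies Unicode full case folding.
sigma_class = ["\u03a3", "\u03c3", "\u03c2"]   # Σ, σ, ς
cherokee_class = ["\u13a0", "\uab70"]          # CHEROKEE LETTER A and CHEROKEE SMALL LETTER A

def representatives(chars):
    """Return the set of folded forms for a list of characters."""
    return {c.casefold() for c in chars}

# Every member of a class folds to the same single character.
print(representatives(sigma_class) == {"\u03c3"})      # representative is σ
print(representatives(cherokee_class) == {"\u13a0"})   # representative is the uppercase letter

# Lowercasing Cherokee goes the other way, so folding is not lowercasing here.
print("\u13a0".lower() == "\uab70")
```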
On Sun, Nov 17, 2019 at 6:20 AM <tomas <at> tuxteam.de> wrote:
> Good catch. I think there's even a worse example: dotless
> and dotted I [1]. Here it seems even impossible to do
> up- and downcase correctly without knowing the language
> context.
>
Language-specific case mappings are explicitly out of Scheme's remit: they
have to be performed by specialized libraries. There is an additional
situation in Lithuanian dictionaries (but not running text): an "i" with a
tone accent is represented as "i" + dot above + accent, like this: "i̇́".
However, this dot above must be dropped when uppercasing, producing
ordinary "Í".
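The Lithuanian case can be sketched as follows. This is Python, not Guile, and upper_dropping_soft_dot is a hypothetical helper written for this example, not a function from any library. It is a simplification that handles only a dot above directly after "i" or "j".

```python
import re
import unicodedata

# Hypothetical helper, not part of any library: remove U+0307 COMBINING DOT ABOVE
# directly after i or j, then uppercase and compose. A simplified Lithuanian rule.
def upper_dropping_soft_dot(text):
    stripped = re.sub("(?<=[ij])\u0307", "", text)
    return unicodedata.normalize("NFC", stripped.upper())

dictionary_form = "i\u0307\u0301"  # i + dot above + acute accent

# The default, language-neutral uppercase mapping keeps the dot above.
default_upper = unicodedata.normalize("NFC", dictionary_form.upper())
print([hex(ord(c)) for c in default_upper])

# The tailored helper yields the ordinary precomposed Í (U+00CD).
tailored_upper = upper_dropping_soft_dot(dictionary_form)
print(tailored_upper == "\u00cd")
```

A real implementation would take the rule from a locale-aware library rather than a regular expression.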
This bug report was last modified 4 years and 160 days ago.