GNU bug report logs - #13515
24.3.50; file-name operating functions are broken on Japanese Windows

Package: emacs;

Reported by: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>

Date: Mon, 21 Jan 2013 13:51:02 UTC

Severity: normal

Found in version 24.3.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13515 in the body.
You can then email your comments to 13515 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#13515; Package emacs. (Mon, 21 Jan 2013 13:51:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 21 Jan 2013 13:51:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3.50; file-name operating functions are broken on Japanese Windows
Date: Mon, 21 Jan 2013 22:48:44 +0900
The code below returns an unexpected result in trunk on Windows.

(let ((file-name-coding-system 'cp932))
  (expand-file-name "表" "C:/"))

-> "c:/\225/"

dostounix_filename does not support cp932-encoded strings, which can
contain '\\' as part of Kanji characters.  Since the fix for
Bug#12933, dostounix_filename can receive such strings.  In
addition, that change also makes the code below fail.

(let ((file-name-coding-system 'cp1252))
  (expand-file-name "漢字" "C:/"))

-> "c:/  "
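
To illustrate the cp932 case above: in cp932 the character "表" is encoded as the two bytes 0x95 0x5C, and 0x5C is also the ASCII backslash, so a byte-wise separator conversion rewrites the trail byte. That matches the "\225/" in the first result (octal 225 is 0x95). The following is a minimal portable C sketch, not the Emacs code; the function names are hypothetical, and the lead-byte ranges are hard-coded for cp932 as an assumption instead of being queried from Windows.

```c
/* Sketch only: hypothetical helpers, not taken from src/w32.c.  */

/* Return nonzero if BYTE is the first byte of a two-byte cp932
   character.  The ranges are hard-coded here for illustration.  */
static int
cp932_lead_byte_p (unsigned char byte)
{
  return (byte >= 0x81 && byte <= 0x9F) || (byte >= 0xE0 && byte <= 0xFC);
}

/* Byte-wise conversion: every 0x5C byte is treated as a backslash,
   including one that is the trail byte of a two-byte character.  */
static void
slashify_bytewise (char *p)
{
  for (; *p; p++)
    if (*p == '\\')
      *p = '/';
}

/* Character-aware conversion: a lead byte and its trail byte are
   skipped together, so only real single-byte backslashes change.  */
static void
slashify_cp932 (char *p)
{
  while (*p)
    {
      if (cp932_lead_byte_p ((unsigned char) *p) && p[1] != '\0')
        p += 2;
      else
        {
          if (*p == '\\')
            *p = '/';
          p++;
        }
    }
}
```

Given the bytes of "C:\表" in cp932 (0x43 0x3A 0x5C 0x95 0x5C), slashify_bytewise corrupts the final character by turning its trail byte into '/', while slashify_cp932 converts only the separator after the drive letter.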

-- 
Kazuhiro Ito




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13515; Package emacs. (Tue, 22 Jan 2013 12:15:02 GMT) Full text and rfc822 format available.

Message #8 received at 13515 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Cc: 13515 <at> debbugs.gnu.org
Subject: Re: bug#13515: 24.3.50;
	file-name operating functions are broken on Japanese Windows
Date: Tue, 22 Jan 2013 14:13:32 +0200
> Date: Mon, 21 Jan 2013 22:48:44 +0900
> From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> 
> The code below returns an unexpected result in trunk on Windows.
> 
> (let ((file-name-coding-system 'cp932))
>   (expand-file-name "表" "C:/"))
> 
> -> "c:/\225/"
> 
> dostounix_filename does not support cp932-encoded strings, which can
> contain '\\' as part of Kanji characters.

Thanks, I will work on fixing this.

> Since the fix for Bug#12933, dostounix_filename can receive such
> strings.

Before that fix, dostounix_filename would indeed accept such file
names, but what it did with them was undefined behavior,
because it treated multibyte strings in Emacs internal representation
as if they were simple unibyte strings.

> In addition, that change also makes the code below fail.
> 
> (let ((file-name-coding-system 'cp1252))
>   (expand-file-name "漢字" "C:/"))
> 
> -> "c:/  "

IMO, this snippet doesn't make sense and cannot be supported.
expand-file-name calls a number of system APIs which need the file
name to be encoded, so using a file-name-coding-system that cannot
possibly encode a file name is not supposed to work.

Do you have a real-life situation where such cases emerge and need to
be supported?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13515; Package emacs. (Tue, 22 Jan 2013 13:30:02 GMT) Full text and rfc822 format available.

Message #11 received at 13515 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: kzhr <at> d1.dion.ne.jp
Cc: 13515 <at> debbugs.gnu.org
Subject: Re: bug#13515: 24.3.50;
	file-name operating functions are broken on Japanese Windows
Date: Tue, 22 Jan 2013 15:27:44 +0200
> Date: Tue, 22 Jan 2013 14:13:32 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 13515 <at> debbugs.gnu.org
> 
> > Date: Mon, 21 Jan 2013 22:48:44 +0900
> > From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> > 
> > The code below returns an unexpected result in trunk on Windows.
> > 
> > (let ((file-name-coding-system 'cp932))
> >   (expand-file-name "表" "C:/"))
> > 
> > -> "c:/\225/"
> > 
> > dostounix_filename does not support cp932-encoded strings, which can
> > contain '\\' as part of Kanji characters.
> 
> Thanks, I will work on fixing this.

Please try the changes below (relative to the emacs-24 branch).  If no
issues are found with them, I will soon install them on the release
branch.

=== modified file 'src/w32.c'
--- src/w32.c	2013-01-01 09:11:05 +0000
+++ src/w32.c	2013-01-22 13:20:33 +0000
@@ -37,7 +37,7 @@ along with GNU Emacs.  If not, see <http
 /* must include CRT headers *before* config.h */
 
 #include <config.h>
-#include <mbstring.h>	/* for _mbspbrk */
+#include <mbstring.h>	/* for _mbspbrk and _mbslwr */
 
 #undef access
 #undef chdir
@@ -1304,6 +1304,67 @@ srandom (int seed)
   srand (seed);
 }
 
+/* Current codepage for encoding file names.  */
+static int file_name_codepage;
+
+/* Return the maximum length in bytes of a multibyte character
+   sequence encoded in the current ANSI codepage.  This is required to
+   correctly walk the encoded file names one character at a time.  */
+static int
+max_filename_mbslen (void)
+{
+  /* A simple cache to avoid calling GetCPInfo every time we need to
+     normalize a file name.  The file-name encoding is not supposed to
+     be changed too frequently, if ever.  */
+  static Lisp_Object last_file_name_encoding;
+  static int last_max_mbslen;
+  Lisp_Object current_encoding;
+
+  current_encoding = Vfile_name_coding_system;
+  if (NILP (current_encoding))
+    current_encoding = Vdefault_file_name_coding_system;
+
+  if (!EQ (last_file_name_encoding, current_encoding))
+    {
+      CPINFO cp_info;
+
+      last_file_name_encoding = current_encoding;
+      /* Default to the current ANSI codepage.  */
+      file_name_codepage = w32_ansi_code_page;
+      if (!NILP (current_encoding))
+	{
+	  char *cpname = SDATA (SYMBOL_NAME (current_encoding));
+	  char *cp = NULL, *end;
+	  int cpnum;
+
+	  if (strncmp (cpname, "cp", 2) == 0)
+	    cp = cpname + 2;
+	  else if (strncmp (cpname, "windows-", 8) == 0)
+	    cp = cpname + 8;
+
+	  if (cp)
+	    {
+	      end = cp;
+	      cpnum = strtol (cp, &end, 10);
+	      if (cpnum && *end == '\0' && end - cp >= 2)
+		file_name_codepage = cpnum;
+	    }
+	}
+
+      if (!file_name_codepage)
+	file_name_codepage = CP_ACP; /* CP_ACP = 0, but let's not assume that */
+
+      if (!GetCPInfo (file_name_codepage, &cp_info))
+	{
+	  file_name_codepage = CP_ACP;
+	  if (!GetCPInfo (file_name_codepage, &cp_info))
+	    emacs_abort ();
+	}
+      last_max_mbslen = cp_info.MaxCharSize;
+    }
+
+  return last_max_mbslen;
+}
 
 /* Normalize filename by converting all path separators to
    the specified separator.  Also conditionally convert upper
@@ -1313,14 +1374,20 @@ static void
 normalize_filename (register char *fp, char path_sep)
 {
   char sep;
-  char *elem;
+  char *elem, *p2;
+  int dbcs_p = max_filename_mbslen () > 1;
 
   /* Always lower-case drive letters a-z, even if the filesystem
      preserves case in filenames.
      This is so filenames can be compared by string comparison
      functions that are case-sensitive.  Even case-preserving filesystems
      do not distinguish case in drive letters.  */
-  if (fp[1] == ':' && *fp >= 'A' && *fp <= 'Z')
+  if (dbcs_p)
+    p2 = CharNextExA (file_name_codepage, fp, 0);
+  else
+    p2 = fp + 1;
+
+  if (*p2 == ':' && *fp >= 'A' && *fp <= 'Z')
     {
       *fp += 'a' - 'A';
       fp += 2;
@@ -1332,7 +1399,10 @@ normalize_filename (register char *fp, c
 	{
 	  if (*fp == '/' || *fp == '\\')
 	    *fp = path_sep;
-	  fp++;
+	  if (!dbcs_p)
+	    fp++;
+	  else
+	    fp = CharNextExA (file_name_codepage, fp, 0);
 	}
       return;
     }
@@ -1355,13 +1425,20 @@ normalize_filename (register char *fp, c
 	if (elem && elem != fp)
 	  {
 	    *fp = 0;		/* temporary end of string */
-	    _strlwr (elem);	/* while we convert to lower case */
+	    _mbslwr (elem);	/* while we convert to lower case */
 	  }
 	*fp = sep;		/* convert (or restore) path separator */
 	elem = fp + 1;		/* next element starts after separator */
 	sep = path_sep;
       }
-  } while (*fp++);
+    if (*fp)
+      {
+	if (!dbcs_p)
+	  fp++;
+	else
+	  fp = CharNextExA (file_name_codepage, fp, 0);
+      }
+  } while (*fp);
 }
 
 /* Destructively turn backslashes into slashes.  */
@@ -2588,15 +2665,22 @@ readdir (DIR *dirp)
     strcpy (dir_static.d_name, dir_find_data.cFileName);
   dir_static.d_namlen = strlen (dir_static.d_name);
   if (dir_is_fat)
-    _strlwr (dir_static.d_name);
+    _mbslwr (dir_static.d_name);
   else if (downcase)
     {
       register char *p;
-      for (p = dir_static.d_name; *p; p++)
-	if (*p >= 'a' && *p <= 'z')
-	  break;
+      int dbcs_p = max_filename_mbslen () > 1;
+      for (p = dir_static.d_name; *p; )
+	{
+	  if (*p >= 'a' && *p <= 'z')
+	    break;
+	  if (dbcs_p)
+	    p = CharNextExA (file_name_codepage, p, 0);
+	  else
+	    p++;
+	}
       if (!*p)
-	_strlwr (dir_static.d_name);
+	_mbslwr (dir_static.d_name);
     }
 
   return &dir_static;
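
The name-parsing step in max_filename_mbslen above (strip a "cp" or "windows-" prefix, then read a decimal codepage number) can be tried in isolation with a portable sketch like the one below. This is an illustration under assumptions, not the patch itself: codepage_from_name is a hypothetical helper, it takes a plain C string instead of a Lisp symbol, it returns 0 where the patch falls back to the ANSI codepage, and it rejects negative numbers, which the patch's nonzero test does not.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch only: hypothetical helper modeled on the parsing logic in
   the patch.  Return the codepage number encoded in NAME (such as
   "cp932" or "windows-1252"), or 0 if NAME does not look like a
   codepage name.  */
static int
codepage_from_name (const char *name)
{
  const char *digits = NULL;
  char *end;
  long num;

  if (strncmp (name, "cp", 2) == 0)
    digits = name + 2;
  else if (strncmp (name, "windows-", 8) == 0)
    digits = name + 8;

  if (!digits)
    return 0;

  num = strtol (digits, &end, 10);
  /* Require a positive number, nothing after it, and at least two
     characters consumed, as the patch does with `end - cp >= 2'.  */
  if (num > 0 && *end == '\0' && end - digits >= 2)
    return (int) num;
  return 0;
}
```

With this sketch, "cp932" and "windows-1252" yield 932 and 1252, while names such as "utf-8" or "cp1252-dos" yield 0 because they either lack a recognized prefix or have trailing text after the number.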





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13515; Package emacs. (Wed, 23 Jan 2013 09:41:03 GMT) Full text and rfc822 format available.

Message #14 received at 13515 <at> debbugs.gnu.org (full text, mbox):

From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13515 <at> debbugs.gnu.org
Subject: Re: bug#13515: 24.3.50;
	file-name operating functions are broken on Japanese Windows
Date: Wed, 23 Jan 2013 18:38:23 +0900
> > Since the fix for Bug#12933, dostounix_filename can receive such
> > strings.
> 
> Before that fix, dostounix_filename would indeed accept such file
> names, but what it did with them was undefined behavior,
> because it treated multibyte strings in Emacs internal representation
> as if they were simple unibyte strings.

Agreed.  On Japanese Windows, Emacs had handled file name strings
correctly for many years only by accident.

 > > In addition, that change also makes the code below fail.
 > > 
 > > (let ((file-name-coding-system 'cp1252))
 > >   (expand-file-name "漢字" "C:/"))
 > > 
 > > -> "c:/  "
 > 
 > IMO, this snippet doesn't make sense and cannot be supported.
 > expand-file-name calls a number of system APIs which need the file
 > name to be encoded, so using a file-name-coding-system that cannot
 > possibly encode a file name is not supposed to work.
 > 
 > Do you have a real-life situation where such cases emerge and need to
 > be supported?

 None for me; sorry for the inappropriate example.  But the docstring of
 w32-downcase-file-names says it affects remote file names, and the fix
 for Bug#12933 also affects other functions that do not use system APIs
 (e.g., file-name-directory).  I guess it would be better if these
 functions (except the ones using system APIs) did not depend on the
 codepage.  Does Emacs not support the code below either?

 (let ((file-name-coding-system 'cp1252))
   (file-name-directory "漢字/"))

-> "  /"

-- 
Kazuhiro Ito




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13515; Package emacs. (Wed, 23 Jan 2013 09:42:01 GMT) Full text and rfc822 format available.

Message #17 received at 13515 <at> debbugs.gnu.org (full text, mbox):

From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13515 <at> debbugs.gnu.org
Subject: Re: bug#13515: 24.3.50;
	file-name operating functions are broken on Japanese Windows
Date: Wed, 23 Jan 2013 18:39:04 +0900
At Tue, 22 Jan 2013 15:27:44 +0200,
Eli Zaretskii wrote:
> 
> > Date: Tue, 22 Jan 2013 14:13:32 +0200
> > From: Eli Zaretskii <eliz <at> gnu.org>
> > Cc: 13515 <at> debbugs.gnu.org
> > 
> > > Date: Mon, 21 Jan 2013 22:48:44 +0900
> > > From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> > > 
> > > The code below returns an unexpected result in trunk on Windows.
> > > 
> > > (let ((file-name-coding-system 'cp932))
> > >   (expand-file-name "表" "C:/"))
> > > 
> > > -> "c:/\225/"
> > > 
> > > dostounix_filename does not support cp932-encoded strings, which can
> > > contain '\\' as part of Kanji characters.
> > 
> > Thanks, I will work on fixing this.
> 
> Please try the changes below (relative to the emacs-24 branch).  If no
> issues are found with them, I will soon install them on the release
> branch.

As far as I tested, the problem was fixed.  Thank you.

-- 
Kazuhiro Ito




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13515; Package emacs. (Wed, 23 Jan 2013 16:15:01 GMT) Full text and rfc822 format available.

Message #20 received at 13515 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Cc: 13515 <at> debbugs.gnu.org
Subject: Re: bug#13515: 24.3.50;
	file-name operating functions are broken on Japanese Windows
Date: Wed, 23 Jan 2013 18:13:04 +0200
> Date: Wed, 23 Jan 2013 18:39:04 +0900
> From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> Cc: 13515 <at> debbugs.gnu.org
> 
> At Tue, 22 Jan 2013 15:27:44 +0200,
> Eli Zaretskii wrote:
> > 
> > > Date: Tue, 22 Jan 2013 14:13:32 +0200
> > > From: Eli Zaretskii <eliz <at> gnu.org>
> > > Cc: 13515 <at> debbugs.gnu.org
> > > 
> > > > Date: Mon, 21 Jan 2013 22:48:44 +0900
> > > > From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> > > > 
> > > > The code below returns an unexpected result in trunk on Windows.
> > > > 
> > > > (let ((file-name-coding-system 'cp932))
> > > >   (expand-file-name "表" "C:/"))
> > > > 
> > > > -> "c:/\225/"
> > > > 
> > > > dostounix_filename does not support cp932-encoded strings, which can
> > > > contain '\\' as part of Kanji characters.
> > > 
> > > Thanks, I will work on fixing this.
> > 
> > Please try the changes below (relative to the emacs-24 branch).  If no
> > issues are found with them, I will soon install them on the release
> > branch.
> 
> As far as I tested, the problem was fixed.  Thank you.

Thanks, installed as revision 111194 on the emacs-24 branch.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13515; Package emacs. (Wed, 23 Jan 2013 16:23:01 GMT) Full text and rfc822 format available.

Message #23 received at 13515 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
Cc: 13515 <at> debbugs.gnu.org
Subject: Re: bug#13515: 24.3.50;
	file-name operating functions are broken on Japanese Windows
Date: Wed, 23 Jan 2013 18:21:16 +0200
> Date: Wed, 23 Jan 2013 18:38:23 +0900
> From: Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>
> Cc: 13515 <at> debbugs.gnu.org
> 
>  > > In addition, that change also makes the code below fail.
>  > > 
>  > > (let ((file-name-coding-system 'cp1252))
>  > >   (expand-file-name "漢字" "C:/"))
>  > > 
>  > > -> "c:/  "
>  > 
>  > IMO, this snippet doesn't make sense and cannot be supported.
>  > expand-file-name calls a number of system APIs which need the file
>  > name to be encoded, so using a file-name-coding-system that cannot
>  > possibly encode a file name is not supposed to work.
>  > 
>  > Do you have a real-life situation where such cases emerge and need to
>  > be supported?
> 
>  None for me; sorry for the inappropriate example.  But the docstring of
>  w32-downcase-file-names says it affects remote file names, and the fix
>  for Bug#12933 also affects other functions that do not use system APIs
>  (e.g., file-name-directory).

On Windows, file-name-directory and similar functions do call system
APIs, for 2 reasons: (1) down-casing file names under
w32-downcase-file-names, and (2) advancing by characters in DBCS
locales, which can only be supported if file-name-coding-system is one
of the codepages known to Windows.
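
Both points interact: down-casing is only safe when the string is walked one character at a time. In cp932, trail bytes can fall in the range of ASCII upper-case letters, so a byte-wise lower-casing pass can alter a two-byte character. The sketch below shows the idea in portable C; it is an illustration under assumptions, not the Emacs or CRT implementation. The function names are hypothetical, the lead-byte ranges are hard-coded for cp932, and only ASCII letters are down-cased.

```c
/* Sketch only: hypothetical helpers, not the Emacs or CRT code.  */

/* Return nonzero if BYTE starts a two-byte cp932 character.  */
static int
sjis_lead_byte_p (unsigned char byte)
{
  return (byte >= 0x81 && byte <= 0x9F) || (byte >= 0xE0 && byte <= 0xFC);
}

/* Down-case ASCII letters in a cp932-encoded string, stepping over
   two-byte characters as a unit so their trail bytes are untouched.  */
static void
downcase_cp932 (char *p)
{
  while (*p)
    {
      unsigned char c = (unsigned char) *p;

      if (sjis_lead_byte_p (c) && p[1] != '\0')
        p += 2;                 /* skip lead and trail byte together */
      else
        {
          if (c >= 'A' && c <= 'Z')
            *p = (char) (c + ('a' - 'A'));
          p++;
        }
    }
}
```

For example, the katakana "ア" is 0x83 0x41 in cp932; its trail byte equals ASCII 'A', so a byte-wise pass would change it to 0x61 and produce a different character, whereas downcase_cp932 leaves the pair intact and lower-cases only the surrounding ASCII letters.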

> I guess it would be better if these functions (except the ones using
>  system APIs) did not depend on the codepage.  Does Emacs not support
>  the code below either?
> 
>  (let ((file-name-coding-system 'cp1252))
>    (file-name-directory "漢字/"))
> 
> -> "  /"

Well, "漢字/" is not a remote file name, so it is still subject to the
limitation that only file names that can be encoded by the
file-name-coding-system are supported.  But even a remote file
name, such as "/foo <at> bar.com:漢字/", gets butchered by
file-name-directory.  However, this is a much broader issue, related
to Tramp and to other non-Windows specific aspects of file-name
handling, so I will start a discussion about that on emacs-devel.
Thanks for bringing up this point.




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Wed, 23 Jan 2013 16:56:02 GMT) Full text and rfc822 format available.

Notification sent to Kazuhiro Ito <kzhr <at> d1.dion.ne.jp>:
bug acknowledged by developer. (Wed, 23 Jan 2013 16:56:02 GMT) Full text and rfc822 format available.

Message #28 received at 13515-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: kzhr <at> d1.dion.ne.jp
Cc: 13515-done <at> debbugs.gnu.org
Subject: Re: bug#13515: 24.3.50;
	file-name operating functions are broken on Japanese Windows
Date: Wed, 23 Jan 2013 18:54:32 +0200
> Date: Wed, 23 Jan 2013 18:13:04 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 13515 <at> debbugs.gnu.org
> 
> > > Please try the changes below (relative to the emacs-24 branch).  If no
> > > issues are found with them, I will soon install them on the release
> > > branch.
> > 
> > As far as I tested, the problem was fixed.  Thank you.
> 
> Thanks, installed as revision 111194 on the emacs-24 branch.

Closing the bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 21 Feb 2013 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 69 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.