GNU bug report logs - #33729
27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)

Package: emacs;

Reported by: Kaushal Modi <kaushal.modi <at> gmail.com>

Date: Thu, 13 Dec 2018 20:22:02 UTC

Severity: normal

Found in version 27.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 33729 in the body.
You can then email your comments to 33729 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Thu, 13 Dec 2018 20:22:02 GMT)

Acknowledgement sent to Kaushal Modi <kaushal.modi <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 13 Dec 2018 20:22:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Kaushal Modi <kaushal.modi <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, far.nasiri.m <at> gmail.com
Subject: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz
 enabled (renders fine using m17n)
Date: Thu, 13 Dec 2018 15:20:23 -0500
Hello,

I built emacs from harfbuzz branch with harfbuzz 1.0.3 installed (RHEL 6.8).

I quickly compared Hindi and Gujarati rendering between emacs built with
m17n and the new harfbuzz branch build.

With harfbuzz, the partial glyphs are not rendered for Gujarati, although
they are rendered correctly for Hindi. On the build with m17n, the partial
glyphs are rendered correctly for both Hindi and Gujarati.

Screenshot to explain this issue: https://i.imgtc.com/md9Yz7X.png



In GNU Emacs 27.0.50 (build 2, x86_64-pc-linux-gnu, GTK+ Version 2.24.23)
 of 2018-12-13
Repository revision: 981b3d292aff49452c2b5f0217b57ec1a2829a8b
Repository branch: harfbuzz
Windowing system distributor 'The X.Org Foundation', version 11.0.60900000
System Description: Red Hat Enterprise Linux Workstation release 6.8
(Santiago)

Recent messages:
Emacs version: GNU Emacs 27.0.50 (build 2, x86_64-pc-linux-gnu, GTK+
Version 2.24.23)
 of 2018-12-13, built using commit 981b3d292aff49452c2b5f0217b57ec1a2829a8b.

./configure options:
  --with-modules --prefix=/home/kmodi/usr_local/apps/6/emacs/harfbuzz
'--program-transform-name=s/^ctags$/ctags_emacs/' --with-harfbuzz
'CPPFLAGS=-I/home/kmodi/stowed/include -I/home/kmodi/usr_local/6/include
-I/usr/include/freetype2 -I/usr/include' 'CFLAGS=-O2 -march=native'
'LDFLAGS=-L/home/kmodi/stowed/lib -L/home/kmodi/stowed/lib64
-L/home/kmodi/usr_local/6/lib -L/home/kmodi/usr_local/6/lib64'
PKG_CONFIG_PATH=/home/kmodi/usr_local/6/lib/pkgconfig:/home/kmodi/usr_local/6/lib64/pkgconfig:/cad/adi/apps/gnu/linux/x86_64/6/lib/pkgconfig:/cad/adi/apps/gnu/linux/x86_64/6/lib64/pkgconfig:/home/kmodi/stowed/lib/pkgconfig:/usr/lib/pkgconfig:/usr/lib64/pkgconfig:/usr/share/pkgconfig:/lib/pkgconfig:/lib64/pkgconfig

Features:
  XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS GLIB
NOTIFY INOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE HARFBUZZ M17N_FLT
LIBOTF XFT ZLIB TOOLKIT_SCROLL_BARS GTK2 X11 XDBE XIM MODULES THREADS GMP

Important settings:
  value of $LANG: en_US.UTF-8
  value of $XMODIFIERS: @im=none
  locale-coding-system: utf-8-unix


--
Kaushal Modi

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Thu, 13 Dec 2018 20:27:01 GMT)

Message #8 received at 33729 <at> debbugs.gnu.org:

From: Kaushal Modi <kaushal.modi <at> gmail.com>
To: 33729 <at> debbugs.gnu.org
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, far.nasiri.m <at> gmail.com
Subject: Re: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz
 enabled (renders fine using m17n)
Date: Thu, 13 Dec 2018 15:25:16 -0500
>
> Screenshot to explain this issue: https://i.imgtc.com/md9Yz7X.png
>

I don't know Arabic. But from that same screenshot, it's evident that the
rendering of that same text is quite different between m17n and harfbuzz.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Thu, 13 Dec 2018 20:34:01 GMT)

Message #11 received at 33729 <at> debbugs.gnu.org:

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Kaushal Modi <kaushal.modi <at> gmail.com>
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
Subject: Re: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz
 enabled (renders fine using m17n)
Date: Thu, 13 Dec 2018 22:31:02 +0200
On Thu, Dec 13, 2018 at 03:25:16PM -0500, Kaushal Modi wrote:
> >
> > Screenshot to explain this issue: https://i.imgtc.com/md9Yz7X.png
> >
> 
> I don't know Arabic. But from that same screenshot, it's evident that the
> rendering of that same text is quite different between m17n and harfbuzz.

The HarfBuzz rendering of Arabic is the correct one in this screenshot.
For debugging such rendering differences, the actual font used by
Emacs for a given part of the text needs to be known, then the text and
the font can be checked against vanilla HarfBuzz (e.g. using the hb-view
command line tool); if it gives the same rendering then it is either a
HarfBuzz or font issue, if not then it is a bug in the HarfBuzz
integration code in Emacs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Thu, 13 Dec 2018 20:45:02 GMT)

Message #14 received at 33729 <at> debbugs.gnu.org:

From: Kaushal Modi <kaushal.modi <at> gmail.com>
To: dr.khaled.hosny <at> gmail.com
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
Subject: Re: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz
 enabled (renders fine using m17n)
Date: Thu, 13 Dec 2018 15:43:50 -0500
On Thu, Dec 13, 2018 at 3:31 PM Khaled Hosny <dr.khaled.hosny <at> gmail.com>
wrote:

>
> The HarfBuzz rendering of Arabic is the correct one in this screenshot.
>

Thanks. So here's the status so far:

Rendering of Namaste as seen in C-h h (M-x view-hello-file):

|          | harfbuzz | m17n    |
|----------+----------+---------|
| Hindi    | correct  | correct |
| Gujarati | wrong    | correct |
| Arabic   | correct  | wrong   |



> For debugging such rendering differences, the actual font used by
> Emacs for a given part of the text needs to be known,


I am using Mukta Vaani font for Gujarati. It is a free font and can be
downloaded from https://ektype.in/mukta-vaani.html.

The string being rendered is "નમસ્તે".
By placing the cursor on each of those characters and doing C-u C-x = (on the
m17n build), I get:

(1) ન

             position: 1610 of 3509 (46%), column: 32
            character: ન (displayed as ન) (codepoint 2728, #o5250, #xaa8)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x3968
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aa8" or "C-x 8 RET GUJARATI LETTER
NA"
          buffer code: #xE0 #xAA #xA8
            file code: #xE0 #xAA #xA8 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
(#x234)

Character code properties: customize what to show
  name: GUJARATI LETTER NA
  general-category: Lo (Letter, Other)
  decomposition: (2728) ('ન')

There are text properties here:
  charset              mule-unicode-0100-24ff

(2) મ

             position: 1611 of 3509 (46%), column: 33
            character: મ (displayed as મ) (codepoint 2734, #o5256, #xaae)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x396E
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aae" or "C-x 8 RET GUJARATI LETTER
MA"
          buffer code: #xE0 #xAA #xAE
            file code: #xE0 #xAA #xAE (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
(#x239)

Character code properties: customize what to show
  name: GUJARATI LETTER MA
  general-category: Lo (Letter, Other)
  decomposition: (2734) ('મ')

There are text properties here:
  charset              mule-unicode-0100-24ff

(3) સ્તે

             position: 1612 of 3509 (46%), column: 34
            character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x3978
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER
SA"
          buffer code: #xE0 #xAA #xB8
            file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
              display: composed to form "સ્તે" (see below)

Composed with the following character(s) "્તે" using this font:
  xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
by these glyphs:
  [0 3 0 645 8 0 11 11 0 [0 0 8]]
  [0 3 2724 560 11 1 11 11 1 nil]
  [0 3 2759 589 0 -9 -2 16 -11 [-1 0 0]]

Character code properties: customize what to show
  name: GUJARATI LETTER SA
  general-category: Lo (Letter, Other)
  decomposition: (2744) ('સ')

There are text properties here:
  charset              mule-unicode-0100-24ff


=====


On the harfbuzz build, the "સ્તે" part is different. I can place the cursor
separately on સ્ and તે, do C-u C-x = and I get:

(3.1) સ્
             position: 1612 of 3509 (46%), column: 34
            character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x3978
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER
SA"
          buffer code: #xE0 #xAA #xB8
            file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
(#x241)

Character code properties: customize what to show
  name: GUJARATI LETTER SA
  general-category: Lo (Letter, Other)
  decomposition: (2744) ('સ')

There are text properties here:
  charset              mule-unicode-0100-24ff

(3.2) તે

             position: 1614 of 3509 (46%), column: 35
            character: ત (displayed as ત) (codepoint 2724, #o5244, #xaa4)
              charset: mule-unicode-0100-24ff (Unicode characters of the
range U+0100..U+24FF.)
code point in charset: 0x3964
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aa4" or "C-x 8 RET GUJARATI LETTER
TA"
          buffer code: #xE0 #xAA #xA4
            file code: #xE0 #xAA #xA4 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
(#x230)

Character code properties: customize what to show
  name: GUJARATI LETTER TA
  general-category: Lo (Letter, Other)
  decomposition: (2724) ('ત')

There are text properties here:
  charset              mule-unicode-0100-24ff
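
For reference, the buffer positions in the output above line up with the code points of the string. The following Python sketch is not part of the original report; the helper name `describe` is made up for illustration, and it only uses the standard `unicodedata` module. Positions 1612 to 1615 hold the four characters that the m17n build composes into one cluster, whereas the harfbuzz build reports separate glyphs at 1612 and 1614.

```python
import unicodedata

# The Gujarati greeting from the report, written with escapes so the
# individual code points are visible.
TEXT = "\u0aa8\u0aae\u0ab8\u0acd\u0aa4\u0ac7"

# Buffer position of the first character, as shown in the report.
FIRST_POSITION = 1610


def describe(text, first_position=FIRST_POSITION):
    """Return (buffer position, code point, Unicode name) for each character."""
    return [
        (first_position + i, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
    ]


if __name__ == "__main__":
    for position, code_point, name in describe(TEXT):
        print(position, code_point, name)
```

The virama at position 1613 and the vowel sign at 1615 have no entry of their own in either listing because they are displayed as part of the preceding consonant.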



> then the text and
> the font can be checked against vanilla HarfBuzz (e.g. using the hb-view
> command line tool); if it gives the same rendering then it is either a
> HarfBuzz or font issue, if not then it is a bug in the HarfBuzz
> integration code in Emacs.
>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Thu, 13 Dec 2018 20:54:02 GMT)

Message #17 received at 33729 <at> debbugs.gnu.org:

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Kaushal Modi <kaushal.modi <at> gmail.com>
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
Subject: Re: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz
 enabled (renders fine using m17n)
Date: Thu, 13 Dec 2018 22:53:36 +0200
On Thu, Dec 13, 2018 at 03:43:50PM -0500, Kaushal Modi wrote:
> On Thu, Dec 13, 2018 at 3:31 PM Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> wrote:
> 
> >
> > The HarfBuzz rendering of Arabic is the correct one in this screenshot.
> >
> 
> Thanks. So here's the status so far:
> 
> Rendering of Namaste as seen in C-h h (M-x view-hello-file):
> 
> |          | harfbuzz | m17n    |
> |----------+----------+---------|
> | Hindi    | correct  | correct |
> | Gujarati | wrong    | correct |
> | Arabic   | correct  | wrong   |
> 
> 
> 
> > For debugging such rendering differences, the actual font used by
> > Emacs for a given part of the text needs to be known,
> 
> 
> I am using Mukta Vaani font for Gujarati. It is a free font and can be
> downloaded from https://ektype.in/mukta-vaani.html.
> 
> The string being rendered is "નમસ્તે".

I tried that font and text with hb-view and the output I get is
identical to m17n. If I pass a wrong script to HarfBuzz (e.g.
--script=latn), I get the same broken output you see in Emacs. So I’m
guessing something is not correctly working in script itemization. Most
likely the FIXME in uni_script(), or the FIXME above the call to
hb_buffer_guess_segment_properties().

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Thu, 13 Dec 2018 21:06:01 GMT)

Message #20 received at 33729 <at> debbugs.gnu.org:

From: Kaushal Modi <kaushal.modi <at> gmail.com>
To: dr.khaled.hosny <at> gmail.com
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
Subject: Re: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz
 enabled (renders fine using m17n)
Date: Thu, 13 Dec 2018 16:04:25 -0500
On Thu, Dec 13, 2018 at 3:53 PM Khaled Hosny <dr.khaled.hosny <at> gmail.com>
wrote:

>
> I tried that font and text with hb-view and the output I get is
> identical to m17n.


hb-view is nifty! I wasn't sure if it would work for me (because I haven't
set my terminal to show unicode, etc.). But even with the older Harfbuzz
1.0.3 that I have, hb-view gave this: https://i.imgtc.com/d1N177Z.png

I am impressed. That shows the correct rendering of નમસ્તે. (I just blindly
pasted  નમસ્તે as the second argument and hit enter, my terminal doesn't
even show the pasted text. But the hb-view rendering is correct.)


> If I pass a wrong script to HarfBuzz (e.g.
> --script=latn), I get the same broken output you see in Emacs. So I’m
> guessing something is not correctly working in script itemization. Most
> likely the FIXME in uni_script(), or the FIXME above the call to
> hb_buffer_guess_segment_properties().
>

I am not a C developer. But hopefully this information would help you to
fix the Harfbuzz integration with Emacs.

I am surprised that the rendering of Hindi नमस्ते using Harfbuzz in Emacs
is correct, while the  rendering of Gujarati નમસ્તે is not, when in fact
the two scripts are so similar to each other. [Fun fact: Most of Gujarati
script if superimposed with a line at the top will look like valid Hindi.
You can see that in the case of  નમસ્તે vs  नमस्ते :) ]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 05:59:01 GMT)

Message #23 received at 33729 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Kaushal Modi <kaushal.modi <at> gmail.com>
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org,
 far.nasiri.m <at> gmail.com
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Fri, 14 Dec 2018 07:57:55 +0200
> From: Kaushal Modi <kaushal.modi <at> gmail.com>
> Date: Thu, 13 Dec 2018 15:43:50 -0500
> Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
> 
>  For debugging such rendering differences, the actual font used by
>  Emacs for a given part of the text needs to be known,
> 
> I am using Mukta Vaani font for Gujarati. It is a free font and can be downloaded from
> https://ektype.in/mukta-vaani.html.

Your data indicates that the m17n build performs character composition
at buffer position 34, whereas the harfbuzz build does not.  The
question is why.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 06:47:02 GMT)

Message #26 received at 33729 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 33729 <at> debbugs.gnu.org
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, Florian Beck <fb <at> fbeck.net>,
 far.nasiri.m <at> gmail.com, Kaushal Modi <kaushal.modi <at> gmail.com>
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with
 Harfbuzz enabled (renders fine using m17n)
Date: Thu, 13 Dec 2018 22:45:48 -0800
Florian Beck pointed out some examples of possible related problems when 
rendering Emacs's etc/HELLO file; see:

https://lists.gnu.org/r/emacs-devel/2018-12/msg00271.html

For the language names written in their own scripts, Harfbuzz seems to be better for
Burmese (မြန်မာ) (where master is wrong); conversely Harfbuzz seems to be wrong 
for Maldivian (ދިވެހި) (where master is better). Please see the following for 
what these should look like:

https://en.wikipedia.org/wiki/File:Dhivehiscript.svg

https://en.wikipedia.org/wiki/File:Burmese_script_sample.svg




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 07:50:01 GMT)

Message #29 received at 33729 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: kaushal.modi <at> gmail.com
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org,
 far.nasiri.m <at> gmail.com
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Fri, 14 Dec 2018 09:48:54 +0200
> Date: Fri, 14 Dec 2018 07:57:55 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org,
> 	far.nasiri.m <at> gmail.com
> 
> Your data indicates that the m17n build performs character composition
> at buffer position 34

Sorry, wrong number: I meant buffer position 1612.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 07:52:01 GMT)

Message #32 received at 33729 <at> debbugs.gnu.org:

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com,
 Kaushal Modi <kaushal.modi <at> gmail.com>
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Fri, 14 Dec 2018 09:50:56 +0200
On Fri, Dec 14, 2018 at 07:57:55AM +0200, Eli Zaretskii wrote:
> > From: Kaushal Modi <kaushal.modi <at> gmail.com>
> > Date: Thu, 13 Dec 2018 15:43:50 -0500
> > Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
> > 
> >  For debugging such rendering differences, the actual font used by
> >  Emacs for a given part of the text needs to be known,
> > 
> > I am using Mukta Vaani font for Gujarati. It is a free font and can be downloaded from
> > https://ektype.in/mukta-vaani.html.
> 
> Your data indicates that the m17n build performs character composition
> at buffer position 34, whereas the harfbuzz build does not.  The
> question is why.

See my earlier email, most likely the culprit is the broken Emacs to
HarfBuzz script code mapping that we discussed earlier. HarfBuzz needs
to know the correct script of the text to perform shaping, and it looks
like we are passing nonsense values for certain scripts (or rather for
certain scripts we are lucky that the mapping is not broken).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 10:04:02 GMT)

Message #35 received at 33729 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com,
 kaushal.modi <at> gmail.com
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Fri, 14 Dec 2018 12:03:32 +0200
> Date: Fri, 14 Dec 2018 09:50:56 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: Kaushal Modi <kaushal.modi <at> gmail.com>, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
> 
> > Your data indicates that the m17n build performs character composition
> > at buffer position 34, whereas the harfbuzz build does not.  The
> > question is why.
> 
> See my earlier email, most likely the culprit is the broken Emacs to
> HarfBuzz script code mapping that we discussed earlier. HarfBuzz needs
> to know the correct script of the text to perform shaping, and it looks
> like we are passing nonsense values for certain scripts (or rather for
> certain scripts we are lucky that the mapping is not broken).

Thanks.

I don't yet have access to a GNU/Linux system with HarfBuzz installed,
so I cannot myself debug it.

I hope Mohammad will be able to look into this and either fix it or
provide more focused and detailed analysis of what is wrong, so we
could fix it.  Or maybe you could point to the problematic code and
tell more details.

FWIW, I looked at ftfont.c:uni_script, and I cannot find a problem
with it; in particular looking up in char-script-table each character
of the Gujarati welcome in HELLO yields 'gujarati', so I couldn't see
any evident Emacs issue.  Or are you saying that hb_script_from_string
doesn't DTRT?  Or maybe Kaushal should upgrade to a newer version of
HarfBuzz?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 11:04:02 GMT)

Message #38 received at 33729 <at> debbugs.gnu.org:

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com,
 kaushal.modi <at> gmail.com
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Fri, 14 Dec 2018 13:03:16 +0200
On Fri, Dec 14, 2018 at 12:03:32PM +0200, Eli Zaretskii wrote:
> > Date: Fri, 14 Dec 2018 09:50:56 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: Kaushal Modi <kaushal.modi <at> gmail.com>, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com
> > 
> > > Your data indicates that the m17n build performs character composition
> > > at buffer position 34, whereas the harfbuzz build does not.  The
> > > question is why.
> > 
> > See my earlier email, most likely the culprit is the broken Emacs to
> > HarfBuzz script code mapping that we discussed earlier. HarfBuzz needs
> > to know the correct script of the text to perform shaping, and it looks
> > like we are passing nonsense values for certain scripts (or rather for
> > certain scripts we are lucky that the mapping is not broken).
> 
> Thanks.
> 
> I don't yet have access to a GNU/Linux system with HarfBuzz installed,
> so I cannot myself debug it.
> 
> I hope Mohammad will be able to look into this and either fix it or
> provide more focused and detailed analysis of what is wrong, so we
> could fix it.  Or maybe you could point to the problematic code and
> tell more details.
> 
> FWIW, I looked at ftfont.c:uni_script, and I cannot find a problem
> with it; in particular looking up in char-script-table each character
> of the Gujarati welcome in HELLO yields 'gujarati', so I couldn't see
> any evident Emacs issue.  Or are you saying that hb_script_from_string
> doesn't DTRT?  Or maybe Kaushal should upgrade to a newer version of
> HarfBuzz?

There is this FIXME:

/* FIXME: from_string wants an ISO 15924 script tag here. */

As we discussed earlier, hb_script_from_string() expects ISO 15924
script tags, but char_script_table does not provide such tags (I don’t
recall what it does provide exactly). We need a way to get ISO 15924
script tags from Emacs.

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 13:44:02 GMT)

Message #41 received at 33729 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com,
 kaushal.modi <at> gmail.com
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Fri, 14 Dec 2018 15:42:49 +0200
> Date: Fri, 14 Dec 2018 13:03:16 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: kaushal.modi <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org,
> 	far.nasiri.m <at> gmail.com
> 
> > FWIW, I looked at ftfont.c:uni_script, and I cannot find a problem
> > with it; in particular looking up in char-script-table each character
> > of the Gujarati welcome in HELLO yields 'gujarati', so I couldn't see
> > any evident Emacs issue.  Or are you saying that hb_script_from_string
> > doesn't DTRT?  Or maybe Kaushal should upgrade to a newer version of
> > HarfBuzz?
> 
> There is this FIXME:
> 
> /* FIXME: from_string wants an ISO 15924 script tag here. */
> 
> As we discussed earlier, hb_script_from_string() expects ISO 15924
> script tags, but char_script_table does not provide such tags (I don’t
> recall what it does provide exactly). We need a way to get ISO 15924
> script tags from Emacs.

Right, I forgot about that.

So you are saying that we need to generate Gujr instead of gujarati,
is that right?

Mohammad, do you need help in coming up with a solution?  There's
otf-script-alist (see fontset.el), but it goes in the opposite
direction.  We could use rassq (Frassq in C) to find the OTF script
tag by its Emacs symbol (which is returned by indexing into
Vchar_script_table), by looking in otf-script-alist.

Or maybe you prefer a separate data structure, not limited to the OTF
tags?

Let me know if you need more help.
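
The reverse lookup suggested above can be pictured as follows. This is a minimal Python sketch, assuming that otf-script-alist pairs a script tag with an Emacs script symbol; the four entries are a hand-picked sample rather than the contents of fontset.el, and `rassq` here only imitates the Lisp function of the same name.

```python
# Sample (tag, script symbol) pairs; illustrative only, not copied
# from fontset.el.
SAMPLE_ALIST = [
    ("arab", "arabic"),
    ("deva", "devanagari"),
    ("gujr", "gujarati"),
    ("syrc", "syriac"),
]


def rassq(value, alist):
    """Return the first (key, value) pair whose value matches, else None."""
    for pair in alist:
        if pair[1] == value:
            return pair
    return None


def tag_for_script(script, alist=SAMPLE_ALIST):
    """Find the tag for a script symbol by searching the alist in reverse."""
    pair = rassq(script, alist)
    return pair[0] if pair else None


if __name__ == "__main__":
    print(tag_for_script("gujarati"))
```

A script with no entry yields `None`, which the caller would have to treat as an unknown script. Whether OTF tags are an adequate stand-in for ISO 15924 tags is exactly the open question raised above.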




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 15:26:02 GMT)

Message #44 received at 33729 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: far.nasiri.m <at> gmail.com
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org,
 kaushal.modi <at> gmail.com
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Fri, 14 Dec 2018 17:25:42 +0200
> Date: Fri, 14 Dec 2018 15:42:49 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: behdad <at> behdad.org, kaushal.modi <at> gmail.com, 33729 <at> debbugs.gnu.org,
> 	far.nasiri.m <at> gmail.com
> 
> Mohammad, do you need help in coming up with a solution?  There's
> otf-script-alist (see fontset.el), but it goes in the opposite
> direction.  We could use rassq (Frassq in C) to find the OTF script
> tag by its Emacs symbol (which is returned by indexing into
> Vchar_script_table), by looking in otf-script-alist.
> 
> Or maybe you prefer a separate data structure, not limited to the OTF
> tags?

After some thinking, my conclusion is that we should import the
ISO 15924 database from https://unicode.org/iso15924/, use a script
similar to admin/unidata/blocks.awk to generate an alist from it that
maps Emacs script names to ISO 15924 tags, and then access that alist
from uni_script to get the correct script information to Harfbuzz.

Patches implementing that are welcome.
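
As a rough picture of the proposed generation step, here is a Python sketch (the proposal itself is for an awk script). It assumes a semicolon-separated input with the four-letter code in the first field and the English name in the third; the sample rows are hand-written for illustration, and deriving the Emacs script name by lowercasing the English name is also an assumption, since real script symbols may need explicit aliases.

```python
# Hypothetical sample rows in an assumed "Code;Number;English Name" layout.
SAMPLE = """\
# Code;Number;English Name
Arab;160;Arabic
Deva;315;Devanagari
Gujr;320;Gujarati
Syrc;135;Syriac
"""


def parse_registry(text):
    """Return (script name, ISO 15924 code) pairs from registry-style text."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(";")
        if len(fields) < 3:
            continue  # skip malformed rows
        code, english = fields[0], fields[2]
        # Assumed naming rule: lowercase, spaces replaced by hyphens.
        pairs.append((english.lower().replace(" ", "-"), code))
    return pairs


def as_lisp_alist(pairs):
    """Format the pairs as the text of a Lisp alist."""
    body = "\n ".join(f'({name} . "{code}")' for name, code in pairs)
    return f"({body})"


if __name__ == "__main__":
    print(as_lisp_alist(parse_registry(SAMPLE)))
```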




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 14 Dec 2018 22:48:01 GMT)

Message #47 received at 33729 <at> debbugs.gnu.org:

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: behdad <at> behdad.org, 33729 <at> debbugs.gnu.org, far.nasiri.m <at> gmail.com,
 kaushal.modi <at> gmail.com
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 15 Dec 2018 00:47:43 +0200
On Fri, Dec 14, 2018 at 03:42:49PM +0200, Eli Zaretskii wrote:
> > Date: Fri, 14 Dec 2018 13:03:16 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: kaushal.modi <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org,
> > 	far.nasiri.m <at> gmail.com
> > 
> > > FWIW, I looked at ftfont.c:uni_script, and I cannot find a problem
> > > with it; in particular looking up in char-script-table each character
> > > of the Gujarati welcome in HELLO yields 'gujarati', so I couldn't see
> > > any evident Emacs issue.  Or are you saying that hb_script_from_string
> > > doesn't DTRT?  Or maybe Kaushal should upgrade to a newer version of
> > > HarfBuzz?
> > 
> > There is this FIXME:
> > 
> > /* FIXME: from_string wants an ISO 15924 script tag here. */
> > 
> > As we discussed earlier, hb_script_from_string() expects ISO 15924
> > script tags, but char_script_table does not provide such tags (I don’t
> > recall what it does provide exactly). We need a way to get ISO 15924
> > script tags from Emacs.
> 
> Right, I forgot about that.
> 
> So you are saying that we need to generate Gujr instead of gujarati,
> is that right?

Yes (and the equivalent for all other scripts, of course).

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 16 Dec 2018 14:49:02 GMT)

Message #50 received at 33729 <at> debbugs.gnu.org:

From: Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: behdad <at> behdad.org, Eli Zaretskii <eliz <at> gnu.org>, 33729 <at> debbugs.gnu.org,
 far.nasiri.m <at> gmail.com, kaushal.modi <at> gmail.com
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Sun, 16 Dec 2018 15:47:56 +0100
Khaled Hosny writes:
> /* FIXME: from_string wants an ISO 15924 script tag here. */
>
> As we discussed earlier, hb_script_from_string() expects ISO 15924
> script tags, but char_script_table does not provide such tags (I don’t
> recall what it does provide exactly). We need a way to get ISO 15924
> script tags from Emacs.

The same mismatch also prevents Syriac text from actually shaping.
Syriac shaping works in m17n with the required setup in
composition-function-table and using the Meltho fonts.  With Harfbuzz it
doesn't work, unless I change "syriac" to "syrc" in charscript.el, just
for testing of course.

As a success story OTOH, Mandaic, using the Noto font, works OOTB ;-)

benny
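
One possible explanation for why some scripts work and others do not
can be sketched in Python.  The sketch assumes, without having checked
the HarfBuzz sources, that the string-to-script conversion looks only
at the first four characters of the name and ignores letter case; the
function naive_tag is hypothetical and exists only for this example.

```python
def naive_tag(name):
    """First four characters, title-cased and space-padded (assumed behaviour)."""
    head = name[:4].ljust(4)
    return head[0].upper() + head[1:].lower()


# Emacs script name -> real ISO 15924 tag for that script.
KNOWN = {
    "devanagari": "Deva",
    "mandaic": "Mand",
    "gujarati": "Gujr",
    "syriac": "Syrc",
}

for name, real in KNOWN.items():
    status = "matches" if naive_tag(name) == real else "differs from"
    print(f"{name}: {naive_tag(name)} {status} {real}")
```

Under that assumption, names whose first four letters happen to spell
the ISO 15924 tag (devanagari, mandaic) would shape correctly by
coincidence, while gujarati and syriac would not.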




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Mon, 17 Dec 2018 00:31:02 GMT) Full text and rfc822 format available.

Message #53 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Glenn Morris <rgm <at> gnu.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Sun, 16 Dec 2018 19:30:00 -0500
Eli Zaretskii wrote:

> After some thinking, my conclusion is that we should import the
> ISO 15924 database from https://unicode.org/iso15924/, use a script
> similar to admin/unidata/blocks.awk to generate an alist from it that
> maps Emacs script names to ISO 15924 tags, and then access that alist
> from uni_script to get the correct script information to Harfbuzz.
>
> Patches implementing that are welcome.

I live to write awk scripts. I'm not 100% sure what you want, but as a
first example, the following takes
http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
as input and outputs lines of the form "(gujr . gujarati)".

The aliases are so that the RHS matches charscript.el.

If this is not right, please clarify exactly what the inputs and output
should be.

#!/usr/bin/awk -f

## Map a Unicode script name to the name used in charscript.el.
function name2alias(name) {
    name = tolower(name)
    if (name ~ /arabic/) return "arabic"
    else if (name ~ /aramaic/) return "aramaic"
    else if (name ~ /cypriot/) return "cypriot-syllabary"
    else if (name ~ /katakana|hiragana/) return "kana"
    else if (name ~ /myanmar/) return "burmese"
    else if (name ~ /duployan|shorthand/) return "duployan-shorthand"
    else if (name ~ /signwriting/) return "sutton-sign-writing"
    sub(/^new_/, "", name)
    sub(/_(hieroglyphs|cursive)$/, "", name)
    gsub(/_/,"-",name)
    return name
}

## Script lines look like "sc ; Gujr ; Gujarati", so with the default
## field separator the tag is $3 and the long name is $5.
$1 == "sc" {
    abbrev = tolower($3)
    alias = name2alias($5)
    ## Anchor the whole alternation, not just its first branch.
    if (alias ~ /^(inherited|common|unknown)$/) next
    print "(" abbrev, ".", alias ")"
}
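
For readers who want to see the expected output without fetching the
UCD file, the same transformation can be sketched in Python and run on
a few sample lines.  The sample lines follow the layout of
PropertyValueAliases.txt but are abbreviated, and the names SAMPLE,
SPECIAL, name_to_alias and script_pairs are invented for this example.

```python
import re

# Abbreviated sample in the layout of PropertyValueAliases.txt.
SAMPLE = """\
sc ; Gujr                             ; Gujarati
sc ; Mymr                             ; Myanmar
sc ; Hira                             ; Hiragana
sc ; Zyyy                             ; Common
gc ; Lu                               ; Uppercase_Letter
"""

# (pattern, alias) pairs, checked in order, as in the awk function.
SPECIAL = [
    ("arabic", "arabic"),
    ("aramaic", "aramaic"),
    ("cypriot", "cypriot-syllabary"),
    ("katakana|hiragana", "kana"),
    ("myanmar", "burmese"),
    ("duployan|shorthand", "duployan-shorthand"),
    ("signwriting", "sutton-sign-writing"),
]


def name_to_alias(name):
    """Map a Unicode script name to a charscript.el-style name."""
    name = name.lower()
    for pattern, alias in SPECIAL:
        if re.search(pattern, name):
            return alias
    name = re.sub(r"^new_", "", name)
    name = re.sub(r"_(hieroglyphs|cursive)$", "", name)
    return name.replace("_", "-")


def script_pairs(text):
    """Return (tag, alias) pairs for every usable 'sc' line."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "sc":
            continue
        alias = name_to_alias(fields[4])
        if alias in ("inherited", "common", "unknown"):
            continue
        pairs.append((fields[2].lower(), alias))
    return pairs


for abbrev, alias in script_pairs(SAMPLE):
    print(f"({abbrev} . {alias})")
```

On the sample this prints (gujr . gujarati), (mymr . burmese) and
(hira . kana); the Common line and the non-script line are skipped.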




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Mon, 17 Dec 2018 15:57:01 GMT) Full text and rfc822 format available.

Message #56 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Glenn Morris <rgm <at> gnu.org>
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Mon, 17 Dec 2018 17:55:52 +0200
> From: Glenn Morris <rgm <at> gnu.org>
> Cc: far.nasiri.m <at> gmail.com,  dr.khaled.hosny <at> gmail.com,  behdad <at> behdad.org,  33729 <at> debbugs.gnu.org,  kaushal.modi <at> gmail.com
> Date: Sun, 16 Dec 2018 19:30:00 -0500
> 
> > After some thinking, my conclusion is that we should import the
> > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > similar to admin/unidata/blocks.awk to generate an alist from it that
> > maps Emacs script names to ISO 15924 tags, and then access that alist
> > from uni_script to get the correct script information to Harfbuzz.
> >
> > Patches implementing that are welcome.
> 
> I live to write awk scripts. I'm not 100% sure what you want, but as a
> first example, the following takes
> http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> as input and outputs lines of the form "(gujr . gujarati)".
> 
> The aliases are so that the RHS matches charscript.el.
> 
> If this is not right, please clarify exactly what the inputs and output
> should be.

Thanks.

It turns out I didn't have this figured out completely, and your
proposal forced me to dig some more into the relevant parts of Unicode
and Emacs.  I found a few additional issues and considerations; for at
least some of them I'd like to hear the opinions of the Harfbuzz
developers.

Here are the issues:

 . Contrary to my original thoughts, I now tend to think that a
   separate char-table, say char-iso15924-tag-table, that maps
   character codepoints directly to the script tags, is a better
   solution:
    - it will allow a faster look up, obviously
    - the subdivision of characters into scripts, as shown in
      Unicode's Scripts.txt, is slightly different from what
      char-script-table does, so a simple mapping from Emacs scripts
      to ISO 15924 script tag will not do.  For example, many
      characters Emacs puts into 'latin' or 'symbol' scripts are in
      the Common script according to Scripts.txt, and similarly for
      the Inherited script.  I imagine this is important for
      Harfbuzz.

 . Whether to produce the character-to-script-tag mapping using the
   UCD files, such as Scripts.txt and PropertyValueAliases.txt, or the
   canonical ISO 15924 tags from https://unicode.org/iso15924/,
   depends on whether the slight differences mentioned in
   https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
   for Harfbuzz.  For example, ISO 15924 has separate tags for the
   Fraktur and Gaelic varieties of the Latin script: does this
   distinction matter for Harfbuzz?

 . Does Harfbuzz handle the issues mentioned in
   https://www.unicode.org/reports/tr24/#Script_Anomalies, and in
   particular the use case of decomposed characters which yield a
   different script than their precomposed variants?  This use case is
   quite common in handling of character compositions, so it's
   important to understand its implications before we decide on the
   implementation.

To summarize, unless the Harfbuzz guys advise differently, I'd prefer
processing Scripts.txt and PropertyValueAliases.txt into a list
similar to the one we produce in charscript.el, then generate a
char-table from that list.

Thanks again for working on this.
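
The preferred approach above (turning Scripts.txt plus the aliases
into a per-codepoint lookup) can be sketched as follows in Python.
This is only an illustration under stated assumptions: the sample lines
imitate the layout of Scripts.txt but are abbreviated, the ALIASES
table stands in for data that would come from
PropertyValueAliases.txt, and a sorted list with a linear scan stands
in for a real Emacs char-table.  All names are hypothetical.

```python
import re

# Abbreviated sample in the layout of Scripts.txt.
SCRIPTS_SAMPLE = """\
0020          ; Common # Zs       SPACE
0041..005A    ; Latin # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z
0300..036F    ; Inherited # Mn [112] COMBINING GRAVE ACCENT..
0A85..0A8D    ; Gujarati # Lo   [9] GUJARATI LETTER A..GUJARATI VOWEL CANDRA E
"""

# Long script name -> four-letter tag (would come from PropertyValueAliases.txt).
ALIASES = {"Common": "Zyyy", "Latin": "Latn", "Inherited": "Zinh", "Gujarati": "Gujr"}

LINE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_scripts(text, aliases):
    """Return a sorted list of (first, last, tag) codepoint ranges."""
    ranges = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # comment or blank line
        first = int(match.group(1), 16)
        last = int(match.group(2) or match.group(1), 16)
        ranges.append((first, last, aliases[match.group(3)]))
    return sorted(ranges)


def lookup(ranges, codepoint):
    """Return the tag covering CODEPOINT, or "Zzzz" (Unknown) if none does."""
    for first, last, tag in ranges:
        if first <= codepoint <= last:
            return tag
    return "Zzzz"


table = parse_scripts(SCRIPTS_SAMPLE, ALIASES)
print(lookup(table, 0x0A85), lookup(table, 0x0301), lookup(table, 0x0020))
```

Keeping Common (Zyyy) and Inherited (Zinh) as distinct values, instead
of folding them into 'latin' or 'symbol', is the point of the first
issue in the list above.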




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Thu, 20 Dec 2018 18:59:02 GMT) Full text and rfc822 format available.

Message #59 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, far.nasiri.m <at> gmail.com
Cc: rgm <at> gnu.org, 33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Thu, 20 Dec 2018 20:58:06 +0200
Ping!  Could someone on the Harfbuzz team please comment on the
thoughts below?  Khaled, Mohammad, Behdad?

> Date: Mon, 17 Dec 2018 17:55:52 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org,
> 	far.nasiri.m <at> gmail.com, kaushal.modi <at> gmail.com
> 
> > From: Glenn Morris <rgm <at> gnu.org>
> > Cc: far.nasiri.m <at> gmail.com,  dr.khaled.hosny <at> gmail.com,  behdad <at> behdad.org,  33729 <at> debbugs.gnu.org,  kaushal.modi <at> gmail.com
> > Date: Sun, 16 Dec 2018 19:30:00 -0500
> > 
> > > After some thinking, my conclusion is that we should import the
> > > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > > similar to admin/unidata/blocks.awk to generate an alist from it that
> > > maps Emacs script names to ISO 15924 tags, and then access that alist
> > > from uni_script to get the correct script information to Harfbuzz.
> > >
> > > Patches implementing that are welcome.
> > 
> > I live to write awk scripts. I'm not 100% sure what you want, but as a
> > first example, the following takes
> > http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> > as input and outputs lines of the form "(gujr . gujarati)".
> > 
> > The aliases are so that the RHS matches charscript.el.
> > 
> > If this is not right, please clarify exactly what the inputs and output
> > should be.
> 
> Thanks.
> 
> It turns out I didn't have this figured out completely, and your
> proposal forced me to dig some more into the relevant parts of Unicode
> and Emacs.  I found a few additional issues and considerations; for at
> least some of them I'd like to hear the opinions of the Harfbuzz
> developers.
> 
> Here are the issues:
> 
>  . Contrary to my original thoughts, I now tend to think that a
>    separate char-table, say char-iso15924-tag-table, that maps
>    character codepoints directly to the script tags, is a better
>    solution:
>     - it will allow a faster look up, obviously
>     - the subdivision of characters into scripts, as shown in
>       Unicode's Scripts.txt, is slightly different from what
>       char-script-table does, so a simple mapping from Emacs scripts
>       to ISO 15924 script tag will not do.  For example, many
>       characters Emacs puts into 'latin' or 'symbol' scripts are in
>       the Common script according to Scripts.txt, and similarly for
>       the Inherited script.  I imagine this is important for
>       Harfbuzz.
> 
>  . Whether to produce the character-to-script-tag mapping using the
>    UCD files, such as Scripts.txt and PropertyValueAliases.txt, or the
>    canonical ISO 15924 tags from https://unicode.org/iso15924/,
>    depends on whether the slight differences mentioned in
>    https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
>    for Harfbuzz.  For example, ISO 15924 has separate tags for the
>    Fraktur and Gaelic varieties of the Latin script: does this
>    distinction matter for Harfbuzz?
> 
>  . Does Harfbuzz handle the issues mentioned in
>    https://www.unicode.org/reports/tr24/#Script_Anomalies, and in
>    particular the use case of decomposed characters which yield a
>    different script than their precomposed variants?  This use case is
>    quite common in handling of character compositions, so it's
>    important to understand its implications before we decide on the
>    implementation.
> 
> To summarize, unless the Harfbuzz guys advise differently, I'd prefer
> processing Scripts.txt and PropertyValueAliases.txt into a list
> similar to the one we produce in charscript.el, then generate a
> char-table from that list.
> 
> Thanks again for working on this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Thu, 20 Dec 2018 20:47:01 GMT) Full text and rfc822 format available.

Message #62 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Behdad Esfahbod <behdad <at> behdad.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Khaled Hosny <dr.khaled.hosny <at> gmail.com>, rgm <at> gnu.org,
 kaushal.modi <at> gmail.com, Mohammad Nasirifar <far.nasiri.m <at> gmail.com>,
 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with
 Harfbuzz enabled (renders fine using m17n)
Date: Thu, 20 Dec 2018 15:45:50 -0500
Sounds good to me.



-- 
behdad
http://behdad.org/

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 08:55:02 GMT) Full text and rfc822 format available.

Message #65 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, Glenn Morris <rgm <at> gnu.org>, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 10:54:48 +0200
On Mon, Dec 17, 2018 at 05:55:52PM +0200, Eli Zaretskii wrote:
> > From: Glenn Morris <rgm <at> gnu.org>
> > Cc: far.nasiri.m <at> gmail.com,  dr.khaled.hosny <at> gmail.com,  behdad <at> behdad.org,  33729 <at> debbugs.gnu.org,  kaushal.modi <at> gmail.com
> > Date: Sun, 16 Dec 2018 19:30:00 -0500
> > 
> > > After some thinking, my conclusion is that we should import the
> > > ISO 15924 database from https://unicode.org/iso15924/, use a script
> > > similar to admin/unidata/blocks.awk to generate an alist from it that
> > > maps Emacs script names to ISO 15924 tags, and then access that alist
> > > from uni_script to get the correct script information to Harfbuzz.
> > >
> > > Patches implementing that are welcome.
> > 
> > I live to write awk scripts. I'm not 100% sure what you want, but as a
> > first example, the following takes
> > http://www.unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt
> > as input and outputs lines of the form "(gujr . gujarati)".
> > 
> > The aliases are so that the RHS matches charscript.el.
> > 
> > If this is not right, please clarify exactly what the inputs and output
> > should be.
> 
> Thanks.
> 
> It turns out I didn't have this figured out completely, and your
> proposal forced me to dig some more into the relevant parts of Unicode
> and Emacs.  I found a few additional issues and considerations; for at
> least some of them I'd like to hear the opinions of the Harfbuzz
> developers.
> 
> Here are the issues:
> 
>  . Contrary to my original thoughts, I now tend to think that a
>    separate char-table, say char-iso15924-tag-table, that maps
>    character codepoints directly to the script tags, is a better
>    solution:
>     - it will allow a faster look up, obviously
>     - the subdivision of characters into scripts, as shown in
>       Unicode's Scripts.txt, is slightly different from what
>       char-script-table does, so a simple mapping from Emacs scripts
>       to ISO 15924 script tag will not do.  For example, many
>       characters Emacs puts into 'latin' or 'symbol' scripts are in
>       the Common script according to Scripts.txt, and similarly for
>       the Inherited script.  I imagine this is important for
>       Harfbuzz.

Alternatively, we could just use HarfBuzz’s own built in ucdn-based
Unicode function for this. The only reason for overriding this in Emacs
was to keep HarfBuzz and Emacs Unicode support in sync, but if we are
going to duplicate the Unicode script data then better use what HarfBuzz
has.

I’m going to try this now.

>  . Whether to produce the character-to-script-tag mapping using the
>    UCD files, such as Scripts.txt and PropertyValueAliases.txt, or the
>    canonical ISO 15924 tags from https://unicode.org/iso15924/,
>    depends on whether the slight differences mentioned in
>    https://www.unicode.org/reports/tr24/#Relation_To_ISO15924 matter
>    for Harfbuzz.  For example, ISO 15924 has separate tags for the
>    Fraktur and Gaelic varieties of the Latin script: does this
>    distinction matter for Harfbuzz?

We want the UCD data.

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 09:07:01 GMT) Full text and rfc822 format available.

Message #68 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, Glenn Morris <rgm <at> gnu.org>, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 11:06:44 +0200
On Sat, Dec 22, 2018 at 10:54:48AM +0200, Khaled Hosny wrote:
> Alternatively, we could just use HarfBuzz’s own built in ucdn-based
> Unicode function for this. The only reason for overriding this in Emacs
> was to keep HarfBuzz and Emacs Unicode support in sync, but if we are
> going to duplicate the Unicode script data then better use what HarfBuzz
> has.
> 
> I’m going to try this now.

I pushed a commit to harfbuzz branch that I think fixes this issue now.

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 10:12:02 GMT) Full text and rfc822 format available.

Message #71 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 12:11:15 +0200
> Date: Sat, 22 Dec 2018 11:06:44 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: Glenn Morris <rgm <at> gnu.org>, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> > Alternatively, we could just use HarfBuzz’s own built in ucdn-based
> > Unicode function for this. The only reason for overriding this in Emacs
> > was to keep HarfBuzz and Emacs Unicode support in sync, but if we are
> > going to duplicate the Unicode script data then better use what HarfBuzz
> > has.
> > 
> > I’m going to try this now.
> 
> I pushed a commit to harfbuzz branch that I think fixes this issue now.

Thanks.

There's a FIXME in the change you pushed (which I believe just
repeats what was already in the previous version).  Can you tell me more
about the problem we need to fix there?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 15:16:02 GMT) Full text and rfc822 format available.

Message #74 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 17:15:09 +0200
On Sat, Dec 22, 2018 at 12:11:15PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 22 Dec 2018 11:06:44 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: Glenn Morris <rgm <at> gnu.org>, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > > Alternatively, we could just use HarfBuzz’s own built in ucdn-based
> > > Unicode function for this. The only reason for overriding this in Emacs
> > > was to keep HarfBuzz and Emacs Unicode support in sync, but if we are
> > > going to duplicate the Unicode script data then better use what HarfBuzz
> > > has.
> > > 
> > > I’m going to try this now.
> > 
> > I pushed a commit to harfbuzz branch that I think fixes this issue now.
> 
> Thanks.
> 
> There's a FIXME in the change you pushed (which I believe just
> repeats what was already in the previous version).  Can you tell me more
> about the problem we need to fix there?

We need a way to get Unicode composition and decomposition for
a given character (implementing the uni_compose and uni_decompose
functions I deleted). I recall you suggested something earlier that I
tried but couldn’t get to work, the exact detail escapes me.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 15:28:01 GMT) Full text and rfc822 format available.

Message #77 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Behdad Esfahbod <behdad <at> behdad.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: rgm <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Kaushal Modi <kaushal.modi <at> gmail.com>,
 Mohammad Nasirifar <far.nasiri.m <at> gmail.com>, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with
 Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 10:27:06 -0500
I suggest you enable UCDN.

On Sat, Dec 22, 2018 at 10:15 AM Khaled Hosny <dr.khaled.hosny <at> gmail.com>
wrote:

> We need a way to get Unicode composition and decomposition for
> a given character (implementing the uni_compose and uni_decompose
> functions I deleted). I recall you suggested something earlier that I
> tried but couldn’t get to work, the exact detail escapes me.
>


-- 
behdad
http://behdad.org/

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 15:44:02 GMT) Full text and rfc822 format available.

Message #80 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Behdad Esfahbod <behdad <at> behdad.org>
Cc: rgm <at> gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Kaushal Modi <kaushal.modi <at> gmail.com>,
 Mohammad Nasirifar <far.nasiri.m <at> gmail.com>, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 17:42:47 +0200
I’m sub-classing the default Unicode functions, so for any callback we
don’t implement, the default implementation is already used.
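
The sub-classing idea can be sketched in Python as a callback table
that defers to a parent table for anything it does not define.  The
class UnicodeFuncs, its method names and the callback names are
hypothetical and only model the behaviour described here; they are not
the HarfBuzz API, and Python's unicodedata module stands in for the
default (UCDN-based) implementation.

```python
import unicodedata


class UnicodeFuncs:
    """Hypothetical callback table that defers to a parent for unset entries."""

    def __init__(self, parent=None):
        self.parent = parent
        self.callbacks = {}

    def set_func(self, name, func):
        self.callbacks[name] = func

    def call(self, name, *args):
        funcs = self
        while funcs is not None:
            if name in funcs.callbacks:
                return funcs.callbacks[name](*args)
            funcs = funcs.parent
        raise LookupError("no callback named %r" % name)


def default_compose(a, b):
    """Return the single composed character for a + b, or None."""
    composed = unicodedata.normalize("NFC", a + b)
    return composed if len(composed) == 1 else None


# Stand-in for the library's default table.
defaults = UnicodeFuncs()
defaults.set_func("compose", default_compose)
defaults.set_func("general_category", unicodedata.category)

# The sub-class overrides one callback only; the rest is inherited.
emacs_funcs = UnicodeFuncs(parent=defaults)
emacs_funcs.set_func(
    "general_category", lambda ch: "from-emacs:" + unicodedata.category(ch)
)

print(emacs_funcs.call("general_category", "a"))
print(emacs_funcs.call("compose", "e", "\u0301") == "\u00e9")
```

In this model, deleting an override (as with uni_compose and
uni_decompose) simply lets the parent's implementation answer.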

On Sat, Dec 22, 2018 at 10:27:06AM -0500, Behdad Esfahbod wrote:
> I suggest you enable UCDN.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 15:44:02 GMT) Full text and rfc822 format available.

Message #83 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 17:42:43 +0200
> Date: Sat, 22 Dec 2018 17:15:09 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> > There's a FIXME in the change you pushed (which I believe just
> > repeats what was already in the previous version).  Can you tell me more
> > about the problem we need to fix there?
> 
> We need a way to get Unicode composition and decomposition for
> a given character (implementing the uni_compose and uni_decompose
> functions I deleted).

Yes, but what does that entail?  Are these compositions and
decompositions defined by the Unicode UCD?  And how does Harfbuzz use
the results for a given character?

> I recall you suggested something earlier that I tried but couldn’t
> get to work, the exact detail escapes me.

I probably suggested using the 'decomposition' property of a
character, and perhaps also the facilities in ucs-normalize.el.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 15:50:01 GMT) Full text and rfc822 format available.

Message #86 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 17:49:45 +0200
On Sat, Dec 22, 2018 at 05:42:43PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 22 Dec 2018 17:15:09 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > > There's a FIXME in the change you pushed (which I believe just
> > > repeats what was already in the previous version).  Can you tell more
> > > about the problem we need to fix there?
> > 
> > We need a way to get Unicode composition and decomposition for
> > a given character (implementing the uni_compose and uni_decompose
> > functions I deleted).
> 
> Yes, but what does that entail?  Are these compositions and
> decompositions defined by the Unicode UCD?  And how does Harfbuzz use
> the results for a given character?

Yes, the standard Unicode composition and decomposition. HarfBuzz uses
these during shaping (it prefers composed form for a given sequence if
supported by the font, and falls back to decomposed form otherwise).
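
To make the fallback concrete, here is a minimal sketch in plain C.  It
is not HarfBuzz's actual code: the two-entry table, toy_compose,
choose_form and the two toy font predicates are all hypothetical
stand-ins, and the real normalizer handles far more cases.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-entry table standing in for the Unicode canonical
   composition data; a real shaper consults the full UCD.  */
struct toy_pair { uint32_t base, mark, composed; };
static const struct toy_pair toy_pairs[] = {
  { 0x0065, 0x0301, 0x00E9 },  /* e + COMBINING ACUTE ACCENT */
  { 0x0041, 0x030A, 0x00C5 },  /* A + COMBINING RING ABOVE */
};

/* Return the precomposed character for BASE + MARK, or 0 if the toy
   table has no entry for the pair.  */
uint32_t
toy_compose (uint32_t base, uint32_t mark)
{
  for (size_t i = 0; i < sizeof toy_pairs / sizeof toy_pairs[0]; i++)
    if (toy_pairs[i].base == base && toy_pairs[i].mark == mark)
      return toy_pairs[i].composed;
  return 0;
}

/* Hypothetical font coverage callback: true if the font has a glyph
   for CH.  */
typedef bool (*has_glyph_fn) (uint32_t ch);

/* Store in OUT the characters to shape for BASE + MARK and return how
   many were stored: 1 if the composed form exists and the font covers
   it, 2 (the original pair) otherwise.  */
size_t
choose_form (uint32_t base, uint32_t mark, has_glyph_fn has_glyph,
             uint32_t out[2])
{
  uint32_t composed = toy_compose (base, mark);
  if (composed != 0 && has_glyph (composed))
    {
      out[0] = composed;
      return 1;
    }
  out[0] = base;
  out[1] = mark;
  return 2;
}

/* Toy font that covers e, the combining acute, and precomposed U+00E9.  */
bool
font_with_eacute (uint32_t ch)
{
  return ch == 0x0065 || ch == 0x0301 || ch == 0x00E9;
}

/* Toy font that covers only ASCII and the combining marks block.  */
bool
font_ascii_and_marks (uint32_t ch)
{
  return ch < 0x80 || (ch >= 0x0300 && ch <= 0x036F);
}
```

With font_with_eacute the pair e + U+0301 is shaped as the single
character U+00E9; with font_ascii_and_marks, which lacks that glyph,
the two original characters are kept.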

> > I recall you suggested something earlier that I tried but couldn’t
> > get to work, the exact detail escapes me.
> 
> I probably suggested using the 'decomposition' property of a
> character, and perhaps also the facilities in ucs-normalize.el.

How can this be done from C?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 16:35:01 GMT) Full text and rfc822 format available.

Message #89 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 18:33:52 +0200
> Date: Sat, 22 Dec 2018 17:49:45 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> Yes, the standard Unicode composition and decomposition. HarfBuzz uses
> these during shaping (it prefers composed form for a given sequence if
> supported by the font, and falls back to decomposed form otherwise).
> 
> > > I recall you suggested something earlier that I tried but couldn’t
> > > get to work, the exact detail escapes me.
> > 
> > I probably suggested using the 'decomposition' property of a
> > character, and perhaps also the facilities in ucs-normalize.el.
> 
> How can this be done from C?

There are several examples in the sources of calling Lisp from C.  As
just a random example:

	if (STRINGP (curdir))
	  val = call1 (intern ("file-remote-p"), curdir);

This calls the Lisp function file-remote-p with one argument, curdir.

If you tell me more about the arguments and the expected effects of
calling uni_compose and uni_decompose, maybe I could propose a
specific implementation.  The Harfbuzz documentation doesn't seem to
tell enough, or maybe I didn't find the right text.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 19:40:02 GMT) Full text and rfc822 format available.

Message #92 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 21:38:43 +0200
> Date: Sat, 22 Dec 2018 17:49:45 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> Yes, the standard Unicode composition and decomposition. HarfBuzz uses
> these during shaping (it prefers composed form for a given sequence if
> supported by the font, and falls back to decomposed form otherwise).

Btw, how is this problem solved in the other projects that use
Harfbuzz?  Does every project need to provide this functionality, or
does Harfbuzz have it built-in, like with the script tags?  If there's
built-in support for this, perhaps Emacs could just use that?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 22 Dec 2018 21:01:01 GMT) Full text and rfc822 format available.

Message #95 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 22 Dec 2018 22:59:48 +0200
On Sat, Dec 22, 2018 at 09:38:43PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 22 Dec 2018 17:49:45 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > Yes, the standard Unicode composition and decomposition. HarfBuzz uses
> > these during shaping (it prefers composed form for a given sequence if
> > supported by the font, and falls back to decomposed form otherwise).
> 
> Btw, how is this problem solved in the other projects that use
> Harfbuzz?  Does every project need to provide this functionality, or
> does Harfbuzz have it built-in, like with the script tags?  If there's
> built-in support for this, perhaps Emacs could just use that?

There is built-in support, and currently we are using that. I can just
remove the FIXME.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 23 Dec 2018 03:35:02 GMT) Full text and rfc822 format available.

Message #98 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sun, 23 Dec 2018 05:34:04 +0200
> Date: Sat, 22 Dec 2018 22:59:48 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> On Sat, Dec 22, 2018 at 09:38:43PM +0200, Eli Zaretskii wrote:
> > > Date: Sat, 22 Dec 2018 17:49:45 +0200
> > > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > > 
> > > Yes, the standard Unicode composition and decomposition. HarfBuzz uses
> > > these during shaping (it prefers composed form for a given sequence if
> > > supported by the font, and falls back to decomposed form otherwise).
> > 
> > Btw, how is this problem solved in the other projects that use
> > Harfbuzz?  Does every project need to provide this functionality, or
> > does Harfbuzz have it built-in, like with the script tags?  If there's
> > built-in support for this, perhaps Emacs could just use that?
> 
> There is built-in support, and currently we are using that. I can just
> remove the FIXME.

Are there any disadvantages in using the built-in support?  I mean,
why did you envision an Emacs-specific implementation in the first
place?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 23 Dec 2018 13:52:02 GMT) Full text and rfc822 format available.

Message #101 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sun, 23 Dec 2018 15:51:09 +0200
On Sun, Dec 23, 2018 at 05:34:04AM +0200, Eli Zaretskii wrote:
> > Date: Sat, 22 Dec 2018 22:59:48 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > On Sat, Dec 22, 2018 at 09:38:43PM +0200, Eli Zaretskii wrote:
> > > > Date: Sat, 22 Dec 2018 17:49:45 +0200
> > > > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > > > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > > > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > > > 
> > > > Yes, the standard Unicode composition and decomposition. HarfBuzz uses
> > > > these during shaping (it prefers composed form for a given sequence if
> > > > supported by the font, and falls back to decomposed form otherwise).
> > > 
> > > Btw, how is this problem solved in the other projects that use
> > > Harfbuzz?  Does every project need to provide this functionality, or
> > > does Harfbuzz have it built-in, like with the script tags?  If there's
> > > built-in support for this, perhaps Emacs could just use that?
> > 
> > There is built-in support, and currently we are using that. I can just
> > remove the FIXME.
> 
> Are there any disadvantages in using the built-in support?  I mean,
> why did you envision an Emacs-specific implementation in the first
> place?

I thought, but I might be mistaken, that Emacs allows changing these
character properties at runtime and someone might possibly want to use
that to change some character property (e.g. make some PUA character a
combining mark) and it would then be nice if HarfBuzz respected that. I
admit that is a very niche thing if possible at all, and I’m more than
happy to let HarfBuzz use its default Unicode functions and simplify the
Emacs integration code.

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 23 Dec 2018 16:02:01 GMT) Full text and rfc822 format available.

Message #104 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sun, 23 Dec 2018 18:00:58 +0200
> Date: Sun, 23 Dec 2018 15:51:09 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> > Are there any disadvantages in using the built-in support?  I mean,
> > why did you envision an Emacs-specific implementation in the first
> > place?
> 
> I thought, but I might be mistaken, that Emacs allows changing these
> character properties at runtime and someone might possibly want to use
> that to change some character property (e.g. make some PUA character a
> combining mark) and it would then be nice if HarfBuzz respected that. I
> admit that is a very niche thing if possible at all, and I’m more than
> happy to let HarfBuzz use its default Unicode functions and simplify the
> Emacs integration code.

Right, I agree that we should for now leave that to HarfBuzz.  It
could be added later as an optional feature.  (I don't expect many
users to want to modify the Unicode character properties.)

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Mon, 24 Dec 2018 02:09:02 GMT) Full text and rfc822 format available.

Message #107 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Mon, 24 Dec 2018 04:08:47 +0200
On Sun, Dec 23, 2018 at 06:00:58PM +0200, Eli Zaretskii wrote:
> > Date: Sun, 23 Dec 2018 15:51:09 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > > Are there any disadvantages in using the built-in support?  I mean,
> > > why did you envision an Emacs-specific implementation in the first
> > > place?
> > 
> > I thought, but I might be mistaken, that Emacs allows changing these
> > character properties at runtime and someone might possibly want to use
> > that to change some character property (e.g. make some PUA character a
> > combining mark) and it would then be nice if HarfBuzz respected that. I
> > admit that is a very niche thing if possible at all, and I’m more than
> > happy to let HarfBuzz use its default Unicode functions and simplify the
> > Emacs integration code.
> 
> Right, I agree that we should for now leave that to HarfBuzz.  It
> could be added later as an optional feature.  (I don't expect many
> users to want to modify the Unicode character properties.)

I think we are almost good now. There is only one serious FIXME left:

  /* FIXME: guess_segment_properties is BAD BAD BAD.
   * we need to get these properties with the LGSTRING. */
#if 1
  hb_buffer_guess_segment_properties (hb_buffer);
#else
  hb_buffer_set_direction (hb_buffer, XXX);
  hb_buffer_set_script (hb_buffer, XXX);
  hb_buffer_set_language (hb_buffer, XXX);
#endif

We need to know, for a given lgstring we are shaping:
* Its direction (from applying bidi algorithm). Each lgstring we are
  shaping must be of a single direction.
* Its script, possibly after applying something like:
  http://unicode.org/reports/tr24/#Common
* Its language, if Emacs allows setting text language (my understanding is
  that it doesn’t). Some languages really need this for applying
  language-specific features (Urdu digits, Serbian alternate glyphs, etc.).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Mon, 24 Dec 2018 04:14:02 GMT) Full text and rfc822 format available.

Message #110 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Kaushal Modi <kaushal.modi <at> gmail.com>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, Glenn Morris <rgm <at> gnu.org>,
 Eli Zaretskii <eliz <at> gnu.org>, Mohammad Nasirifar <far.nasiri.m <at> gmail.com>,
 Behdad Esfahbod <behdad <at> behdad.org>
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with
 Harfbuzz enabled (renders fine using m17n)
Date: Sun, 23 Dec 2018 23:12:52 -0500
On Sun, Dec 23, 2018 at 9:08 PM Khaled Hosny <dr.khaled.hosny <at> gmail.com>
wrote:

> I think we are almost good now.
>

Thanks for working on this!

I confirm that this particular issue related to rendering compound glyphs
in Gujarati script is fixed.

Proof: https://i.imgtc.com/XYSM5fE.png


@Eli I see that discussion related to this fix is on-going. So feel free to
mark this bug as DONE when you see fit.

Thanks all!

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Mon, 24 Dec 2018 16:12:02 GMT) Full text and rfc822 format available.

Message #113 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Mon, 24 Dec 2018 18:10:49 +0200
> Date: Mon, 24 Dec 2018 04:08:47 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> I think we are almost good now. There is only one serious FIXME left:
> 
>   /* FIXME: guess_segment_properties is BAD BAD BAD.
>    * we need to get these properties with the LGSTRING. */
> #if 1
>   hb_buffer_guess_segment_properties (hb_buffer);
> #else
>   hb_buffer_set_direction (hb_buffer, XXX);
>   hb_buffer_set_script (hb_buffer, XXX);
>   hb_buffer_set_language (hb_buffer, XXX);
> #endif
> 
> We need to know, for a given lgstring we are shaping:
> * Its direction (from applying bidi algorithm). Each lgstring we are
>   shaping must be of a single direction.

Communicating this to ftfont_shape_by_hb will need changes in a couple
of interfaces (the existing shaping engines didn't need this
information).  I will work on this soon.

> * Its script, possibly after applying something like:
>   http://unicode.org/reports/tr24/#Common

Per previous discussions, we decided to use the Harfbuzz built-in
methods for determining the script, since Emacs doesn't have this
information, and adding it will just do the same as Harfbuzz does,
i.e. find the first character whose script is not Common etc., using
the UCD database.  I think it was you who suggested to use the
Harfbuzz built-ins in this case.

> * Its language, if Emacs allows setting text language (my understanding is
>   that it doesn’t). Some languages really need this for applying
>   language-specific features (Urdu digits, Serbian alternate glyphs, etc.).

We don't currently have a language property for chunks of text, we
only have the current global language setting determined from the
locale (and there's a command to change that for Emacs, should the
user want it).  This is not really appropriate for multilingual
buffers, but we will have to use that for now, and hope that in the
future, infrastructure will be added to allow more flexible
determination of the language of each run of text.  (I see that
Harfbuzz already looks at the locale for its default language, but
since Emacs allows user control of this, however unlikely, I think
it's best to use the value Emacs uses.)  I will work on this as well.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Mon, 24 Dec 2018 17:38:01 GMT) Full text and rfc822 format available.

Message #116 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Mon, 24 Dec 2018 19:37:23 +0200
On Mon, Dec 24, 2018 at 06:10:49PM +0200, Eli Zaretskii wrote:
> > Date: Mon, 24 Dec 2018 04:08:47 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > I think we are almost good now. There is only one serious FIXME left:
> > 
> >   /* FIXME: guess_segment_properties is BAD BAD BAD.
> >    * we need to get these properties with the LGSTRING. */
> > #if 1
> >   hb_buffer_guess_segment_properties (hb_buffer);
> > #else
> >   hb_buffer_set_direction (hb_buffer, XXX);
> >   hb_buffer_set_script (hb_buffer, XXX);
> >   hb_buffer_set_language (hb_buffer, XXX);
> > #endif
> > 
> > We need to know, for a given lgstring we are shaping:
> > * Its direction (from applying bidi algorithm). Each lgstring we are
> >   shaping must be of a single direction.
> 
> Communicating this to ftfont_shape_by_hb will need changes in a couple
> of interfaces (the existing shaping engines didn't need this
> information).  I will work on this soon.

Great.

> > * Its script, possibly after applying something like:
> >   http://unicode.org/reports/tr24/#Common
> 
> Per previous discussions, we decided to use the Harfbuzz built-in
> methods for determining the script, since Emacs doesn't have this
> information, and adding it will just do the same as Harfbuzz does,
> i.e. find the first character whose script is not Common etc., using
> the UCD database.  I think it was you who suggested to use the
> Harfbuzz built-ins in this case.

The built-in HarfBuzz code is for getting the script for a given
character, but resolving characters with Common script is left to the
client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
what HarfBuzz sees during shaping is three separate chunks of text ABC,
123, DEF. The 123 part is all Common script characters and thus
hb_buffer_guess_segment_properties won’t be able to guess anything (and
based on the font and the script, this can cause rendering differences).
Emacs will have to resolve the script of Common characters before
applying bidi algorithm and pass that down to HarfBuzz.
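
A rough sketch of that resolution step, in plain C and assuming the
simplest heuristic: a Common character takes the script of the nearest
preceding non-Common character, and leading Common characters take the
first real script that follows.  The enum and function names are made
up for illustration; neither Emacs nor HarfBuzz defines them.

```c
#include <stddef.h>

/* Hypothetical script tags; a real client would use the full set of
   Unicode script values.  */
enum toy_script { TOY_COMMON, TOY_LATIN, TOY_ARABIC };

/* Resolve TOY_COMMON entries in SCRIPTS (N entries, logical order) in
   place.  Each Common entry inherits the script of the nearest
   preceding non-Common entry; Common entries at the start inherit the
   first non-Common script found.  An all-Common array is left as is.  */
void
resolve_common (enum toy_script *scripts, size_t n)
{
  enum toy_script last = TOY_COMMON;
  size_t i;

  /* Seed LAST with the first real script, for leading Common entries.  */
  for (i = 0; i < n; i++)
    if (scripts[i] != TOY_COMMON)
      {
        last = scripts[i];
        break;
      }

  for (i = 0; i < n; i++)
    {
      if (scripts[i] == TOY_COMMON)
        scripts[i] = last;
      else
        last = scripts[i];
    }
}
```

Run over the logical string ABC 123 DEF before it is split into runs,
this marks 123 with the same script as ABC, so the middle chunk no
longer reaches the shaper with nothing but Common characters.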

> > * Its language, if Emacs allows setting text language (my understanding is
> >   that it doesn’t). Some languages really need this for applying
> >   language-specific features (Urdu digits, Serbian alternate glyphs, etc.).
> 
> We don't currently have a language property for chunks of text, we
> only have the current global language setting determined from the
> locale (and there's a command to change that for Emacs, should the
> user want it).  This is not really appropriate for multilingual
> buffers, but we will have to use that for now, and hope that in the
> future, infrastructure will be added to allow more flexible
> determination of the language of each run of text.  (I see that
> Harfbuzz already looks at the locale for its default language, but
> since Emacs allows user control of this, however unlikely, I think
> it's best to use the value Emacs uses.)  I will work on this as well.

Yes, better pass that from Emacs to HarfBuzz.

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Mon, 24 Dec 2018 17:39:02 GMT) Full text and rfc822 format available.

Message #119 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Benjamin Riefenstahl <b.riefenstahl <at> turtle-trading.net>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: Glenn Morris <rgm <at> gnu.org>, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
 33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Mon, 24 Dec 2018 18:38:46 +0100
Khaled Hosny writes:
> I pushed a commit to harfbuzz branch that I think fixes this issue now.

I can confirm that this fixes the issue with Syriac.

Thanks,
benny




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Mon, 24 Dec 2018 18:08:01 GMT) Full text and rfc822 format available.

Message #122 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Mon, 24 Dec 2018 20:07:04 +0200
> Date: Mon, 24 Dec 2018 19:37:23 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> > Per previous discussions, we decided to use the Harfbuzz built-in
> > methods for determining the script, since Emacs doesn't have this
> > information, and adding it will just do the same as Harfbuzz does,
> > i.e. find the first character whose script is not Common etc., using
> > the UCD database.  I think it was you who suggested to use the
> > Harfbuzz built-ins in this case.
> 
> The built-in HarfBuzz code is for getting the script for a given
> character, but resolving characters with Common script is left to the
> client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> what HarfBuzz sees during shaping is three separate chunks of text ABC,
> 123, DEF. The 123 part is all Common script characters and thus
> hb_buffer_guess_segment_properties won’t be able to guess anything (and
> based on the font and the script, this can cause rendering differences).
> Emacs will have to resolve the script of Common characters before
> applying bidi algorithm and pass that down to HarfBuzz.

I'm not sure I understand: why does HarfBuzz care that 123 was in the
middle of RTL text?  Does it need to shape 123 specially in this case?

(In general, AFAIK simple characters like 123 will not even go through
HarfBuzz, as Emacs doesn't call the shaper for characters whose entry
in composition-function-table is nil.  So I guess 123 here should
stand for some other characters, not for literal digits?  IOW, I don't
think I understand the example very well.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 29 Dec 2018 14:50:01 GMT) Full text and rfc822 format available.

Message #125 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 29 Dec 2018 16:49:23 +0200
> Date: Mon, 24 Dec 2018 19:37:23 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> On Mon, Dec 24, 2018 at 06:10:49PM +0200, Eli Zaretskii wrote:
> > > Date: Mon, 24 Dec 2018 04:08:47 +0200
> > > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > > 
> > > I think we are almost good now. There is only one serious FIXME left:
> > > 
> > >   /* FIXME: guess_segment_properties is BAD BAD BAD.
> > >    * we need to get these properties with the LGSTRING. */
> > > #if 1
> > >   hb_buffer_guess_segment_properties (hb_buffer);
> > > #else
> > >   hb_buffer_set_direction (hb_buffer, XXX);
> > >   hb_buffer_set_script (hb_buffer, XXX);
> > >   hb_buffer_set_language (hb_buffer, XXX);
> > > #endif
> > > 
> > > We need to know, for a given lgstring we are shaping:
> > > * Its direction (from applying bidi algorithm). Each lgstring we are
> > >   shaping must be of a single direction.
> > 
> > Communicating this to ftfont_shape_by_hb will need changes in a couple
> > of interfaces (the existing shaping engines didn't need this
> > information).  I will work on this soon.
> 
> Great.

Done.  Please test.  I made sure it compiles, but I couldn't actually
test the results, as I don't have access to a GNU/Linux system with
GUI display.  So it could be that I misunderstood the Harfbuzz APIs,
as I was essentially flying blind, guided only by the Harfbuzz docs.

In particular, I hope I understood correctly the way we should let
Harfbuzz guess the properties not explicitly provided by the Emacs
context, both for the direction of the text and its script.

> > > * Its script, possibly after applying something like:
> > >   http://unicode.org/reports/tr24/#Common
> > 
> > Per previous discussions, we decided to use the Harfbuzz built-in
> > methods for determining the script, since Emacs doesn't have this
> > information, and adding it will just do the same as Harfbuzz does,
> > i.e. find the first character whose script is not Common etc., using
> > the UCD database.  I think it was you who suggested to use the
> > Harfbuzz built-ins in this case.
> 
> The built-in HarfBuzz code is for getting the script for a given
> character, but resolving characters with Common script is left to the
> client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> what HarfBuzz sees during shaping is three separate chunks of text ABC,
> 123, DEF. The 123 part is all Common script characters and thus
> hb_buffer_guess_segment_properties won’t be able to guess anything (and
> based on the font and the script, this can cause rendering differences).
> Emacs will have to resolve the script of Common characters before
> applying bidi algorithm and pass that down to HarfBuzz.

See my followup questions about this.  For now, I left this aspect to
HarfBuzz.

> > > * Its language, if Emacs allows setting text language (my understanding is
> > >   that it doesn’t). Some languages really need this for applying
> > >   language-specific features (Urdu digits, Serbian alternate glyphs, etc.).
> > 
> > We don't currently have a language property for chunks of text, we
> > only have the current global language setting determined from the
> > locale (and there's a command to change that for Emacs, should the
> > user want it).  This is not really appropriate for multilingual
> > buffers, but we will have to use that for now, and hope that in the
> > future, infrastructure will be added to allow more flexible
> > determination of the language of each run of text.  (I see that
> > Harfbuzz already looks at the locale for its default language, but
> > since Emacs allows user control of this, however unlikely, I think
> > it's best to use the value Emacs uses.)  I will work on this as well.
> 
> Yes, better pass that from Emacs to HarfBuzz.

Done, but please see the FIXME I left behind.  For testing purposes,
you can change the current language like this:

  M-x set-locale-environment RET xx_YY.CODESET RET

For example:

  M-x set-locale-environment RET sr_RS.UTF-8 RET

for the Cyrillic Serbian locale.  This should change the value of
current-iso639-language to the symbol 'sr'.
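
As an illustration of the mapping only (Emacs computes
current-iso639-language by its own means, which this does not
reproduce), the language part of a locale name such as sr_RS.UTF-8 is
the text before the first '_', '.' or '@'.  The helper name below is
hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Copy the language part of LOCALE (everything before the first '_',
   '.' or '@') into BUF, which has room for BUFSIZE bytes.  Return BUF,
   or NULL if the language part is empty or does not fit.  */
char *
language_from_locale (const char *locale, char *buf, size_t bufsize)
{
  size_t len = strcspn (locale, "_.@");

  if (len == 0 || len >= bufsize)
    return NULL;
  memcpy (buf, locale, len);
  buf[len] = '\0';
  return buf;
}
```

A tag obtained this way is the kind of string that
hb_language_from_string accepts.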

Please tell if you encounter any difficulties with the code I added,
or if you need any further help.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 05 Jan 2019 20:54:02 GMT) Full text and rfc822 format available.

Message #128 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 5 Jan 2019 22:53:14 +0200
On Sat, Dec 29, 2018 at 04:49:23PM +0200, Eli Zaretskii wrote:
> > Date: Mon, 24 Dec 2018 19:37:23 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > > > We need to know, for a given lgstring we are shaping:
> > > > * Its direction (from applying bidi algorithm). Each lgstring we are
> > > >   shaping must be of a single direction.
> > > 
> > > Communicating this to ftfont_shape_by_hb will need changes in a couple
> > > of interfaces (the existing shaping engines didn't need this
> > > information).  I will work on this soon.
> > 
> > Great.
> 
> Done.  Please test.  I made sure it compiles, but I couldn't actually
> test the results, as I don't have access to a GNU/Linux system with
> GUI display.  So it could be that I misunderstood the Harfbuzz APIs,
> as I was essentially flying blind, guided only by the Harfbuzz docs.

It seems to work, but still not quite right. You seem to be passing the
paragraph direction, but what HarfBuzz needs is the resolved direction of
the text (i.e. the bidi embedding level of the run). In other words, if
Emacs is going to draw this text from right to left, then HarfBuzz must
shape it in right to left direction. Both should use the same direction
all the time and HarfBuzz direction guessing should never be used (i.e.
always pass to it an explicit direction).
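
The rule can be sketched in C as follows.  This is an illustration
under assumptions: the enum and function names are made up for the
example and merely stand in for hb_direction_t and its
HB_DIRECTION_LTR / HB_DIRECTION_RTL values.  The only fact relied on
is the UBA convention that even embedding levels are left-to-right
and odd levels are right-to-left.

```c
/* Hypothetical stand-in for the direction handed to the shaper.  */
enum shape_direction { SHAPE_LTR, SHAPE_RTL };

/* Derive the shaping direction from the resolved bidi embedding level
   of a run: even levels are left-to-right, odd levels right-to-left.
   Typical cases in a left-to-right paragraph:
     level 0: plain Latin text                   -> SHAPE_LTR
     level 1: Arabic or Hebrew text              -> SHAPE_RTL
     level 2: Arabic text forced LTR by an LRO   -> SHAPE_LTR  */
enum shape_direction
direction_from_level (unsigned level)
{
  return (level & 1) ? SHAPE_RTL : SHAPE_LTR;
}
```

Because the level already accounts for overrides and embeddings, the
direction used for shaping then matches the direction used for
drawing.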

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 05 Jan 2019 21:05:01 GMT) Full text and rfc822 format available.

Message #131 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 5 Jan 2019 23:04:20 +0200
On Sat, Jan 05, 2019 at 10:53:14PM +0200, Khaled Hosny wrote:
> On Sat, Dec 29, 2018 at 04:49:23PM +0200, Eli Zaretskii wrote:
> > > Date: Mon, 24 Dec 2018 19:37:23 +0200
> > > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > > 
> > > > > We need to know, for a given lgstring we are shaping:
> > > > > * Its direction (from applying bidi algorithm). Each lgstring we are
> > > > >   shaping must be of a single direction.
> > > > 
> > > > Communicating this to ftfont_shape_by_hb will need changes in a couple
> > > > of interfaces (the existing shaping engines didn't need this
> > > > information).  I will work on this soon.
> > > 
> > > Great.
> > 
> > Done.  Please test.  I made sure it compiles, but I couldn't actually
> > test the results, as I don't have access to a GNU/Linux system with
> > GUI display.  So it could be that I misunderstood the Harfbuzz APIs,
> > as I was essentially flying blind, guided only by the Harfbuzz docs.
> 
> It seems to work, but still not quite right. You seem to be passing the
> paragraph direction, but what HarfBuzz needs is the resolved direction of
> the text (i.e. the bidi embedding level of the run). In other words, if
> Emacs is going to draw this text from right to left, then HarfBuzz must
> shape it in right to left direction. Both should use the same direction
> all the time and HarfBuzz direction guessing should never be used (i.e.
> always pass to it an explicit direction).

I pushed a couple of commits that do this based on my limited
understanding of Emacs code, please check.

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sat, 05 Jan 2019 21:16:01 GMT) Full text and rfc822 format available.

Message #134 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, rgm <at> gnu.org, kaushal.modi <at> gmail.com,
 far.nasiri.m <at> gmail.com, behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sat, 5 Jan 2019 23:15:14 +0200
On Mon, Dec 24, 2018 at 08:07:04PM +0200, Eli Zaretskii wrote:
> > Date: Mon, 24 Dec 2018 19:37:23 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > > Per previous discussions, we decided to use the Harfbuzz built-in
> > > methods for determining the script, since Emacs doesn't have this
> > > information, and adding it will just do the same as Harfbuzz does,
> > > i.e. find the first character whose script is not Common etc., using
> > > the UCD database.  I think it was you who suggested to use the
> > > Harfbuzz built-ins in this case.
> > 
> > The built-in HarfBuzz code is for getting the script for a given
> > character, but resolving characters with Common script is left to the
> > client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> > what HarfBuzz sees during shaping is three separate chunks of text ABC,
> > 123, DEF. The 123 part is all Common script characters and thus
> > hb_buffer_guess_segment_properties won’t be able to guess anything (and
> > based on the font and the script, this can cause rendering differences).
> > Emacs will have to resolve the script of Common characters before
> > applying bidi algorithm and pass that down to HarfBuzz.
> 
> I'm not sure I understand: why does HarfBuzz care that 123 was in the
> middle of RTL text?

It doesn’t. What it cares about here is the correct script. Because 123
are in the middle of RTL text they will be shaped separately, and thus
hb_buffer_guess_segment_properties() will only see 123 and won’t be
able to guess the correct script for them (Arabic, Hebrew, etc.,
whatever the script for the surrounding RTL text is).

The point I’m trying to make is that script detection, even in its
simplest form, needs to be done on the text as a whole not just the
portion being shaped, which makes hb_buffer_guess_segment_properties()
ill equipped for doing this as it only sees a small portion of the text
at a time.
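
A minimal sketch of what resolving Common characters over the whole
text could look like is given below.  Everything in it is an
assumption made for illustration: the script enum, the toy classifier
(which knows only a few ASCII, Hebrew and Arabic letters instead of
consulting the UCD), and the heuristic itself (a Common character
takes the script of the nearest preceding non-Common character, or of
the first following one at the start of the text).  It is not what
Emacs or HarfBuzz does.

```c
#include <stddef.h>
#include <stdint.h>

enum script { SCRIPT_COMMON, SCRIPT_LATIN, SCRIPT_HEBREW, SCRIPT_ARABIC };

/* Toy classifier: real code would look the script up in the UCD.  */
static enum script
toy_script (uint32_t c)
{
  if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
    return SCRIPT_LATIN;
  if (c >= 0x05D0 && c <= 0x05EA)	/* Hebrew letters */
    return SCRIPT_HEBREW;
  if (c >= 0x0621 && c <= 0x063A)	/* a range of Arabic letters */
    return SCRIPT_ARABIC;
  return SCRIPT_COMMON;			/* digits, spaces, punctuation...  */
}

/* Store in OUT[i] the resolved script of TEXT[i].  Common characters
   inherit the script of the preceding non-Common character; leading
   Common characters take the script of the first non-Common one.
   If the text has no non-Common character, everything stays Common.  */
void
resolve_common_scripts (const uint32_t *text, size_t len, enum script *out)
{
  enum script last = SCRIPT_COMMON;

  for (size_t i = 0; i < len; i++)
    {
      enum script s = toy_script (text[i]);

      if (s != SCRIPT_COMMON)
	{
	  if (last == SCRIPT_COMMON)
	    for (size_t j = 0; j < i; j++)
	      out[j] = s;		/* back-fill leading Common chars */
	  last = s;
	}
      out[i] = (s == SCRIPT_COMMON) ? last : s;
    }
}
```

Run over the whole string "ARABIC 123 ARABIC", the digits resolve to
SCRIPT_ARABIC; run over "123" alone, which is all a per-run guess gets
to see, they stay SCRIPT_COMMON.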

> Does it need to shape 123 specially in this case?

Depending on the font, the digits might be shaped differently if the
script is, say Arabic, by e.g. applying script-specific substitutions to
forms more suitable for a given script.
 
> (In general, AFAIK simple characters like 123 will not even go through
> HarfBuzz, as Emacs doesn't call the shaper for characters whose entry
> in composition-function-table is nil.  So I guess 123 here should
> stand for some other characters, not for literal digits?  IOW, I don't
> think I understand the example very well.)

This is a bug then and needs to be fixed. All text should go through
HarfBuzz since even so-called “simple” characters often require shaping
depending on the text and the font. If this is done for optimization,
then it should be revised to see if shaping with HarfBuzz is actually
significantly slower and if it is, find more proper ways to optimize it.

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 06 Jan 2019 15:52:02 GMT) Full text and rfc822 format available.

Message #137 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sun, 06 Jan 2019 17:50:54 +0200
> Date: Sat, 5 Jan 2019 22:53:14 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: far.nasiri.m <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org
> 
> > Done.  Please test.  I made sure it compiles, but I couldn't actually
> > test the results, as I don't have access to a GNU/Linux system with
> > GUI display.  So it could be that I misunderstood the Harfbuzz APIs,
> > as I was essentially flying blind, guided only by the Harfbuzz docs.
> 
> It seems to work, but still not quite right. You seem to be passing the
> paragraph direction, but what HarfBuzz needs is the resolved direction of
> the text (i.e. the bidi embedding level of the run).

It isn't the paragraph direction; at least it wasn't supposed to be
that.  The code is (or was before your changes):

      if (charpos < endpos)
	{
	  if (pdir == L2R)
	    direction = QL2R;
	  else if (pdir == R2L)
	    direction = QR2L;
	  [...]
	  cmp_it->reversed_p = 0;
	}
      else
	{
	  [...]
	  cmp_it->reversed_p = 1;
	  [...]
	  if (pdir == L2R)
	    direction = QR2L;
	  else if (pdir == R2L)
	    direction = QL2R;
	  [...]
	}

So, as you see, when the paragraph direction is L2R, normal text gets
L2R direction, while text reversed for display gets R2L, and the other
way around when the paragraph direction is R2L.  Which AFAIU is what
HarfBuzz needs, but maybe I'm missing something.
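
Restated as a self-contained function, the mapping in the excerpt
above looks roughly like the sketch below.  The names run_direction
and reversed are hypothetical; reversed stands for the branch where
charpos is not less than endpos, and the enum is defined locally for
the example.  Returning NEUTRAL_DIR when pdir is neither L2R nor R2L
is an assumption, since the excerpt elides that case.

```c
#include <stdbool.h>

/* Local definition for the example.  */
enum bidi_dir { NEUTRAL_DIR, L2R, R2L };

/* Direction given to the shaper for a composition, as a function of
   the paragraph direction PDIR and of whether the text is reversed
   for display.  Non-reversed text gets the paragraph direction;
   reversed text gets the opposite one.  */
enum bidi_dir
run_direction (enum bidi_dir pdir, bool reversed)
{
  if (pdir == NEUTRAL_DIR)
    return NEUTRAL_DIR;		/* assumption: no explicit direction */
  if (!reversed)
    return pdir;
  return pdir == L2R ? R2L : L2R;
}
```

Written this way, it is easy to see that the result depends only on
the paragraph direction and on the reversed flag.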

Did you actually see incorrect display with the code I wrote?  If so,
could you please show the recipes for reproducing that, preferably
with screenshots of correct and incorrect display?  I'd like to look
into that, to understand what I missed.

> HarfBuzz direction guessing should never be used (i.e.  always pass
> to it an explicit direction).

This is in general impossible (or at least very hard), since the
shaper is sometimes called from Lisp without any display context.  See
the Lisp callers of the function font-shape-gstring.  One use case is
when we want to display the composition information for a grapheme
cluster to the user, see descr-text.el (used by the "C-u C-x ="
command).  In these cases, the UBA is not invoked, and so we don't
have the direction information.

I could provide the direction information in this case by using the
directionality of the base character of the grapheme cluster, but I
figured out that HarfBuzz already does this as part of its guessing.
Doesn't it?

> I pushed a couple of commits that do this based on my limited
> understanding of Emacs code, please check.

Thanks.  Do you see any difference in the results?  If so, can you
please show the text you used and the results of shaping it with both
versions.  AFAIU, your code should produce exactly the same results,
unless I'm missing something.  (I didn't want to use the
resolved_level attribute because it is ephemeral, and might not
provide the correct value where we are using it.)

Btw, did you test both paragraph directions (controlled by the
bidi-paragraph-direction variable), and also text inside directional
override which changes its natural direction?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 06 Jan 2019 16:05:02 GMT) Full text and rfc822 format available.

Message #140 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>, Kenichi Handa <handa <at> gnu.org>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sun, 06 Jan 2019 18:03:55 +0200
> Date: Sat, 5 Jan 2019 23:15:14 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> 
> > > The built-in HarfBuzz code is for getting the script for a given
> > > character, but resolving characters with Common script is left to the
> > > client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> > > what HarfBuzz sees during shaping is three separate chunks of text ABC,
> > > 123, DEF. The 123 part is all Common script characters and thus
> > > hb_buffer_guess_segment_properties won’t be able to guess anything (and
> > > based on the font and the script, this can cause rendering differences).
> > > Emacs will have to resolve the script of Common characters before
> > > applying bidi algorithm and pass that down to HarfBuzz.
> > 
> > I'm not sure I understand: why does HarfBuzz care that 123 was in the
> > middle of RTL text?
> 
> It doesn’t. What it cares about here is the correct script. Because 123
> are in the middle of RTL text they will be shaped separately, and thus
> hb_buffer_guess_segment_properties() will only see 123 and won’t be
> able to guess the correct script for them (Arabic, Hebrew, etc.,
> whatever the script for the surrounding RTL text is).

That's what I was asking: why is it important for HarfBuzz to know that
123 should be shaped for the Arabic script?

> Depending on the font, the digits might be shaped differently if the
> script is, say Arabic, by e.g. applying script-specific substitutions to
> forms more suitable for a given script.

I guess this is what I'm missing, then: these script-specific
substitutions.  Can you elaborate on that, or point to some place
where these substitutions are described in detail?

> > (In general, AFAIK simple characters like 123 will not even go through
> > HarfBuzz, as Emacs doesn't call the shaper for characters whose entry
> > in composition-function-table is nil.  So I guess 123 here should
> > stand for some other characters, not for literal digits?  IOW, I don't
> > think I understand the example very well.)
> 
> This is a bug then and needs to be fixed. All text should go through
> HarfBuzz since even so-called “simple” characters often require shaping
> depending on the text and the font. If this is done for optimization,
> then it should be revised to see if shaping with HarfBuzz is actually
> significantly slower and if it is, find more proper ways to optimize it.

(Adding Handa-san to the discussion, in the hope that he could comment
on the issue.)

I think running all text through a shaper might be prohibitively
expensive, because the shaper is called through Lisp code (see
composite.el), and we decide which chunk of text to pass to the shaper
using regexp search.  See the various files under lisp/language/ which
set up portions of composition-function-table as appropriate for each
language that needs it.

So I think we should identify all the cases where "simple" characters
surrounded by, or adjacent to, "non-simple" ones need to be passed to
a shaper, and add the necessary regular expressions to the data
structures in lisp/language/.  Can you describe these cases, or point
me to a place where I can find the relevant info?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 06 Jan 2019 17:55:01 GMT) Full text and rfc822 format available.

Message #143 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sun, 06 Jan 2019 19:54:24 +0200
> Date: Sat, 5 Jan 2019 23:04:20 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: far.nasiri.m <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org
> 
> I pushed a couple of commits that do this based on my limited
> understanding of Emacs code, please check.

Can you explain why you moved the call to
hb_buffer_guess_segment_properties _after_ the code which sets some of
the properties?  I cannot find anything about that in the HarfBuzz
documentation.  Is this because guessing the unset properties can
benefit from knowing the properties which _are_ set, such as the
direction?

I did it the other way around, because my mental model was: first set
the defaults, then override them where better info is available.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 27 Jan 2019 17:11:02 GMT) Full text and rfc822 format available.

Message #146 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Sun, 27 Jan 2019 19:09:53 +0200
> Date: Sat, 5 Jan 2019 22:53:14 +0200
> From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> Cc: far.nasiri.m <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org
> 
> > Done.  Please test.  I made sure it compiles, but I couldn't actually
> > test the results, as I don't have access to a GNU/Linux system with
> > GUI display.  So it could be that I misunderstood the Harfbuzz APIs,
> > as I was essentially flying blind, guided only by the Harfbuzz docs.
> 
> It seems to work, but still not quite right. You seem to be passing the
> paragraph direction, but what HarfBuzz needs is the resolved direction of
> the text (i.e. the bidi embedding level of the run). In other words, if
> Emacs is going to draw this text from right to left, then HarfBuzz must
> shape it in right to left direction. Both should use the same direction
> all the time and HarfBuzz direction guessing should never be used (i.e.
> always pass to it an explicit direction).

In response to that, I wrote:

   It isn't the paragraph direction; at least it wasn't supposed to be
   that.  The code is (or was before your changes):

	 if (charpos < endpos)
	   {
	     if (pdir == L2R)
	       direction = QL2R;
	     else if (pdir == R2L)
	       direction = QR2L;
	     [...]
	     cmp_it->reversed_p = 0;
	   }
	 else
	   {
	     [...]
	     cmp_it->reversed_p = 1;
	     [...]
	     if (pdir == L2R)
	       direction = QR2L;
	     else if (pdir == R2L)
	       direction = QL2R;
	     [...]
	   }

   So, as you see, when the paragraph direction is L2R, normal text gets
   L2R direction, while text reversed for display gets R2L, and the other
   way around when the paragraph direction is R2L.  Which AFAIU is what
   HarfBuzz needs, but maybe I'm missing something.

   Did you actually see incorrect display with the code I wrote?  If so,
   could you please show the recipes for reproducing that, preferably
   with screenshots of correct and incorrect display?  I'd like to look
   into that, to understand what I missed.

   > HarfBuzz direction guessing should never be used (i.e.  always pass
   > to it an explicit direction).

   This is in general impossible (or at least very hard), since the
   shaper is sometimes called from Lisp without any display context.  See
   the Lisp callers of the function font-shape-gstring.  One use case is
   when we want to display the composition information for a grapheme
   cluster to the user, see descr-text.el (used by the "C-u C-x ="
   command).  In these cases, the UBA is not invoked, and so we don't
   have the direction information.

   I could provide the direction information in this case by using the
   directionality of the base character of the grapheme cluster, but I
   figured out that HarfBuzz already does this as part of its guessing.
   Doesn't it?

   > I pushed a couple of commits that do this based on my limited
   > understanding of Emacs code, please check.

   Thanks.  Do you see any difference in the results?  If so, can you
   please show the text you used and the results of shaping it with both
   versions.  AFAIU, your code should produce exactly the same results,
   unless I'm missing something.  (I didn't want to use the
   resolved_level attribute because it is ephemeral, and might not
   provide the correct value where we are using it.)

   Btw, did you test both paragraph directions (controlled by the
   bidi-paragraph-direction variable), and also text inside directional
   override which changes its natural direction?

Could you please respond and answer the few questions I asked?  I'd
like us to continue working on the branch.

TIA




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 27 Jan 2019 17:13:01 GMT) Full text and rfc822 format available.

Message #149 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dr.khaled.hosny <at> gmail.com
Cc: 33729 <at> debbugs.gnu.org, handa <at> gnu.org, far.nasiri.m <at> gmail.com,
 behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Sun, 27 Jan 2019 19:12:04 +0200
Could you please respond to the below as well?

> Date: Sun, 06 Jan 2019 18:03:55 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
> 
> > Date: Sat, 5 Jan 2019 23:15:14 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: rgm <at> gnu.org, far.nasiri.m <at> gmail.com, behdad <at> behdad.org,
> > 	33729 <at> debbugs.gnu.org, kaushal.modi <at> gmail.com
> > 
> > > > The built-in HarfBuzz code is for getting the script for a given
> > > > character, but resolving characters with Common script is left to the
> > > > client. Suppose you have this string (upper case for RTL) ABC 123 DEF,
> > > > what HarfBuzz sees during shaping is three separate chunks of text ABC,
> > > > 123, DEF. The 123 part is all Common script characters and thus
> > > > hb_buffer_guess_segment_properties won’t be able to guess anything (and
> > > > based on the font and the script, this can cause rendering differences).
> > > > Emacs will have to resolve the script of Common characters before
> > > > applying bidi algorithm and pass that down to HarfBuzz.
> > > 
> > > I'm not sure I understand: why does HarfBuzz care that 123 was in the
> > > middle of RTL text?
> > 
> > It doesn’t. What it cares about here is the correct script. Because 123
> > are in the middle of RTL text they will be shaped separately, and thus
> > hb_buffer_guess_segment_properties() will only see 123 and won’t be
> > able to guess the correct script for them (Arabic, Hebrew, etc.,
> > whatever the script for the surrounding RTL text is).
> 
> That's what I was asking: why is it important for HarfBuzz to know that
> 123 should be shaped for the Arabic script?
> 
> > Depending on the font, the digits might be shaped differently if the
> > script is, say Arabic, by e.g. applying script-specific substitutions to
> > forms more suitable for a given script.
> 
> I guess this is what I'm missing, then: these script-specific
> substitutions.  Can you elaborate on that, or point to some place
> where these substitutions are described in detail?
> 
> > > (In general, AFAIK simple characters like 123 will not even go through
> > > HarfBuzz, as Emacs doesn't call the shaper for characters whose entry
> > > in composition-function-table is nil.  So I guess 123 here should
> > > stand for some other characters, not for literal digits?  IOW, I don't
> > > think I understand the example very well.)
> > 
> > This is a bug then and needs to be fixed. All text should go through
> > HarfBuzz since even so-called “simple” characters often require shaping
> > depending on the text and the font. If this is done for optimization,
> > then it should be revised to see if shaping with HarfBuzz is actually
> > significantly slower and if it is, find more proper ways to optimize it.
> 
> (Adding Handa-san to the discussion, in the hope that he could comment
> on the issue.)
> 
> I think running all text through a shaper might be prohibitively
> expensive, because the shaper is called through Lisp code (see
> composite.el), and we decide which chunk of text to pass to the shaper
> using regexp search.  See the various files under lisp/language/ which
> set up portions of composition-function-table as appropriate for each
> language that needs it.
> 
> So I think we should identify all the cases where "simple" characters
> surrounded by, or adjacent to, "non-simple" ones need to be passed to
> a shaper, and add the necessary regular expressions to the data
> structures in lisp/language/.  Can you describe these cases, or point
> me to a place where I can find the relevant info?
> 
> Thanks.
> 
> 
> 
> 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Sun, 27 Jan 2019 17:14:02 GMT) Full text and rfc822 format available.

Message #152 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dr.khaled.hosny <at> gmail.com
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50;
 Partial glyphs not rendered for Gujarati with Harfbuzz enabled
 (renders fine using m17n)
Date: Sun, 27 Jan 2019 19:12:45 +0200
> Date: Sun, 06 Jan 2019 19:54:24 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
> 
> > Date: Sat, 5 Jan 2019 23:04:20 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: far.nasiri.m <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org
> > 
> > I pushed a couple of commits that do this based on my limited
> > understanding of Emacs code, please check.
> 
> Can you explain why you moved the call to
> hb_buffer_guess_segment_properties _after_ the code which sets some of
> the properties?  I cannot find anything about that in the HarfBuzz
> documentation.  Is this because guessing the unset properties can
> benefit from knowing the properties which _are_ set, such as the
> direction?
> 
> I did it the other way around, because my mental model was: first set
> the defaults, then override them where better info is available.
> 
> Thanks.

Please respond.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Tue, 29 Jan 2019 22:27:03 GMT) Full text and rfc822 format available.

Message #155 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 33729 <at> debbugs.gnu.org, handa <at> gnu.org, far.nasiri.m <at> gmail.com,
 behdad <at> behdad.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Wed, 30 Jan 2019 00:25:36 +0200
On Sun, Jan 27, 2019 at 07:12:04PM +0200, Eli Zaretskii wrote:
> Could you please respond to the below as well?

I have no time for answering these questions any more, sorry. Please feel
free to do what you find sensible.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Tue, 29 Jan 2019 22:30:03 GMT) Full text and rfc822 format available.

Message #158 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Wed, 30 Jan 2019 00:29:03 +0200
On Sun, Jan 06, 2019 at 05:50:54PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 5 Jan 2019 22:53:14 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > I pushed a couple of commits that do this based on my limited
> > understanding of Emacs code, please check.
> 
> Thanks.  Do you see any difference in the results?

Strings with forced direction (e.g. Arabic with LRO) showed difference.
Without my change they were shaped RTL but drawn LTR, with my change
shaping and drawing used LTR direction. Please feel free to revert that
change if you think it is incorrect.

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Tue, 29 Jan 2019 22:35:01 GMT) Full text and rfc822 format available.

Message #161 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: behdad <at> behdad.org, far.nasiri.m <at> gmail.com, 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Wed, 30 Jan 2019 00:33:30 +0200
On Sun, Jan 06, 2019 at 07:54:24PM +0200, Eli Zaretskii wrote:
> > Date: Sat, 5 Jan 2019 23:04:20 +0200
> > From: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
> > Cc: far.nasiri.m <at> gmail.com, behdad <at> behdad.org, 33729 <at> debbugs.gnu.org
> > 
> > I pushed a couple of commits that do this based on my limited
> > understanding of Emacs code, please check.
> 
> Can you explain why you moved the call to
> hb_buffer_guess_segment_properties _after_ the code which sets some of
> the properties?  I cannot find anything about that in the HarfBuzz
> documentation.  Is this because guessing the unset properties can
> benefit from knowing the properties which _are_ set, such as the
> direction?

hb_buffer_guess_segment_properties() won’t guess set properties, so moving
it last was to avoid wasting time guessing properties that we will
override later anyway.

> I did it the other way around, because my mental model was: first set
> the defaults, then override them where better info is available.

hb_buffer_guess_segment_properties() is not for setting defaults (there
is no such thing as default buffer properties in HarfBuzz's working
model), it is a kind of quick and dirty hack and production code should
not use it.
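
The behavior described above can be modeled without linking HarfBuzz.
The types and the function in this sketch are hypothetical stand-ins,
not the HarfBuzz API.  The sketch assumes what is stated above, namely
that guessing fills in only the properties that are still unset, and
additionally assumes that an unset direction is derived from the
script, with left-to-right as the fallback.

```c
/* Hypothetical stand-ins for the buffer's segment properties.  */
enum seg_dir { SEG_DIR_INVALID, SEG_DIR_LTR, SEG_DIR_RTL };
enum seg_script { SEG_SCRIPT_INVALID, SEG_SCRIPT_LATIN, SEG_SCRIPT_ARABIC };

struct segment_props
{
  enum seg_dir direction;
  enum seg_script script;
};

/* Model of "guess what is missing": properties that were set
   explicitly are left untouched; unset ones are filled in.
   FIRST_STRONG_SCRIPT is the script of the first non-Common character
   in the run, or SEG_SCRIPT_INVALID if there is none.  */
void
guess_segment_properties (struct segment_props *p,
			  enum seg_script first_strong_script)
{
  if (p->script == SEG_SCRIPT_INVALID)
    p->script = first_strong_script;
  if (p->direction == SEG_DIR_INVALID)
    /* Assumption: direction follows the script, LTR by default.  */
    p->direction = (p->script == SEG_SCRIPT_ARABIC
		    ? SEG_DIR_RTL : SEG_DIR_LTR);
}
```

For Arabic text under an LRO, the caller would set SEG_DIR_LTR first;
the guess then fills in only the script and leaves the direction
alone.  Guessing first and overriding afterwards ends in the same
state, but computes a direction that is then thrown away.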

Regards,
Khaled




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33729; Package emacs. (Fri, 29 Apr 2022 12:49:01 GMT) Full text and rfc822 format available.

Message #164 received at 33729 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Khaled Hosny <dr.khaled.hosny <at> gmail.com>
Cc: behdad <at> behdad.org, Eli Zaretskii <eliz <at> gnu.org>, far.nasiri.m <at> gmail.com,
 33729 <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Fri, 29 Apr 2022 14:47:55 +0200
Khaled Hosny <dr.khaled.hosny <at> gmail.com> writes:

>> Thanks.  Do you see any difference in the results?
>
> Strings with forced direction (e.g. Arabic with LRO) showed difference.
> Without my change they were shaped RTL but drawn LTR, with my change
> shaping and drawing used LTR direction. Please feel free to revert that
> change if you think it is incorrect.

(I'm going through old bug reports that unfortunately weren't resolved
at the time.)

Skimming this long bug report, it seems like the fixes Khaled pushed
fixed the reported issue (but I may well be misreading).

Eli, is there anything more to do here?

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Fri, 29 Apr 2022 13:25:02 GMT) Full text and rfc822 format available.

Notification sent to Kaushal Modi <kaushal.modi <at> gmail.com>:
bug acknowledged by developer. (Fri, 29 Apr 2022 13:25:02 GMT) Full text and rfc822 format available.

Message #169 received at 33729-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: dr.khaled.hosny <at> gmail.com, behdad <at> behdad.org, far.nasiri.m <at> gmail.com,
 33729-done <at> debbugs.gnu.org
Subject: Re: bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati
 with Harfbuzz enabled (renders fine using m17n)
Date: Fri, 29 Apr 2022 16:24:05 +0300
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  behdad <at> behdad.org,
>   far.nasiri.m <at> gmail.com,  33729 <at> debbugs.gnu.org
> Date: Fri, 29 Apr 2022 14:47:55 +0200
> 
> Khaled Hosny <dr.khaled.hosny <at> gmail.com> writes:
> 
> >> Thanks.  Do you see any difference in the results?
> >
> > Strings with forced direction (e.g. Arabic with LRO) showed difference.
> > Without my change they were shaped RTL but drawn LTR, with my change
> > shaping and drawing used LTR direction. Please feel free to revert that
> > change if you think it is incorrect.
> 
> (I'm going through old bug reports that unfortunately weren't resolved
> at the time.)
> 
> Skimming this long bug report, it seems like the fixes Khaled pushed
> fixed the reported issue (but I may well be misreading).
> 
> Eli, is there anything more to do here?

No.  The original problem was fixed, and a couple of followup issues
were also fixed.

So I'm closing this bug.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 28 May 2022 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 305 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.