GNU bug report logs -
#68993
treesitter support for forward-sexp-default-function
Previous Next
Reported by: Juri Linkov <juri <at> linkov.net>
Date: Thu, 8 Feb 2024 17:42:01 UTC
Severity: normal
Fixed in version 30.0.50
Done: Juri Linkov <juri <at> linkov.net>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 68993 in the body.
You can then email your comments to 68993 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
casouri <at> gmail.com, bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Thu, 08 Feb 2024 17:42:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Juri Linkov <juri <at> linkov.net>
:
New bug report received and forwarded. Copy sent to
casouri <at> gmail.com, bug-gnu-emacs <at> gnu.org
.
(Thu, 08 Feb 2024 17:42:01 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
'treesit-forward-sentence' supports the node type 'text',
and for matching nodes it uses the fallback to
'forward-sentence-default-function'.
This patch does exactly the same for 'treesit-forward-sexp':
for nodes that match a new node type 'comment',
it uses the fallback to the new function
'forward-sexp-default-function'.
[forward-sexp-default-function.patch (text/x-diff, inline)]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 82b2f97b4a5..284c4915f3a 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2137,7 +2137,10 @@ treesit-forward-sexp
(interactive "^p")
(let ((arg (or arg 1))
(pred (or treesit-sexp-type-regexp 'sexp)))
- (or (if (> arg 0)
+ (or (when (treesit-node-match-p (treesit-node-at (point)) 'comment t)
+ (funcall #'forward-sexp-default-function arg)
+ t)
+ (if (> arg 0)
(treesit-end-of-thing pred (abs arg) 'restricted)
(treesit-beginning-of-thing pred (abs arg) 'restricted))
;; If we couldn't move, we should signal an error and report
diff --git a/lisp/emacs-lisp/lisp.el b/lisp/emacs-lisp/lisp.el
index 4b722b4e9a7..d3c3bf55de3 100644
--- a/lisp/emacs-lisp/lisp.el
+++ b/lisp/emacs-lisp/lisp.el
@@ -45,7 +45,12 @@ parens-require-spaces
:type 'boolean
:group 'lisp)
-(defvar forward-sexp-function nil
+(defun forward-sexp-default-function (&optional arg)
+ "Default function for `forward-sexp-function'."
+ (goto-char (or (scan-sexps (point) arg) (buffer-end arg)))
+ (if (< arg 0) (backward-prefix-chars)))
+
+(defvar forward-sexp-function #'forward-sexp-default-function
;; FIXME:
;; - for some uses, we may want a "sexp-only" version, which only
;; jumps over a well-formed sexp, rather than some dwimish thing
@@ -74,10 +79,7 @@ forward-sexp
"No next sexp"
"No previous sexp"))))
(or arg (setq arg 1))
- (if forward-sexp-function
- (funcall forward-sexp-function arg)
- (goto-char (or (scan-sexps (point) arg) (buffer-end arg)))
- (if (< arg 0) (backward-prefix-chars)))))
+ (funcall forward-sexp-function arg)))
(defun backward-sexp (&optional arg interactive)
"Move backward across one balanced expression (sexp).
[Message part 3 (text/plain, inline)]
Maybe the node type 'comment' is not the best name,
but it was intended to allow using the default function
to be able to move with 'M-C-f' in the comments and strings
there tree-sitter has no information.
It makes sense to support the default movement with 'M-C-f'
in the comments and strings of all ts modes. The second patch
shows how this could be achieved by adding the default
'comment' match to 'treesit-thing-settings' of all modes.
Or maybe this should be customizable?
[treesit-major-mode-setup.patch (text/x-diff, inline)]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 82b2f97b4a5..284c4915f3a 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -3054,6 +3057,18 @@ treesit-major-mode-setup
(setq-local outline-search-function #'treesit-outline-search
outline-level #'treesit-outline-level))
+ (dolist (parser (treesit-parser-list))
+ (let ((language (treesit-parser-language parser))
+ (comment (regexp-opt '("comment" "string" "string_content"))))
+ (unless (treesit-thing-defined-p 'comment language)
+ (if-let ((l (alist-get language treesit-thing-settings)))
+ (progn
+ (setf (alist-get 'comment l) (list comment))
+ (setf (alist-get language treesit-thing-settings) l))
+ (setq-local treesit-thing-settings
+ (append `((,language (comment ,comment)))
+ treesit-thing-settings))))))
+
;; Remove existing local parsers.
(dolist (ov (overlays-in (point-min) (point-max)))
(when-let ((parser (overlay-get ov 'treesit-parser)))
[Message part 5 (text/plain, inline)]
The third patch demonstrates how it's possible to close bug#67036
that was impossible to fix without more general changes in treesit.el.
The problem is that e.g. Ruby parser to such text:
hash[:key]
produces such syntax tree:
(element_reference object: (identifier) [ (simple_symbol) ])
so when point is on [ then 'M-C-f' can't move to ].
This is fixed now by the third patch:
[ruby-ts-mode.patch (text/x-diff, inline)]
diff --git a/lisp/progmodes/ruby-ts-mode.el b/lisp/progmodes/ruby-ts-mode.el
index 598eaa461ff..4d0ae2e9303 100644
--- a/lisp/progmodes/ruby-ts-mode.el
+++ b/lisp/progmodes/ruby-ts-mode.el
@@ -1170,7 +1170,20 @@ ruby-ts-mode
"global_variable"
)
eol)
- #'ruby-ts--sexp-p)))))
+ #'ruby-ts--sexp-p))
+ (comment ,(lambda (node)
+ (or (member (treesit-node-type node)
+ '("comment" "string_content"))
+ (and (member (treesit-node-text node)
+ '("[" "]"))
+ (equal (treesit-node-type
+ (treesit-node-parent node))
+ "element_reference"))
+ (and (member (treesit-node-text node)
+ '("#{" "}"))
+ (equal (treesit-node-type
+ (treesit-node-parent node))
+ "interpolation"))))))))
;; AFAIK, Ruby can not nest methods
(setq-local treesit-defun-prefer-top-level nil)
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Sat, 10 Feb 2024 17:50:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 68993 <at> debbugs.gnu.org (full text, mbox):
Hi Yuan,
Do you think 'comment' is a suitable name for 'treesit-forward-sexp'?
I'm unsure even if 'text' is a good name for 'treesit-forward-sentence'.
But at least these two should be consistent with each another.
Maybe better to add the prefix 'default-' to both?
This will hint that the default function is used.
Then 'treesit-forward-sentence' will support types
'sentence' and 'default-sentence'.
And 'treesit-forward-sexp' will support types
'sexp' and 'default-sexp'.
> @@ -2137,7 +2137,10 @@ treesit-forward-sexp
> (interactive "^p")
> (let ((arg (or arg 1))
> (pred (or treesit-sexp-type-regexp 'sexp)))
> - (or (if (> arg 0)
> + (or (when (treesit-node-match-p (treesit-node-at (point)) 'comment t)
> + (funcall #'forward-sexp-default-function arg)
> + t)
> + (if (> arg 0)
> (treesit-end-of-thing pred (abs arg) 'restricted)
> (treesit-beginning-of-thing pred (abs arg) 'restricted))
> ;; If we couldn't move, we should signal an error and report
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Mon, 12 Feb 2024 01:29:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 68993 <at> debbugs.gnu.org (full text, mbox):
> On Feb 10, 2024, at 9:46 AM, Juri Linkov <juri <at> linkov.net> wrote:
>
> Hi Yuan,
>
> Do you think 'comment' is a suitable name for 'treesit-forward-sexp'?
> I'm unsure even if 'text' is a good name for 'treesit-forward-sentence'.
> But at least these two should be consistent with each another.
>
> Maybe better to add the prefix 'default-' to both?
> This will hint that the default function is used.
>
> Then 'treesit-forward-sentence' will support types
> 'sentence' and 'default-sentence'.
> And 'treesit-forward-sexp' will support types
> 'sexp' and 'default-sexp'.
First there’s the “text” definition, then treesit-forward-sentence uses it as a heuristic to get better results in comments and strings, rather than the other way around. So for me it doesn’t make much sense to say if “text” is good or bad name for treesit-forward-sentence—it’s not for treesit-forward-sentence to start with.
My suggestion would be for both treesit-forward-sentence and -sexp to use “text” for their heuristic. If someone wants more customized behavior, they can always write a custom forward-sentence/sexp function.
Yuan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Mon, 12 Feb 2024 01:44:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 68993 <at> debbugs.gnu.org (full text, mbox):
> On Feb 8, 2024, at 9:38 AM, Juri Linkov <juri <at> linkov.net> wrote:
>
> 'treesit-forward-sentence' supports the node type 'text',
> and for matching nodes it uses the fallback to
> 'forward-sentence-default-function'.
>
> This patch does exactly the same for 'treesit-forward-sexp':
> for nodes that match a new node type 'comment',
> it uses the fallback to the new function
> 'forward-sexp-default-function'.
>
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 82b2f97b4a5..284c4915f3a 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -2137,7 +2137,10 @@ treesit-forward-sexp
> (interactive "^p")
> (let ((arg (or arg 1))
> (pred (or treesit-sexp-type-regexp 'sexp)))
> - (or (if (> arg 0)
> + (or (when (treesit-node-match-p (treesit-node-at (point)) 'comment t)
> + (funcall #'forward-sexp-default-function arg)
> + t)
> + (if (> arg 0)
> (treesit-end-of-thing pred (abs arg) 'restricted)
> (treesit-beginning-of-thing pred (abs arg) 'restricted))
> ;; If we couldn't move, we should signal an error and report
> diff --git a/lisp/emacs-lisp/lisp.el b/lisp/emacs-lisp/lisp.el
> index 4b722b4e9a7..d3c3bf55de3 100644
> --- a/lisp/emacs-lisp/lisp.el
> +++ b/lisp/emacs-lisp/lisp.el
> @@ -45,7 +45,12 @@ parens-require-spaces
> :type 'boolean
> :group 'lisp)
>
> -(defvar forward-sexp-function nil
> +(defun forward-sexp-default-function (&optional arg)
> + "Default function for `forward-sexp-function'."
> + (goto-char (or (scan-sexps (point) arg) (buffer-end arg)))
> + (if (< arg 0) (backward-prefix-chars)))
> +
> +(defvar forward-sexp-function #'forward-sexp-default-function
> ;; FIXME:
> ;; - for some uses, we may want a "sexp-only" version, which only
> ;; jumps over a well-formed sexp, rather than some dwimish thing
> @@ -74,10 +79,7 @@ forward-sexp
> "No next sexp"
> "No previous sexp"))))
> (or arg (setq arg 1))
> - (if forward-sexp-function
> - (funcall forward-sexp-function arg)
> - (goto-char (or (scan-sexps (point) arg) (buffer-end arg)))
> - (if (< arg 0) (backward-prefix-chars)))))
> + (funcall forward-sexp-function arg)))
>
> (defun backward-sexp (&optional arg interactive)
> "Move backward across one balanced expression (sexp).
>
> Maybe the node type 'comment' is not the best name,
> but it was intended to allow using the default function
> to be able to move with 'M-C-f' in the comments and strings
> there tree-sitter has no information.
>
> It makes sense to support the default movement with 'M-C-f'
> in the comments and strings of all ts modes. The second patch
> shows how this could be achieved by adding the default
> 'comment' match to 'treesit-thing-settings' of all modes.
> Or maybe this should be customizable?
I think treesit-thing-settings is something we want to left for major mode’s to set. They’ll need to define other “things” in treesit-thing-settings anyway. Sure, it’s nice if we can set a few definitions automatically, but I don’t think the gain is worth that much; OTOH, it’s nice to have clear boundaries, and minimizes the possibility of confusion.
>
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 82b2f97b4a5..284c4915f3a 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -3054,6 +3057,18 @@ treesit-major-mode-setup
> (setq-local outline-search-function #'treesit-outline-search
> outline-level #'treesit-outline-level))
>
> + (dolist (parser (treesit-parser-list))
> + (let ((language (treesit-parser-language parser))
> + (comment (regexp-opt '("comment" "string" "string_content"))))
> + (unless (treesit-thing-defined-p 'comment language)
> + (if-let ((l (alist-get language treesit-thing-settings)))
> + (progn
> + (setf (alist-get 'comment l) (list comment))
> + (setf (alist-get language treesit-thing-settings) l))
> + (setq-local treesit-thing-settings
> + (append `((,language (comment ,comment)))
> + treesit-thing-settings))))))
> +
> ;; Remove existing local parsers.
> (dolist (ov (overlays-in (point-min) (point-max)))
> (when-let ((parser (overlay-get ov 'treesit-parser)))
>
> The third patch demonstrates how it's possible to close bug#67036
> that was impossible to fix without more general changes in treesit.el.
>
> The problem is that e.g. Ruby parser to such text:
>
> hash[:key]
>
> produces such syntax tree:
>
> (element_reference object: (identifier) [ (simple_symbol) ])
>
> so when point is on [ then 'M-C-f' can't move to ].
>
> This is fixed now by the third patch:
>
> diff --git a/lisp/progmodes/ruby-ts-mode.el b/lisp/progmodes/ruby-ts-mode.el
> index 598eaa461ff..4d0ae2e9303 100644
> --- a/lisp/progmodes/ruby-ts-mode.el
> +++ b/lisp/progmodes/ruby-ts-mode.el
> @@ -1170,7 +1170,20 @@ ruby-ts-mode
> "global_variable"
> )
> eol)
> - #'ruby-ts--sexp-p)))))
> + #'ruby-ts--sexp-p))
> + (comment ,(lambda (node)
> + (or (member (treesit-node-type node)
> + '("comment" "string_content"))
> + (and (member (treesit-node-text node)
> + '("[" "]"))
> + (equal (treesit-node-type
> + (treesit-node-parent node))
> + "element_reference"))
> + (and (member (treesit-node-text node)
> + '("#{" "}"))
> + (equal (treesit-node-type
> + (treesit-node-parent node))
> + "interpolation"))))))))
>
> ;; AFAIK, Ruby can not nest methods
> (setq-local treesit-defun-prefer-top-level nil)
IIUC, this doesn’t look like a good idea: you don’t want to mark something that’s not comment as comment. In the future, other packages will start using these thing definitions, and I’m sure you don’t want them consider regular code as comments.
For the specific problem you described, maybe the change made in #68899 can help?
Yuan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Mon, 12 Feb 2024 18:45:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 68993 <at> debbugs.gnu.org (full text, mbox):
>> The problem is that e.g. Ruby parser to such text:
>>
>> hash[:key]
>>
>> produces such syntax tree:
>>
>> (element_reference object: (identifier) [ (simple_symbol) ])
>>
>> so when point is on [ then 'C-M-f' can't move to ].
>>
>> This is fixed now by the third patch:
>> @@ -1170,7 +1170,20 @@ ruby-ts-mode
>> + (comment ,(lambda (node)
>> + (or (member (treesit-node-type node)
>> + '("comment" "string_content"))
>> + (and (member (treesit-node-text node)
>> + '("[" "]"))
>> + (equal (treesit-node-type
>> + (treesit-node-parent node))
>> + "element_reference"))
>> + (and (member (treesit-node-text node)
>> + '("#{" "}"))
>> + (equal (treesit-node-type
>> + (treesit-node-parent node))
>> + "interpolation"))))))))
>
> IIUC, this doesn’t look like a good idea: you don’t want to mark
> something that’s not comment as comment. In the future, other packages
> will start using these thing definitions, and I’m sure you don’t want
> them consider regular code as comments.
>
> For the specific problem you described, maybe the change made in #68899 can help?
bug#68899 can't help because it only fixes 'C-M-f' to move point to the
end of the current symbol.
Whereas for the problem above we need a predicate that will instruct
'treesit-forward-sexp' to fall back to 'forward-sexp-default-function'.
Clearly neither the name 'text' nor 'comment' would be suitable
for such usage. Maybe it's possible to find a different name?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Wed, 10 Apr 2024 18:11:05 GMT)
Full text and
rfc822 format available.
Message #20 received at 68993 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
> My suggestion would be for both treesit-forward-sentence and -sexp to
> use “text” for their heuristic. If someone wants more customized
> behavior, they can always write a custom forward-sentence/sexp function.
Thanks for the suggestion. So here is the patch that implements this:
[treesit-forward-sexp-text.patch (text/x-diff, inline)]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 1443162f79c..62d797513fc 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2145,7 +2145,10 @@ treesit-forward-sexp
(interactive "^p")
(let ((arg (or arg 1))
(pred (or treesit-sexp-type-regexp 'sexp)))
- (or (if (> arg 0)
+ (or (when (treesit-node-match-p (treesit-node-at (point)) 'text t)
+ (funcall #'forward-sexp-default-function arg)
+ t)
+ (if (> arg 0)
(treesit-end-of-thing pred (abs arg) 'restricted)
(treesit-beginning-of-thing pred (abs arg) 'restricted))
;; If we couldn't move, we should signal an error and report
diff --git a/lisp/emacs-lisp/lisp.el b/lisp/emacs-lisp/lisp.el
index c57b1357f63..4c0f720b1a0 100644
--- a/lisp/emacs-lisp/lisp.el
+++ b/lisp/emacs-lisp/lisp.el
@@ -45,7 +45,12 @@ parens-require-spaces
:type 'boolean
:group 'lisp)
-(defvar forward-sexp-function nil
+(defun forward-sexp-default-function (&optional arg)
+ "Default function for `forward-sexp-function'."
+ (goto-char (or (scan-sexps (point) arg) (buffer-end arg)))
+ (if (< arg 0) (backward-prefix-chars)))
+
+(defvar forward-sexp-function #'forward-sexp-default-function
;; FIXME:
;; - for some uses, we may want a "sexp-only" version, which only
;; jumps over a well-formed sexp, rather than some dwimish thing
@@ -74,10 +79,9 @@ forward-sexp
"No next sexp"
"No previous sexp"))))
(or arg (setq arg 1))
- (if forward-sexp-function
- (funcall forward-sexp-function arg)
- (goto-char (or (scan-sexps (point) arg) (buffer-end arg)))
- (if (< arg 0) (backward-prefix-chars)))))
+ (funcall (or forward-sexp-function
+ #'forward-sexp-default-function)
+ arg)))
(defun backward-sexp (&optional arg interactive)
"Move backward across one balanced expression (sexp).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Sun, 14 Apr 2024 16:24:01 GMT)
Full text and
rfc822 format available.
Message #23 received at 68993 <at> debbugs.gnu.org (full text, mbox):
close 68993 30.0.50
thanks
>>> My suggestion would be for both treesit-forward-sentence and -sexp to
>>> use “text” for their heuristic. If someone wants more customized
>>> behavior, they can always write a custom forward-sentence/sexp function.
>>
>> Thanks for the suggestion. So here is the patch that implements this:
>
> LGTM
Thanks, so now pushed and closed.
bug marked as fixed in version 30.0.50, send any further explanations to
68993 <at> debbugs.gnu.org and Juri Linkov <juri <at> linkov.net>
Request was from
Juri Linkov <juri <at> linkov.net>
to
control <at> debbugs.gnu.org
.
(Sun, 14 Apr 2024 16:24:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Tue, 16 Apr 2024 10:16:03 GMT)
Full text and
rfc822 format available.
Message #28 received at 68993 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Looks like 568c174135 broke lisp-tests and thingatpt-tests:
[lisp-tests.log (application/octet-stream, attachment)]
[thingatpt-tests.log (application/octet-stream, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#68993
; Package
emacs
.
(Wed, 17 Apr 2024 06:56:02 GMT)
Full text and
rfc822 format available.
Message #31 received at 68993 <at> debbugs.gnu.org (full text, mbox):
> Looks like 568c174135 broke lisp-tests and thingatpt-tests:
Thanks, should be fixed now.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Wed, 15 May 2024 11:24:12 GMT)
Full text and
rfc822 format available.
This bug report was last modified today.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.