-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jit-spell causes noticeable increase in latency #9
Comments
An easy recipe to trigger the behavior is to disable jit-spell-mode and subsequently reenable it, such that it does its work again. Then one should directly feel some latency increase. After jit-spell finished with its work, latency will go down again. Do you also observe this? |
Maybe the slowdown is caused by too many redisplays? It seems you create overlays in the process filter (there is also a TODO about processing chunks). If the process filter is returning in quick succession this may cause many redisplays in quick succession. I had a similar issue in Regarding the chunking - it is not ensured that the process filter receives a complete response btw. This means you have to accumulate the process output and then parse it as far as possible. |
I've tried it and I don't observe any issues. However, I don't use jit-lock stealth mode. Also, I use Hunspell, which is 1 or 2 orders of magnitude slower. This might help Emacs not get overwhelmed. Let me know in case you're able to eliminate some of these hypotheses.
Yes, sure. What happens now is, I wait for the response corresponding to an entire line of the buffer (possibly multiple misspellings). What could be done instead is to process individual misspellings as they arrive (each one produces a line of process output). In light of what you said about too many redisplays, this alternative might not be a great idea. BTW, jit-spell doesn't create too many overlays, and certainly not inside the visible window area (unless you chose the wrong language, which is something you could test too).. So this hypothesis seems rather unlikely to me. What elisp file specifically lets you observe the delay in consult-line? I tested on simple.el, which is reasonably large, but everything seems instantaneous to me. |
Okay, I just tried this: with aspell, I open simple.el and set the language to Portuguese. Then I scroll around, which causes basically every comment and docstring word to be highlight. In this case, aspell is using 95% (of one) CPU and Emacs 12%. I would say aspell just can't respond fast enough to overwhelm Emacs... |
Yes, I also had this - Aspell creating high load. But then I enabled sug-mode=ultra and now Emacs is getting overwhelmed. I also use stealth mode and I've increased the process pipe buffer size. |
Btw did you try the recipe with consult-line? Toggle jit-spell, then immediately run consult-line. There I find the issue to be most pronounced since minibuffer commands with Vertico are resource intensive and you quickly feel increases in latency. The issue is not limited to Vertico however - everything feels slightly delayed during high activity of jit-spell. |
I added a |
Yes, a blank line (not preceded by non-blank lines) just means there are no misspellings in some buffer region. Of course we're going to get a lot of those. This is not special-cased in the process filter, but there's a regexp search that should fail pretty quickly. I've tested now with your setup (aspell + sug-mode=ultra, jit-stealth-time=0.1, 1500 LOC elisp file). I get lots of requests in the mode line counter and eventually high CPU usage, but I really can't see any slowdown in consult-line. Maybe my response time is too slow :-). |
Note also that we are bound to what the |
The CPU usage is due to the Emacs process itself, as I also observed? Is there any way to avoid this? Can we process the requests in chunks somehow such that the process filter is called less often?
Probably your machine is just faster.
I don't understand. Can you please explain more, since I am not familiar with the details. So you send some text and then search for each misspelled word after you got the response? But at least you can search only through the corresponding region of a request and not the whole buffer? I should add that I also increased |
Ahhhh, I feel stupid now. I got what the Any ideas how we could avoid this unfortunate interaction? For There are many interacting moving parts here:
|
Makes sense, but why don't I see this happen? It's enabled by default, I gather. How big is the file you're working on? The process communication of jit-spell is asynchronous and in quite small chunks, so I think it's probably not related. The issue might be with my jit-lock function,
These computations are probably kind of expensive, but should be comparable with all the work font-lock has to do to figure out the code structure, don't you think? |
I have to dig a bit deeper, but I see the high load in
I wouldn't exclude this. As soon as Emacs redisplays or returns to the main loop, process output is handled. If we get many such process events it can make Emacs unresponsive. But the amount of data is small, that's true. (In my asynchronous pipeline of To elaborate - if you are splitting your work in too many small pieces, the overhead becomes exceptionally large in comparison to the actual work. It could be that we have this issue here. Note that the Emacs process output handling machinery is not really efficient. One can quickly saturate Emacs with processes which produce large output or many small pieces of output.
Yes, it should be comparable. |
Yes, you're right about the overhead. I should mention another design detail: I keep this list of request I want to send, but I only send one at a time to the subprocess (for an unrelated reason: if you edit the buffer and all those existing requests become outdated, there would be no way to "unsend" them). This should naturally throttle down the communication. |
Makes sense, but you could still bundle multiple requests together and send them together as one.
In further support of this issue - I notice both high load due to Emacs and Xorg, which means that your process filter somehow causes redisplays (maybe because it creates overlays?). The redisplay add to the processing overhead. |
So adding overlays outside of the visible window range causes redisplay? That would be an issue... |
Why do you even use overlays? Is it not possible to work only with text properties which are more efficient? |
Overlays are the natural tool for this. If I used text properties, I would at least need two code paths for font-lock-mode on and off. |
I assume you could as well use text properties and it would work as well, or even more efficiently. The downside of text properties is that it is more painful to add and remove them.
Are you sure? If you propertize a string with |
Try it: |
I see. But the problem is that font locking overrides the face. If your program runs as part of font locking, this does not happen. |
By the way, you can check if it's the overlay creation that causes the slowdown (presumably through some indirect redisplay effect) with (advice-add 'jit-spell--make-overlay :override #'ignore)
;; and maybe also
(advice-add 'jit-spell--remove-overlays :override #'ignore) Here this doesn't seem to help, but I'm not sure I see exactly the same thing as you do. |
Good idea. I will try this. But I am a bit disillusioned and disappointed regarding spell checking, also given the bad interaction of consult-line with jit-spell. I didn't have these issues before with spell-fu, but I agree with you that it is not ideal conceptually. :( |
Spell-checking doesn't seem like it should be that complicated, right? Anyway, I tried Other than that, I guess the alternatives are:
|
Actually, I'm now reading your original messages again and I'm not sure which points are bugging you:
All 3 things are probably happening; I can notice 2 and I can believe 1, but I don't experience 3. Emacs's IO loop indeed seems rather inefficient and the small processing chunks exacerbate point 2, but at the same time they should mitigate point 3... |
Yes. I feel we have not yet found the ideal approach. Maybe something like spell-fu with integrated inflection would be best. Or maybe implement an external server to offload processing which itself communicates with aspell?
Exactly. This introduces the latency then during the minibuffer filtering. Another idea could be to use run-with-idle-time to ensure that jit-spell becomes completely unobtrusive, at least for regions outside the visible region. |
I think my problem isn't entirely clear, since I didn't clearly nail it down in the beginning. Generally jit-spell introduces significant latency (high Emacs load) while it is running. This becomes most noticeable due to consult-line which enforces font locking and also due to stealth locking. But even without these complications I notice increase in latency. The points which bug me are mostly 2 and 3. consult-line doesn't feel slower to popup - maybe slightly. Note that I usually don't use consult-line in overly large buffers. I already notice the slowdown during filtering for moderately sized buffers, startup is not affected as badly.
Yes, the IO loop is inefficient. One should do everything to reduce the load which is put on it. The small chunks would only help if outside processing is slow, but on the Emacs side every call to tte process filter is horribly costly. |
I just tried the following in (package-initialize)
;; Causes high load until the whole buffer is fontified.
;; All of the settings are overly aggressive.
(setq jit-lock-stealth-time 0.01
jit-lock-stealth-nice 0.01
jit-lock-stealth-load 1000
jit-lock-chunk-size 10000)
;; Reduce load of aspell ~1%. Emacs is high > 90%.
(setq ispell-extra-args '("--sug-mode=ultra"))
(advice-add 'jit-spell--make-overlay :override #'ignore)
(advice-add 'jit-spell--remove-overlays :override #'ignore) I still observe high Emacs load and high Xorg load, but to a slightly lesser extent. The overlays are not the actual problem which causes the high load. Could there be any code in jit-spell with accidentally quadratic behavior, e.g., appending lists to lists in a loop? I will run the profiler to see if I can get some useful information from the recipe in After profiling I found that the culprit is likely the process filter. Even with I am pretty sure now about what I had suspected before - the process filter is called too often with small chunks and causes excessive redisplay as a consequence. We could try the following:
|
I've tried a bunch of things detailed below, none of which gives me too much hope. But let me first remark that I actually use Hunspell, which is much slower than Aspell but actually has good suggestions. If I open xterm.c and do Also, it must be acknowledged that when you call
Version 0.1 actually did a rather extreme version of this; if you call
Here I can think of one optimization, which is to handle "all correct" responses specially. Here's a patch that does that. It seems to help a little, but it's not clear it's worth the extra complexity. diff --git a/jit-spell.el b/jit-spell.el
index 6caa5dc..d430049 100644
--- a/jit-spell.el
+++ b/jit-spell.el
@@ -283,6 +283,18 @@ The process plist includes the following properties:
(defun jit-spell--process-filter (proc string)
"Filter function for jit-spell processes."
(with-current-buffer (process-buffer proc)
+ (if (and (eq (point-max) (point-min))
+ (equal string "\n"))
+ (progn
+ (process-put proc 'jit-spell--current-request nil)
+ (catch 'jit-spell--done
+ (while-let ((request (pop (process-get proc 'jit-spell--requests)))
+ (buffer (car request)))
+ (when (buffer-live-p buffer)
+ (if (not (eq (cadr request) (buffer-chars-modified-tick buffer)))
+ (jit-spell--schedule-pending-checks buffer)
+ (jit-spell--send-request proc request)
+ (throw 'jit-spell--done nil))))))
(save-excursion
(goto-char (point-max))
(insert string))
@@ -327,7 +339,7 @@ The process plist includes the following properties:
(if (not (eq (cadr request) (buffer-chars-modified-tick buffer)))
(jit-spell--schedule-pending-checks buffer)
(jit-spell--send-request proc request)
- (throw 'jit-spell--done nil)))))))))
+ (throw 'jit-spell--done nil))))))))))
(defun jit-spell--send-request (proc request)
"Send REQUEST to ispell process PROC."
This is not relevant to this problem. A request becomes stale if the buffer changed since it was created, and I think you are not editing the buffer while the high CPU usage happens. Note also that a request which is stale when it get to the top of the queue is not sent to the subprocess. (Note also that the more batching is done, the higher the likelihood to get a stale response from the spellcheker.)
Again, what I call pending check in the code is a check to recover from the situation where we get a response from the spellchecker which is no longer applicable because the buffer changed in the meanwhile. This is not relevant to the issue you observe. Lastly, I've also tried this modification: modified jit-spell.el
@@ -334,7 +334,7 @@ The process plist includes the following properties:
(process-put proc 'jit-spell--current-request request)
(pcase-let ((`(,buffer _ ,start ,end) request))
(with-current-buffer buffer
- (let ((text (buffer-substring-no-properties start end)))
+ (let ((text ""))
(process-send-string proc "^")
(process-send-string proc text)
(process-send-string proc "\n"))))) which shows that even sending "^\n" a bunch of times and receiving a bunch of "\n" as response leads to a quite long CPU usage spike. |
I'm not in principle against this suggestion, and since I can't think of more promising alternatives I guess I'll try it, but it does feel like a kludge. (Also, I'm still a bit surprised that the large number of requests creates latency as opposed to just high CPU usage.) Another thought: perhaps |
The load is too high, so we add delays to reduce the load. Seems natural to me. It is only a kludge since in an ideal world one wouldn't expect spell checking to be so expensive and unnecessarily complicated.
I don't know what exactly is causing the issue, but from my understanding your process filter is too inefficient. I don't know how to do it better though. Intuitively the sequential processing doesn't look good, but I am not deep in your code. My explanation for the latency increase arises since the process filter steals time from the main event loop and hangs it up for a while, such that no input events are processed. I've noticed similar issues with small pipe sizes in consult-ripgrep. Therefore I increase the chunk size, see
Sure, but that's an orthogonal issue. For fontification only the threshold seems acceptable like it is. consult-line only demonstrates the performance issue of jit-spell (and also of fontification itself ;). This is how I noticed the latency increase first. But the high load is also created independently of that, even without stealth locking. As soon as jit-spell is processing, Emacs gets noticeable slower for me. Therefore I had to disable jit-spell in my config for now. Frankly I am unsure if the current approach is the final answer to spellchecking. The jit-lock idea is good, but in its current form it is not efficient enough. Do you know how the spellchecker in browser works? They surely use some integrated library - could we steal this library and write an Emacs module? You told me about the inefficiency of the "protocol", where you have to search for the misspelled word. Ideally the spell checker would offer an API to just check words or regions and then respond with character positions. Possible corrections could only be offered on demand. There is no need to send that data right away. |
Btw, I just thought about flymake - I have that always on in my setup and it usually doesn't cause issues. The thing is that the processing happens mostly (or completely) outside of Emacs. It may create a high load externally but Emacs itself stays perfectly responsive since not much data is exchanged. In contrast, with jit-spell there is more back and forth happening between the processes. This again points at the process communication being the bottleneck here. |
Other apps just link against one of the few available spellchecking libraries. This one could be a good option for a module: https://github.com/AbiWord/enchant.
Then making the chunks smaller should make the pauses shorter and eliminate the latency, even if that increases the total CPU usage over time. That's the point that confuses me in this whole story.
I don't think so. Did you try it on Emacs's xterm.c, which is just under the default limit? I use Flymake but not on very large files (I don't think you told me how large are the files that make jit-spell cause latency), and I typically only get a dozen diagnostics at a time as opposed to thousands of misspellings. Flymake isn't even incremental — it always sends and rechecks the entire buffer. |
That's not necessarily correct. If you make the chunks too small it leads to more management overhead, more context switching to the process filter. We are going in circles. I mentioned the redisplay problem before. Every time the process filter runs we may trigger a redisplay. I am not entirely sure how redisplay is triggered, but it seems to occurs more often while jit-spell is running.
I don't understand. I tried large files and there is of course an expected startup delay. If you disagree with the threshold, just decrease it. But this won't change anything about the performance of jit-spell. (Btw, for large buffers or multiple buffers I recommend consult-line-multi which is incremental and doesn't font lock at all.)
I did. Relatively small files already cause issues for me: embark.el, simple.el, init.el, Org notes, ... Note that I did not have any latency issues of this kind with spell-fu. One could maybe glue enchant together with your overall jit-spell architecture, avoiding the process overhead this way (assuming that enchant can quickly check words). Anyway, just tell me if you acknowledge this issue or if you don't. From your responses it looks as if you consider this a wontfix since it doesn't seem to be a problem in your setup (more powerful machine?, hunspell, ..). I don't want to waste our time here. To be clear, I don't expect you to fix this. I would try to fix it myself, but I am also not entirely sure how to tackle this. |
I will try the extra delay idea, when I get time. But naturally I'm less motivated by the fact that I can't see the issue here. A file like embark.el definitely doesn't cause any trouble (on xterm.c at least I get some fan noise). I have an okay machine, but it's a laptop from 2019. |
Okay, thanks. I understand that the motivation is low for issues which are not relevant in your setup. But note that your mode aims to be always enabled in the background. Such modes should satisfy the highest standards.
For me that would be reason enough to improve this. ;) A better alternative could be to really try enchant in combination with jit-locking. Maybe one could get rid of the deferring entirely and just do the fontification immediately if enchant is fast enough. On my system I found libraries like enchant_aspell.so and libenchant.so. The API is quite trivial: int enchant_dict_check (EnchantDict * dict, const char *const word, ssize_t len); But at least chrome doesn't seem to use enchant. They reinvented their own wheel I think. |
I just looked at spell-fu again since I was wondering why it is not causing such a slowdown. The idea there seems to be that pending regions are only checked if they are visible. jit locking is used to collect all pending regions. They are then sent after an idle time but restricted to the visible range. Maybe that could work for jit-spell too? |
Looking at this thread it seems like the problem is that the constant part (i.e the part independent of the size of accepted output) of the runtime cost of I remember seeing a thread on |
@aikrahguzar I am not sure anymore. I looked more closely at the
jit-spell code and also the jit locking code itself seems quite
expensive. I believe that the best bet is to only spell check the
currently visible region, since this will solve all scenarios (stealth
locking and consult-line, no complicated scheduling needed).
During jit locking only pending properties are added, which will be
fast. Then in an idle timer the currently visible window region is
searched for these pending regions, the regions are then restricted
according to faces if needed (prog mode), sent to the backend process,
post processed (ignore faces, flyspell ignore predicate) and then the
overlays are created.
|
@astoff I made a small experiment. The following "spell checker" scans over all words and just highlights the word Maybe this approach could work for jit-spell? I believe that jit-spell suffers from the inefficiency of the process filter and the large amount of work which is created in the first place during jit locking. I know this is quite a fundamental change from your current architecture, but would you accept a PR which implements this approach instead? Did you reject the idea of highlighting only the visible region for some reason? (defcustom spell-experiment-delay 0.1
"Idle timer delay."
:type 'float)
(defface spell-experiment-misspelled
'((((supports :underline (:style wave)))
:underline (:style wave :color "red"))
(t :underline t :inherit error))
"Face used for misspelled words.")
(defvar spell-experiment--timer nil)
(defvar spell-experiment--predicates
(list #'spell-experiment--word-valid-p))
(defun spell-experiment--word-valid-p (start end)
(not (member (buffer-substring-no-properties start end) '("defun"))))
(put 'spell-experiment 'evaporate t)
(put 'spell-experiment 'face 'spell-experiment-misspelled)
(put 'spell-experiment 'modification-hooks (list #'spell-experiment--delete-overlay))
(put 'spell-experiment 'insert-in-front-hooks (list #'spell-experiment--delete-overlay))
(put 'spell-experiment 'insert-behind-hooks (list #'spell-experiment--delete-overlay))
(defun spell-experiment--delete-overlay (ov &rest _)
(delete-overlay ov))
(defun spell-experiment--check-pending ()
(let ((start (window-start))
(end (window-end)))
(with-silent-modifications
(save-excursion
(while-let ((pbeg (text-property-any start end 'spell-experiment--pending t))
(pend (or (text-property-any pbeg end 'spell-experiment--pending nil) end)))
(remove-list-of-text-properties pbeg pend '(spell-experiment--pending))
(goto-char pbeg)
(setq pbeg (pos-bol))
(goto-char pend)
(setq pend (pos-eol))
(spell-experiment--check-region pbeg pend))))))
(defun spell-experiment--check-region (start end)
(save-excursion
(save-match-data
(goto-char start)
(while (re-search-forward "\\w+" end t)
(let ((word-beg (match-beginning 0))
(word-end (match-end 0)))
(unless (run-hook-with-args-until-success
'spell-experiment--predicates word-beg word-end)
(overlay-put (make-overlay word-beg word-end) 'category 'spell-experiment)))))))
(defun spell-experiment--unfontify (&optional start end lax)
(setq start (or start (point-min))
end (or end (point-max)))
(dolist (ov (overlays-in start end))
(when (eq 'spell-experiment (overlay-get ov 'category))
(delete-overlay ov)))
(remove-list-of-text-properties start end '(spell-experiment--pending))
(unless lax
(jit-lock-refontify start end)))
(defun spell-experiment--pending (start end)
(put-text-property start end 'spell-experiment--pending t)
(unless inhibit-quit ;; inhibit-quit is non-nil for stealth locking
(spell-experiment--start-timer)))
(define-minor-mode spell-experiment-mode
"Just-in-time spell checking."
:global nil
(cond
(spell-experiment-mode
(add-hook 'ispell-change-dictionary-hook #'spell-experiment--unfontify nil t)
(add-hook 'window-state-change-hook #'spell-experiment--start-timer nil t)
(add-hook 'window-scroll-functions #'spell-experiment--start-timer nil t)
(jit-lock-register #'spell-experiment--pending))
(t
(remove-hook 'ispell-change-dictionary-hook #'spell-experiment--unfontify t)
(remove-hook 'window-state-change-hook #'spell-experiment--start-timer t)
(remove-hook 'window-scroll-functions #'spell-experiment--start-timer t)
(jit-lock-unregister #'spell-experiment--pending)
(spell-experiment--unfontify nil nil t))))
(defun spell-experiment--timer ()
(setq spell-experiment--timer nil)
(dolist (frame (frame-list))
(dolist (win (window-list frame 'no-miniwindow))
(with-current-buffer (window-buffer win)
(when spell-experiment-mode
(spell-experiment--check-pending))))))
(defun spell-experiment--start-timer (&rest _)
(when (and (not spell-experiment--timer) (get-buffer-window))
(setq spell-experiment--timer (run-with-idle-timer spell-experiment-delay nil #'spell-experiment--timer)))) |
I think relying on these hooks is going to be brittle, see e.g. this about similar problem in eglot and later messages in the thread discussing the unreliability of
|
On 3/23/23 13:18, aikrahguzar wrote:
|window-state-change-hook| covers most of the scenarios in message but
not for example scrolling with |pixel-scroll-precision-mode| enabled. I
think the only thing that is going to work is a buffer local
|post-command-hook| although that feels very heavy handed.
Actually a pch is not a bad idea. It won't be heavy since it is just
used to restart the timer.
|
With a pch and some advice, I think you can have the behavior you want, (advice-add 'jit-spell--check-pending-regions :around
(defun +jit-spell-restrict-to-visible (fun buf)
(when-let ((win (get-buffer-window buf)))
(with-current-buffer buf
(save-restriction
(narrow-to-region (window-start win) (window-end win))
(funcall fun buf))))))
(advice-add 'jit-spell--filter-region :around
(defun +jit-spell-filter-region (fun beg end)
(when (or (pos-visible-in-window-p beg)
(pos-visible-in-window-p end))
(funcall fun beg end))))
(defun +jit-spell-schedule-checks () (jit-spell--schedule-pending-checks (current-buffer)))
(add-hook 'jit-spell-mode-hook
(defun +jit-spell-scheduler ()
(if jit-spell-mode
(add-hook 'post-command-hook #'+jit-spell-schedule-checks nil t)
(remove-hook 'post-command-hook #'+jit-spell-schedule-checks nil)))) This works for me and I think should function as you want. |
@aikrahguzar I am not sure if this is correct, since filter region will then exclude other areas which are then not rescheduled again? Anyway, I am not interested in hacking jit-spell via advices. I think the default behavior of it should be improved such that it performs better. |
Font lock does a lot more work than that inside the jit-lock-function. What are you attributing this performance issue to? The use of overlays instead of text properties? (By the way, the overlay machine has been overhauled in Emacs 29, are you using it?) But I also don't understand what this experiment is supposed to teach us, since jit-spell doesn't do anything comparable in its own jit-lock function. Clearly the problem is in the "async" part of the story, and judging by #9 (comment), it's in good measure due to some internal inefficiency of the Emacs IO loop. (By the way, to respond a question I missed in the linked comment: no, jit-spell doesn't manipulate lists with quadratic behavior, why would I do that?) Another comment, dabbling with window bounds is tricky and highly unreliable, and also nullifies one of jit-lock's benefits which consist in fontifying the buffer a little ahead of time, so that you will never see unfontified regions if when performing most typical scrolling patterns. |
The idea is that it is better to avoid most work in the first place, by not sending unnecessary requests. spell-fu uses a similar technique, where only visible regions are considered. To emphasize that again - I never had performance issues with spell-fu. Anyway let's close this. |
It might be that the parsing of the corrections takes quite a bit of time. This would be the patch that disables it. modified jit-spell.el
@@ -356,10 +356,6 @@ The process plist includes the following properties:
(let ((word (match-string 1))
(start (string-to-number (match-string 2)))
corrections)
- (goto-char (match-end 0))
- (while (re-search-forward (rx (+ (not (any ", \n"))))
- (pos-eol) t)
- (push (match-string 0) corrections))
(push (list word start (nreverse corrections))
misspellings)))
(jit-spell--make-overlays buffer start end misspellings)))) To get an insight on the effect of just calling a process filter too often (since it presumably causes redisplay) one could try this fake spell checker: #!/bin/bash
echo '@(#) International Ispell 0.0.1'
if [[ $1 == '-v'* ]]; then exit 0; fi
if [[ $1 == 'dicts' ]]; then echo "en_US"; exit 0; fi
while true; do
IFS='' read line
if [[ $line == [^*@#~\+\-\!%]* ]]; then echo ; fi
done And again, which Emacs version are you using? I'm using 29, and it's supposed to have improvements in the overlay machinery. |
Yes, I am using Emacs 29. I decided to explore an alternative approach now. I directly bind libenchant and use deferred highlighting of the visible region only. See https://github.com/minad/jinx. By not relying on the Ispell API one can get rid of quite a bit of complexity, but one trades this with native bindings in return (maybe less portable and harder to install). The resource usage of this solution is quite low. Above you made the point that deferring the font locking nullifies the benefits of jit-lock. From my experience this is not really the case since you have to wait for the spell checker backend anyway. By always deferring the highlights, the result feels even more consistent. I also tried two different delays (one short for scrolling and one a little bit longer for pch), which ensures that highlights appear quickly when scrolling, but this doesn't even seem necessary for a good UI. |
That looks cool. If you want to merge it with this package as an optional mode, I'd certainly be interested. But then we should actually bypass all the complexity and do the checks synchronously during jit-lock. Try Also, looking briefly it seems you implemented a function to check one word. It's much better to have a function to check a region and let the spell checker tokenize it. You don't want to dabble into the question of what constitutes a word in a language; you will always get it a bit wrong. You also don't need to maintain a list of ignored regexps: just type "[email protected]" into (You also shouldn't need to maintain a list of mode-specific ignored faces. Emacs has an API for that already, which is totally usable even if imperfect.) |
Yes, we could consider that. But I am afraid that the models may be too different.
I doubt that this will work nicely with stealth locking etc. The problem is probably not the spell checker itself but all the work which is produced on the way - checking the predicates, allocating strings, creating overlays and so on. Note that spell-fu also offers both an immediate and a delayed mode and it does the cheapest thing possible - looking up the words in a hash table.
EDIT: It offers functions to check characters if they are part of word. So this function should be used.
Good points. Currently I use
Indeed. I think it makes sense to still offer the option to ignore regexps, but you are right that it would be better if the spell checker does this automatically for us. The requirement is only that we tokenize correctly.
I disagree with that. The face check can be much more efficient than some of the flyspell predicates. Also note that flyspell itself comes with mode-specific predicates preconfigured. |
Okay, so I experimented with the backend tokenizer. It does not seem to be worth the complexity. I would argue that Emacs probably has sufficient understanding of word boundaries. Also the point you made regarding regexps doesn't hold for Enchant. If I run |
Tokenization must go hand in hand with how the dictionary was designed. If your API provides a "check word" function, then it must at least announce what are the word characters of that dictionary (or else you have to maintain a mapping of dictionaries to word characters in your package). But this is not a good API anyway. While it might work okay 99% of the time, notion of "word characters" is just a gross approximation of reality. Take the hyphen as an example. You can join words with hyphens more or less at will in most languages (or at least you can in all languages I know), so you would conclude that the hyphen must not be a word character. But there's also the case of prefixes which are not actual words by themselves, such as non-. There are also fixed expressions like Austro-Hungarian; AFAIK "Austro" is not a word that could be used in any other context. Portuguese has contractions like "construí-lo" where neither of the pieces exists as word by itself. This remark might be a bit academic in the sense that all implementations seem to get sufficiently many details wrong anyway. But if the issue is consecrated in the API, then there's not even a chance things can be refined over time. [Note also this related issue of naive tokenization. I reluctantly implemented a workaround for this in jit-spell because Hunspell seems to be reasonably good apart from this.] |
I agree with you in theory, however this is not how reality looks. I looked into the various spell checkers and they all use rather primitive tokenizers. Look at the triviality here, where only extra word characters are taken into account: This can be replicated easily directly in Emacs. From my experience it seems easiest to perform the tokenization and region checking in Emacs itself. Also Emacs has the necessary information about the buffer mode and the syntax table. Note that I actually implemented the tokenizer by using Enchant directly (https://github.com/minad/jinx/blob/f2c6120355145f092bd4123ac0d78ef8939b4692/jinx-mod.c#L86-L115). Turns out, this approach is complicated and offers zero benefits. The "tokenizer" I use right now in jinx.el is of course overly naive. I could make it more correct by querying the extra word characters from the backend, or add some additional apostrophe hacks like you did. But then I would just introduce a discrepancy with the Emacs language support itself (word characters and boundaries). I rather live happily with the 99%.
Iirc I have seen a statement somewhere on the Hunspell tracker along the lines that tokenization itself is not considered a "core capability" of the spell checker. It is just needed to create a useful CLI tool. Besides that file format parsing and tokenization is better left to the frontend. |
Well, Hunspell has the capability I described, even if they don't call it tokenization. I can show you an example:
The above shows I can agglomerate random words with a hyphen.
And the above shows that it knows about fixed expressions that only exist in a hyphenated form. (Sadly the pt_BR dictionary seems to be implemented in a way that doesn't allow this.)
This definitely cannot work. The spellchecker tokenization must be based on the selected human language, not the programming language. For pt_PT you want to include BTW, in tex-mode |
They call it parsing, but there is a function
Sure, I wasn't necessarily talking about programming languages but about text written in a text buffer. In programming language buffers you are of course right that we have a problem with mixed languages, but I am strongly in favor of sticking to English in programming mode buffers, also for comments.
Yes, I get false positives there. But it would be easy to copy that logic to the Emacs "tokenizer". Overall the tokenizers are quite simple in both enchant_hunspell.cpp and hunspell/textparser.cxx. My point is that it is beneficial to do the tokenization in Emacs itself, since this allows us to extend and hack the pipeline. Otherwise one ends up with interleaved tokenization partially in the spell-checker and partially within Emacs, which seems more complicated to extend. EDIT: I fixed the tokenization in jinx, such that the word characters provided by the backend are taken into account. Furthermore single quotes at the end of words are treated better. |
I observe an increase in latency when opening a new file (prog mode, e.g., elisp). Then jit-spell starts up and I invoke consult-line immediately. Then filtering feels much slower than without jit-spell.
Unfortunately I cannot obtain a realistic profile - the jit-spell process filter doesn't appear prominently. However in top Emacs is taking much more resources with jit-spell enabled than without in this scenario. Without jit-spell Emacs creates about 15% load and with jit-spell enabled it creates 90% load. Something about jit-spell is inefficient. Maybe one really needs some additional delay such that the Emacs process filter becomes less costly.
(EDIT: See #1 (comment), where I initially mentioned that aspell appears prominently in top. This is solved by sug-mode=ultra, but Emacs itself is still too heavily loaded by jit-spell.)
The text was updated successfully, but these errors were encountered: