[personal profile] alexpgp
Org-mode has an interesting capability, provided by something called org-protocol, that lets a browser and emacs interoperate. The upshot is being able to click on a browser bookmark and have the URL of the page you're looking at appended to an Org file, and even better, to highlight something on a page, click a browser bookmark, and have both the URL and the highlighted text appended to an Org file (what I call a "clipping file").

There's a pretty good explanation of how to set all this up at Worg, which is an Org knowledge base maintained by the Org community. (It's sort of like a wiki, except that pages are edited in Org mode using emacs and then exported to HTML.)

Anyway, getting the setup to work didn't take all that much time (at least not after learning that emacs had to be running in server mode). Clipping Cyrillic text still doesn't work, but at least I think I have a handle on what's going on.

The basic problem is that emacs complains bitterly that the characters being added to the target buffer don't play well with the clipping file buffer's character encoding, and would I please fix that?

At first, I really didn't understand what was going on, but after a little poking around, I found that the clipping file, which had started out encoded as utf-8, had been saved with ansi encoding. I also found that, if I selected "raw text" when asked how to render the text being added, the result was a series of bytes displayed as octal escapes (e.g., a byte value of 224 decimal was shown as \340).
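The octal notation is easy to check by hand (224 = 3×64 + 4×8 + 0), or with a few lines of code. Here is a minimal Python sketch; Python is used purely for illustration and is not part of the Org setup, and the helper name is made up for this example.

```python
def emacs_octal_escape(byte_value: int) -> str:
    """Render one byte as a backslash plus three octal digits (hypothetical helper)."""
    if not 0 <= byte_value <= 255:
        raise ValueError("expected a single byte (0-255)")
    # Three octal digits, zero-padded, preceded by a backslash.
    return "\\" + format(byte_value, "03o")


print(emacs_octal_escape(224))  # \340
```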

As it turns out, in code page 1251 that particular byte value corresponds to the Cyrillic letter 'а' (lower case). After a little comparing, it also turns out that the text coming out of the JavaScript window.getSelection() call, which is responsible for coughing up the text selected in the browser window, arrives encoded in good old code page 1251, which was devised by Microsoft aeons ago to deal with Cyrillic (the fact that the page's source specifies utf-8 as the page encoding is apparently not relevant here).
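To see why a single byte can cause this much grief, it helps to look at how the same byte reads under different encodings. The Python sketch below is only an illustration; in particular, using Latin-1 as a stand-in for "ansi" is an assumption about a Western-locale system.

```python
raw = bytes([224])  # the byte value 224 (hex E0)

# Read as code page 1251, the byte is Cyrillic lower-case 'а' (U+0430).
as_cp1251 = raw.decode("cp1251")

# Read as Latin-1 (assumed stand-in for "ansi"), the same byte is 'à' (U+00E0).
as_latin1 = raw.decode("latin-1")

# In utf-8 the Cyrillic letter needs two bytes, so a lone byte 224
# is not a complete utf-8 sequence.
as_utf8_bytes = as_cp1251.encode("utf-8")

print(as_cp1251, as_latin1, as_utf8_bytes)
```

The mismatch between the one-byte cp1251 form and the two-byte utf-8 form is presumably what the target buffer is objecting to.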

So, in between this and that, I'm now trying to figure out how I might modify some Lisp code to convert cp-1251 values to utf-8.
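The conversion itself is conceptually small: decode the incoming bytes as cp1251, then encode the result as utf-8. What follows is a sketch of that idea in Python rather than Lisp, with a made-up function name and a made-up sample clipping, so treat it as a statement of the goal and not as the eventual fix.

```python
def cp1251_bytes_to_utf8(raw: bytes) -> bytes:
    """Re-encode cp1251 bytes as utf-8 (hypothetical helper name)."""
    # Step 1: interpret the bytes as code page 1251 to get real characters.
    text = raw.decode("cp1251")
    # Step 2: write those characters back out as utf-8.
    return text.encode("utf-8")


# Made-up sample: the word "Привет" as six cp1251 bytes.
clip = bytes([207, 240, 232, 226, 229, 242])
converted = cp1251_bytes_to_utf8(clip)
print(converted.decode("utf-8"))  # Привет
```

On the emacs side, something along the lines of decode-coding-string with the windows-1251 coding system looks like the natural counterpart, though that is untested here.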

Step by step...
