alexpgp
My old Cyrillic-to-Latin transliteration macro in Word is a clunky thing with a tendency (at times, and for reasons I have never taken the time to fathom) to extend its operation beyond the chunk of text I had selected to the entire document I was working on, which meant wasting time clicking the Undo arrow, etc.

The macro relied on a brute-force approach: a straight sequence of statements, with no control structures, each of which effectively said "within the selected text, replace all instances of x with y," with x and y cycling through the upper- and lower-case alphabet from one statement to the next. Not surprisingly, when the selected text was sizeable (or when the macro decided to operate on the entire document), code execution was s-l-o-w.
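For illustration, one such statement might have looked roughly like the sketch below. This is a reconstruction, not the original macro: the helper name ReplaceInSelection is made up, and only the first few letters are shown. Setting Wrap to wdFindStop is one commonly suggested way to keep a replace-all from running on past the selection; whether that explains the runaway behavior described above is only a guess.

```vba
' Hypothetical helper: replace every findText with replaceText in the current selection.
Sub ReplaceInSelection(ByVal findText As String, ByVal replaceText As String)
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replaceText
        .Forward = True
        .Wrap = wdFindStop          ' do not continue past the end of the selection
        .MatchCase = True           ' keep "А" and "а" distinct
        .MatchWildcards = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' The brute-force macro is then one call per letter, executed in sequence.
Sub TransliterateBruteForce()
    ReplaceInSelection "А", "A"
    ReplaceInSelection "Б", "B"
    ReplaceInSelection "В", "V"
    ' ... one statement for every remaining upper- and lower-case letter
End Sub
```

Each call scans the selected text all over again, which goes some way toward explaining why a long selection took so long to process.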

However, I only really needed to run the macro once in a blue moon, so I left it alone, warts and all.

Today, it occurred to me that not only will I be using a different transliteration system for the journal assignment at hand, but I'm going to need to transliterate a lot more often in the course of the job. It sure would be nice, thought I to myself, if I had a macro that ran quickly, where I didn't have to worry about inadvertently transliterating my entire source document.

So, I sat down with Google to do a little "code research" and eventually figured out what to do. First, I learned how to implement a Dictionary object in VBA, which basically stores key–value pairs that map Cyrillic characters to their Latin equivalents. The code for that is pretty straightforward:
Dim chardict
Set chardict = CreateObject("Scripting.Dictionary")
chardict.Add "А", "A"
chardict.Add "Б", "B"
chardict.Add "В", "V"
chardict.Add "Г", "G"
chardict.Add "Д", "D"
chardict.Add "Е", "E"
...
and so on for upper and lower case (and for a few non-alphabetic characters, like the guillemets used in Russian as quotation marks). Now, calling chardict.Item("Б"), for example, returns the letter "B".
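Since the upper- and lower-case entries mirror each other, the repeated Add statements can be folded into a loop. The following is only a sketch under a few assumptions: the function name BuildCharDict is hypothetical, only the six letters listed above are included, and it relies on the fact that in Unicode each capital letter from А to Я sits exactly &H20 code points below its lower-case counterpart. ChrW is used instead of literal Cyrillic so that the code does not depend on the editor's code page.

```vba
' Hypothetical helper: returns a Dictionary mapping Cyrillic letters to Latin ones.
Function BuildCharDict() As Object
    Dim d As Object
    Dim latin As Variant
    Dim i As Long
    Dim lowerCode As Long

    Set d = CreateObject("Scripting.Dictionary")

    ' Latin equivalents for а, б, в, г, д, е (U+0430 through U+0435), in order.
    latin = Array("a", "b", "v", "g", "d", "e")

    For i = LBound(latin) To UBound(latin)
        lowerCode = &H430 + (i - LBound(latin))
        ' lower-case entry, e.g. "б" -> "b"
        d.Add ChrW(lowerCode), latin(i)
        ' upper-case entry, e.g. "Б" -> "B"; only the first Latin letter is capitalized
        d.Add ChrW(lowerCode - &H20), UCase(Left(latin(i), 1)) & Mid(latin(i), 2)
    Next i

    Set BuildCharDict = d
End Function
```

With Set chardict = BuildCharDict(), the call chardict.Item("Б") still returns "B". Letters outside the contiguous а to я range (ё, for instance) would still need their own Add statements.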

The next step was to figure out the length of what it is that's being transliterated (this part I already knew), and then to set up a for-next loop to cycle through each letter of the text to be processed.

The principal stumbling block that I saw involved "doing the right thing" when non-Cyrillic characters (spaces, numbers, and the like) were encountered in the string. (I mean, you can't ignore such characters. If you do, "улица 26-ти Бакинских комиссаров" gets transliterated as "ulitsatiBakinskikhkomissarov" instead of "ulitsa 26-ti Bakinskikh komissarov".) Expanding the dictionary to include all such characters just seemed too awkward, as it was entirely likely that I would not be able to cover every eventuality, as in the case of text with embedded Greek letters.

I looked long and hard at various ways to implement error recovery code, but as it turned out, attempting to fetch a value for a key that doesn't exist doesn't raise an error: the Dictionary's Item property simply returns an empty value (and quietly adds the missing key to the dictionary). So the solution I found involved comparing the "result" string before and after the attempt to tack a transliterated character onto the end of it. If the before and after strings are identical, that means the code attempted to fetch the value for a key that's not in my dictionary (i.e., a non-Cyrillic character), at which point the code just copies the character, whatever it was.
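The behavior is easy to check in the Immediate window. A small sketch (the procedure name DemoMissingKey is made up for the purpose) shows that asking for a missing key returns nothing, raises no error, and, as a side effect, adds that key to the dictionary:

```vba
Sub DemoMissingKey()
    Dim d As Object
    Set d = CreateObject("Scripting.Dictionary")
    d.Add "Б", "B"

    Debug.Print d.Count                    ' prints 1
    Debug.Print "[" & d.Item("Q") & "]"    ' prints [] and raises no error
    Debug.Print d.Count                    ' prints 2: "Q" is now a key with an empty value
    Debug.Print d.Exists("Z")              ' prints False; Exists does not add anything
End Sub
```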

Everything works surprisingly well. So far.

The workhorse part of the code (where MyData is an instance of DataObject):
MyData.GetFromClipboard  ' copies text from clipboard
s = MyData.GetText       ' s gets the text
length = Len(s)          ' determine length of s
t = ""                   ' initialize t to zero-length string
For Index = 1 To length
   tb = t                ' save a copy of t before adding next char
   ' tack the character transliteration (key value) to the end of t
   t = t & chardict.Item(Mid(s, Index, 1))
   ' if t hasn't changed, just tack _the character_ onto t
   If tb = t Then
      t = t & Mid(s, Index, 1)
   End If
Next Index
Selection.TypeText t     ' put the result into the document
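The before-and-after comparison has two quirks worth knowing about: every unknown character gets added to the dictionary as a side effect of the lookup, and a letter deliberately mapped to an empty string (as some transliteration systems do with the hard sign) would be copied through instead of dropped. A variant that sidesteps both is to ask the dictionary whether the key exists before fetching it. This is a sketch rather than the macro described above; the function name Transliterate is hypothetical, and it takes the string and the dictionary as arguments instead of reading the clipboard.

```vba
' Hypothetical variant of the loop above, using Exists instead of comparing strings.
Function Transliterate(ByVal s As String, ByVal dict As Object) As String
    Dim i As Long
    Dim ch As String
    Dim result As String

    For i = 1 To Len(s)
        ch = Mid$(s, i, 1)
        If dict.Exists(ch) Then
            ' known character: append its transliteration (which may be empty)
            result = result & dict.Item(ch)
        Else
            ' anything else (space, digit, Greek letter...) is copied unchanged
            result = result & ch
        End If
    Next i

    Transliterate = result
End Function
```

The last line of the workhorse code would then become Selection.TypeText Transliterate(MyData.GetText, chardict).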
