<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dw="https://www.dreamwidth.org">
  <id>tag:dreamwidth.org,2009-05-04:278019</id>
  <title>AlexPGP's Corner</title>
  <subtitle>The limits of my language mean the limits of my world.</subtitle>
  <author>
    <name>alexpgp</name>
  </author>
  <link rel="alternate" type="text/html" href="https://alexpgp.dreamwidth.org/"/>
  <link rel="self" type="text/xml" href="https://alexpgp.dreamwidth.org/data/atom"/>
  <updated>2017-11-02T14:04:40Z</updated>
  <dw:journal username="alexpgp" type="personal"/>
  <entry>
    <id>tag:dreamwidth.org,2009-05-04:278019:2284618</id>
    <link rel="alternate" type="text/html" href="https://alexpgp.dreamwidth.org/2284618.html"/>
    <link rel="self" type="text/xml" href="https://alexpgp.dreamwidth.org/data/atom/?itemid=2284618"/>
    <title>Wildcards are still my friends...</title>
    <published>2017-11-02T13:50:07Z</published>
    <updated>2017-11-02T14:04:40Z</updated>
    <category term="regular_expression"/>
    <category term="word"/>
    <category term="tip"/>
    <dw:security>public</dw:security>
    <dw:reply-count>0</dw:reply-count>
    <content type="html">Further to a post of mine from &lt;a href="https://alexpgp.dreamwidth.org/1777262.html"&gt;August 2012&lt;/a&gt;...&lt;br /&gt;&lt;br /&gt;Let's say you have an abbreviation (it could be a TLA or something else) with no apparent expansion in the document. That said, there's nothing to prevent the expansion from appearing &lt;i&gt;somewhere&lt;/i&gt; in the text.&lt;br /&gt;&lt;br /&gt;For each letter of the abbreviation, use the following template in Word (with wildcards enabled):&lt;br /&gt;&lt;br /&gt;&amp;lt;[xX][а-я]*&amp;gt;&lt;br /&gt;&lt;br /&gt;where "&amp;lt;" denotes "start of word," "xX" stands for the starting letter in both cases, for example, "аА" (&lt;i&gt;both&lt;/i&gt; cases so as to catch words that start with either a lower- or an upper-case letter; this is an improvement over the technique from 2012), "[а-я]" matches one lower-case Cyrillic letter, the "*" after it matches any string of characters (in Word's wildcard syntax "*" is not the regex "zero or more of the previous character"; the repeat operator is "@", which means "one or more"), and "&amp;gt;" denotes "end of word." As written, the template needs a word of at least two letters, so it will not catch single-letter words, though admittedly their appearance in abbreviations is rare.&lt;br /&gt;&lt;br /&gt;So, for example, if you are looking to see whether an expansion of the abbreviation ПКЛ appears, you'd search for:&lt;br /&gt;&lt;br /&gt;&amp;lt;[пП][а-я]*&amp;gt; &amp;lt;[кК][а-я]*&amp;gt; &amp;lt;[лЛ][а-я]*&amp;gt;&lt;br /&gt;&lt;br /&gt;As it turns out, ПКЛ appears in a document I am working on (more precisely, in the title of the Russian source file). Using this method, I've determined that no three consecutive words start with these three respective letters.&lt;br /&gt;&lt;br /&gt;Back to the face!&lt;br /&gt;&lt;br /&gt;Cheers...</content>
  </entry>
</feed>
