The neverending search...
Sep. 27th, 2000 11:13 pm

As I dropped off to sleep last night, I started to dream about the various ways one could go about searching a series of LiveJournal posts. I recalled an approach from a book I read that suggested indexing the content in such a way that only relatively unique words enter into the index, while common words are excluded. I did a pass early this morning through the content of my posts and found that about half of the "words" in my posts are unique, and something like 70% occur six times or fewer. Another approach involves a database (e.g., MySQL), but I don't really have a lot of time to devote to coming up to speed in that area right now.
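A frequency pass like the one described might look something like this in Perl (the sub name, the sample text, and treating each run of letters as a "word" are my assumptions, not the actual script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each lower-cased run of letters occurs in $text,
# returning a hash of word => occurrence count.
sub word_counts {
    my ($text) = @_;
    my %count;
    $count{ lc $_ }++ for $text =~ /([A-Za-z]+)/g;
    return %count;
}

my $sample   = "the cat and the dog and the bird";
my %count    = word_counts($sample);
my $distinct = keys %count;                       # number of distinct words
my $once     = grep { $_ == 1 } values %count;    # words appearing exactly once

printf "%d distinct words, %d of them unique\n", $distinct, $once;
# prints "5 distinct words, 3 of them unique"
```

Run over the whole posts file instead of a sample string, the same two tallies give the "half are unique" and "70% occur six times or fewer" figures.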
I eventually dismissed everything except the following approach.
First, write a Perl script that reformats a LiveJournal download file into a file that contains all the information regarding a particular post on a single line. Second, write a CGI script that accepts a query from a Web page and searches each line for it, outputting each line whose content contains the query string. Of course, you've got to write the HTML page containing the form, but that's old hat by now.
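The second step might be sketched like this (the data file name `posts.txt`, the parameter name `q`, and the helper name are my guesses, not the actual script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the elements of @lines that contain $query (case-insensitive,
# with regex metacharacters in the query treated literally).
sub matching_lines {
    my ($query, @lines) = @_;
    return grep { /\Q$query\E/i } @lines;
}

# When run as a CGI with the one-post-per-line data file present,
# answer the request.
if (defined $ENV{QUERY_STRING} and -r 'posts.txt') {
    my ($query) = $ENV{QUERY_STRING} =~ /q=([^&]*)/;
    $query = '' unless defined $query;
    $query =~ tr/+/ /;                  # undo form encoding of spaces
    print "Content-type: text/html\n\n";
    open my $fh, '<', 'posts.txt' or die "can't open posts.txt: $!";
    print "<p>$_</p>\n" for matching_lines($query, <$fh>);
    close $fh;
}
```

Since each line carries everything about one post, a hit on a line is a hit on a post, and printing the line is printing the result.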
So when I was asked today to review a document that's used to construct a Russian version of the 3A mission timeline, my mind naturally turned to Perl to find a solution.
At issue with this document is the fact that the timeline is a very compact document, as it summarizes what each member of the crew is doing at any given time. Some activities take a long time (for example, sleep), and thus there is a lot of room on the page to place a description. No problem there.
It is the activities that have 5 or 10 minutes allocated to them that are so very challenging to label. In such cases, there are only 5 or 6 characters available for the description (and spaces count as characters), and abbreviations tend to look a bit unnatural squinched in like that (knw wt I mn, jlybn?). We've done several passes through the timeline at the client's site over the past few months, so there's also a chance that some of these ad hoc abbreviations are inconsistent among themselves, as well. (That's why we're checking them now! :^)
With Perl running through my mind most of the morning, it turned out to be pretty easy to whip together the necessary code to do the CGI script (I wrote a pseudocode shell during lunch). The challenge was to design the code that takes what you get by downloading your journal and outputs one post per line. That took the rest of my lunch hour and I finished it at home.
The whole structure is still a little wobbly, particularly in the formatting department. Also, since the data file is the result of filtering the download file with a Perl script, the "freshness" of the data being searched depends on how often I download my LiveJournal, run the filter, and upload the result to my server. For now, I plan to do updates once per week, which ought to suffice. Finally, one of my design decisions was to have the search ignore anything but upper- and lower-case letters, so I don't have to worry about numbers, punctuation, hyphenation, etc. (Actually, the last step was to insert a link to the search page on my LiveJournal, and that's done, too. It'll do, for now.)
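The letters-only decision boils down to stripping everything else from both sides before comparing. A minimal sketch (the sub names are mine):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reduce $text to lower-case letters only: digits, punctuation,
# hyphens, and whitespace all disappear before matching.
sub letters_only {
    my ($text) = @_;
    $text =~ tr/A-Za-z//cd;   # delete every character that isn't a letter
    return lc $text;
}

# True if $line contains $query once both are reduced to letters.
sub line_matches {
    my ($query, $line) = @_;
    return index(letters_only($line), letters_only($query)) >= 0;
}

print line_matches("web page", "my Web-page, updated") ? "match\n" : "no match\n";
# prints "match"
```

One side effect of deleting rather than normalizing: word boundaries vanish too, so "web page" matches "Web-page" and "webpage" alike, which is exactly the hyphenation-proofing described above.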
The rest of the day at work passed in a blur. I recall holding an all-hands meeting to congratulate the gang that worked on a last-minute rush that arrived on Friday (and which was delivered on Sunday afternoon), and agreeing to audit a teleconference tomorrow morning, but for the most part, I had my head down at my computer all day. Looks like it'll be catch-up time tomorrow.
Cheers...