Category: Computers

  • COBOL

    From Tim Bray tonight comes this amazing fact:

    There are five billion new lines of COBOL getting created every year, and there are (wait for it) 220 billion lines of COBOL in production. (Holy cow, now that I think about it, I bet I wrote ten or twenty thousand of them).

  • Blog bot roundup

    The variety is amazing: here’s a list of the agents, spiders, and bots related to RSS and/or blogs that I’ve culled from my chuggnutt.com logfiles over the last 30 days (specifically blog-related ones, not general-purpose spiders like Google’s). These are only the ones I know for sure are blog- or RSS-related; others in my logs might be too, but aren’t obvious about it.

    Geek types, note that these strings can be used as-is (mostly with wildcards around them) to pick these agents out of HTTP_USER_AGENT.

    • Bloglines: The web-based feed reader/aggregator
    • kinjabot: The (currently) beta bot for the Kinja weblog directory/guide
    • Feedreader: Windows-based feed reader/aggregator
    • PubSub.com RSS reader: Another searchable, web-based aggregator
    • FeedDemon: Windows-based feed reader/aggregator
    • fastbuzz.com: Fastbuzz News is another web-based aggregator that scans news and blogs
    • ORblogs.com-bot and ORblogs-bot: The crawlers for ORBlogs, which compile metadata and RSS for the aggregating site
    • SharpReader: Windows-based feed reader/aggregator
    • Technoratibot: Technorati’s crawler
    • UniversalFeedParser: Mark Pilgrim’s liberal feed parser which is used in a variety of RSS software
    • Feedster Crawler: Feedster’s RSS spider
    • BlogBot: I think this is Blogdex’s crawler, but I’m not totally sure
    • BlogPulse: Yet another blog/RSS crawler and indexer
    • Slower, Friendlier Spiders (BlogShares V1.35): The spider for BlogShares, the fantasy share market for blogs
    • NITLE Blog Spider: The National Institute for Technology and Liberal Education’s spider for their blog census
    • LocalfeedsPageCrawler
    • NusEyeFeedCrawler
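    As a rough illustration of using these names against HTTP_USER_AGENT, here's a minimal Python sketch that treats each name as a case-insensitive substring (the same thing as wrapping it in * wildcards). The function name and token list are my own assumptions; the BlogShares entry is trimmed to "BlogShares" so the version number doesn't matter, and real agent strings vary, so expect to tune the list.

```python
# Tokens taken from the list above. Matching is a case-insensitive substring
# test, equivalent to the wildcard pattern *token*.
BLOG_AGENT_TOKENS = [
    "Bloglines", "kinjabot", "Feedreader", "PubSub.com RSS reader",
    "FeedDemon", "fastbuzz.com", "ORblogs.com-bot", "ORblogs-bot",
    "SharpReader", "Technoratibot", "UniversalFeedParser",
    "Feedster Crawler", "BlogBot", "BlogPulse", "BlogShares",
    "NITLE Blog Spider", "LocalfeedsPageCrawler", "NusEyeFeedCrawler",
]


def is_blog_agent(user_agent: str) -> bool:
    """Return True if the HTTP_USER_AGENT value contains any known token."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOG_AGENT_TOKENS)


if __name__ == "__main__":
    # Hypothetical agent strings, for illustration only.
    print(is_blog_agent("ExampleReader (compatible; SharpReader)"))  # True
    print(is_blog_agent("Mozilla/5.0 (Windows; U)"))                 # False
```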
  • Friendster goes PHP

    An item I saw yesterday but forgot to blog about: Friendster goes PHP. Pretty cool.

    Finally on Friday we launched a platform rearchitecture based on loose-coupling, web standards, and a move from JSP (via Tomcat) to PHP. The website doesn’t look much different, but hopefully we can now stop being a byword for unacceptably poky site performance.

    I haven’t had much of a chance yet to use Friendster and see whether it truly is faster, so I can’t personally comment on that aspect. Predictably, this is going to bring all sorts of people out of the woodwork to argue over the relative merits of Java/JSP (the old Friendster platform) versus PHP; just look at the comments on the link above to see it already happening. And while debate and disagreement can be healthy and productive, how about a quick reality check for everyone:

    PHP is good. Java is good. Both have their merits and disadvantages. Loudly complaining that [Java|PHP] is the only true way and the other is crap is boring and uninformed.

  • Spolsky on the Windows API

    Joel Spolsky on How Microsoft Lost the API War:

    Outside developers, who were never particularly happy with the complexity of Windows development, have defected from the Microsoft platform en-masse and are now developing for the web….

    Much as I hate to say it, a huge chunk of developers have long since moved to the web and refuse to move back.

    Good article. I recommend reading all of it, not just my highly selective snippets here.

  • vCard

    I’ve been playing with the vCard format for a project at work, and I gotta say, there’s a technology that’s begging to be reimplemented in XML. I mean, here’s the behind-the-scenes formatting of a vCard file:

    BEGIN:VCARD
    FN:Mr. John Q. Public, Esq.
    N:Public;John;Quinlan;Mr.;Esq.
    BDAY:1995-04-15
    ADR;DOM;HOME:P.O. Box 101;Suite 101;123 Main Street;Any Town;CA;91921-1234;
    TEL;PREF;WORK;MSG;FAX:+1-800-555-1234
    END:VCARD

    …with a bunch of arcane rules for delimiters and encoding. Uh, hello? EDI? 1989 called, and it wants its format back.
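    To show what those delimiter rules amount to, here's a rough Python sketch, not a conforming parser: it unfolds continuation lines (a line starting with a space or tab continues the previous one), splits each line at the first colon into a property name with ;-separated parameters and a ;-separated value, and honors backslash escapes. It deliberately ignores quoted parameter values, charsets, and QUOTED-PRINTABLE encoding, and the function names are hypothetical.

```python
import re


def split_escaped(text: str, sep: str = ";") -> list[str]:
    """Split on sep, treating a backslash as an escape for the next character."""
    parts, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            # \n or \N is an escaped newline; anything else is taken literally.
            current.append("\n" if nxt in ("n", "N") else nxt)
        elif ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_vcard(text: str) -> list[tuple[str, list[str], list[str]]]:
    """Return (property, parameters, value components) for each content line."""
    # Unfold: a line break followed by a space or tab continues the line.
    text = re.sub(r"\r?\n[ \t]", "", text)
    props = []
    for line in text.splitlines():
        if not line.strip():
            continue
        head, _, value = line.partition(":")
        name, *params = head.split(";")
        props.append((name.upper(), [p.upper() for p in params], split_escaped(value)))
    return props
```

    Fed the sample card, the N line comes back as ('N', [], ['Public', 'John', 'Quinlan', 'Mr.', 'Esq.']), and the ADR line ends with an empty seventh component because of its trailing semicolon.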

    Wouldn’t something like this XML mockup of the same thing just make more sense?

    <vCard>
      <name>
        <family>Public</family>
        <given>John</given>
        <additional>Quinlan</additional>
        <prefix>Mr.</prefix>
        <suffix>Esq.</suffix>
        <formatted>Mr. John Q. Public, Esq.</formatted>
      </name>
      <dob>1995-04-15</dob>
      <address>
        <type>Domestic, Home</type>
        <po>P.O. Box 101</po>
        <extended>Suite 101</extended>
        <street>123 Main Street</street>
        <locality>Any Town</locality>
        <region>CA</region>
        <postalCode>91921-1234</postalCode>
      </address>
      <telephone>
        <preferred />
        <type>Work, Message, Fax</type>
        <number>+1-800-555-1234</number>
      </telephone>
    </vCard>
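    Getting from one to the other isn't much work, either. This Python sketch uses the standard library's xml.etree.ElementTree to turn the sample card into the mockup's shape. The element names come from the mockup above rather than from any published schema, the ADR country component is dropped because the mockup has no slot for it, and the line handling is deliberately naive (no unfolding or backslash escapes), so treat it as an illustration only.

```python
import xml.etree.ElementTree as ET

# Element names follow the XML mockup above; they are not a published schema.
NAME_PARTS = ["family", "given", "additional", "prefix", "suffix"]
ADR_PARTS = ["po", "extended", "street", "locality", "region", "postalCode"]
TYPE_LABELS = {"DOM": "Domestic", "HOME": "Home", "WORK": "Work",
               "MSG": "Message", "FAX": "Fax"}


def type_text(params):
    """Turn vCard parameters such as DOM;HOME into 'Domestic, Home'."""
    return ", ".join(TYPE_LABELS.get(p, p.title()) for p in params)


def vcard_to_xml(text):
    root = ET.Element("vCard")
    name = ET.SubElement(root, "name")
    formatted = None
    for line in text.splitlines():
        head, sep, value = line.strip().partition(":")
        if not sep:
            continue
        prop, *params = head.upper().split(";")
        parts = value.split(";")  # naive: ignores backslash escapes
        if prop == "N":
            for tag, part in zip(NAME_PARTS, parts):
                ET.SubElement(name, tag).text = part
        elif prop == "FN":
            formatted = value
        elif prop == "BDAY":
            ET.SubElement(root, "dob").text = value
        elif prop == "ADR":
            adr = ET.SubElement(root, "address")
            ET.SubElement(adr, "type").text = type_text(params)
            # zip() stops after postalCode, so the country component is dropped.
            for tag, part in zip(ADR_PARTS, parts):
                ET.SubElement(adr, tag).text = part
        elif prop == "TEL":
            tel = ET.SubElement(root, "telephone")
            if "PREF" in params:
                ET.SubElement(tel, "preferred")
            ET.SubElement(tel, "type").text = type_text(
                p for p in params if p != "PREF")
            ET.SubElement(tel, "number").text = value
    # FN precedes N in the sample, so <formatted> is added last, as in the mockup.
    if formatted is not None:
        ET.SubElement(name, "formatted").text = formatted
    return root
```

    Passing the result to ET.tostring(..., encoding="unicode") gives the same elements as the mockup, just without the indentation.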
  • Useless lists: Computer stuff

    A coworker who’s moving was telling me today about finding a dusty box in his attic that turned out to hold an original Atari 2600, and for some reason that made me want to blog about it. Instead, this turned into a list of all the computer and video game systems I’ve accumulated over the years, all in the spirit of blogging useless lists (like I did the other day with the books left on my bookshelf).

    It’s pretty geeky. And probably a little sad. Reading over the list, I can see that I’m often behind the times when it comes to hardware. I’m retro-geeky. Read on if you dare.

  • overLIB

    Pointer to a totally excellent JavaScript library for creating popups: overLIB. I’ve been using it the last few days to put together a dynamic drop-down menu for a web project at work, and I’ve used it before to create popup context menus and tooltips. It’s simply one of the best JavaScript tools I’ve come across: it’s clever, simple to use, and it just works, period.

  • Imperfect end to an imperfect week

    I couldn’t even bring myself to post yesterday; I was just done. This last week was the shit week for computer troubles. After spending the first half of the week struggling with my wife’s computer, and Thursday reformatting and reinstalling Windows on a coworker’s computer, Friday was the kicker.

    The hard drive in the boss’s computer at work died. Yeah, the Boss. I get to work Friday morning, find a note on my desk: “Computer says ‘Disk boot failure, insert system disk’ since last night.” Ohhhhhh, how I hoped the problem was simply that there was a disk in the floppy drive.

    There wasn’t.

    Nope. Machine won’t boot; hard disk clicks when it has power. That’s never a good sign. Can’t usefully boot from the floppy; the bootable floppy disk I have is for Windows 98 (yes, almost all of the computers in the office are still running Windows 98), and this is a newer eMachine running Windows XP, so the Win98 boot disk can’t recognize the NTFS partition. Contemplate for a moment running the restore CD, but that will wipe out all the data on the drive, and that can’t happen.

    Of course, like all good, responsible IT persons, I make sure any critical work and files in the office are on the network, right? Right. And the network data is backed up to tape every night, right? Right. So, there really should be no problem, right? Just restore Windows XP (though it’s a bad drive, remember, and really should be replaced), and all the data is safe, right? Well, almost.

    Friggin’ Microsoft Outlook stores all of its data—emails, contacts, events—in a single .PST file on the local machine, not on the network. Uh-oh. And for the Boss, email is the lifeblood of communication in the company; he’ll send out 40-plus emails in any given day. Double uh-oh.

    But no, wait, hold on: like all good, responsible IT persons, I have batch files running on individual workstations that back up the Outlook data files to the network daily, so that they’ll be backed up to the tape each night. This was instituted months ago, after the CFO of the company suffered a major email loss and we identified Outlook as a Major Point of Weakness in the company’s data integrity.

    Whew! Run to the network, open up the appropriate user folder where the Outlook data file should be, check the timestamp on the file.

    Time freezes.

    Somewhere nearby, a cat meows in slow motion. A trillion water molecules in the Deschutes River ricochet off one another in a brilliant cacophony of sound not unlike that of billiard balls on the break. Deep in my brain, a synapse fires and a single drop of sweat languidly rolls down my spine.

    January 30, 2004.

    Not April 1, 2004. January 30. I have never in my life wished more for something to be an April Fool’s Day prank.

    So what happened to my carefully crafted plan of a batch file running at a scheduled time each night?

    The Boss shuts down his computer each night before it can run.

    And that, of course, is the punchline. The rest of my day at work (literally, all but about an hour of it) was spent trying in vain to access the hard drive, just to pull the email from it. No love. A computer place in town that does data recovery was able to see the drive, sort of, but was unable to pull anything from it. The only option left is to shell out up to two grand and have a professional data recovery outfit like Ontrack retrieve the email. I don’t know if we’ll go that route, though.

    By the end of the day, I felt I was about to stroke out. Visions of myself convulsing on the floor seemed oddly appealing. The saving grace of it all is that it was Friday, and the kids were being watched so my wife and I were able to go out to dinner and a movie. We saw “Secret Window,” which was pretty good.

    I’m hoping next week will be better.
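    The nightly batch file failed silently: it simply never ran, and nobody knew until the timestamp got checked by hand. A cheap safeguard is to automate that same timestamp check. Here's a minimal Python sketch; the network path and the 36-hour threshold (one nightly run plus some slack) are hypothetical, not part of the original setup.

```python
import os
import time

# Hypothetical threshold: one nightly run plus some slack.
MAX_AGE_HOURS = 36


def backup_is_stale(path, max_age_hours=MAX_AGE_HOURS, now=None):
    """Return True if the backup copy is missing or older than max_age_hours."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    age_hours = (now - os.path.getmtime(path)) / 3600
    return age_hours > max_age_hours


if __name__ == "__main__":
    # Hypothetical network location of one user's copied Outlook data file.
    pst_copy = r"\\server\backups\boss\outlook.pst"
    if backup_is_stale(pst_copy):
        print(f"WARNING: {pst_copy} is missing or out of date")
```

    The catch is where it runs: scheduled on a machine that stays on overnight (the file server, say), a check like this would have flagged the stale file long before the drive died, whether or not the workstation itself was ever left on.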

  • Some nights I just hate computers…

    God damn, the computers are pissing me off tonight. All evening our broadband cable connection has been running slower than molasses, so it takes forever to accomplish anything online. And then I’m trying to get my wife’s computer fixed up; it’s been running really slowly lately and locking up a lot. So I rolled back the Windows ME that was installed on it (have I mentioned before how I hate Windows ME??) to Windows 98, which by and large worked well enough, but now I can’t get the blasted TCP/IP to work properly.

    It tells me it’s assigned to some 169.* address, and the DHCP server is “255.255.255.255” (yeah, sure), instead of being sensible and using the perfectly acceptable DHCP server and IP address assignment that has worked with every other computer we’ve had in this house. And the worst part is, I’m sure I’ve encountered this same problem at work, and solved it, but I can’t remember what the solution was. I’ve already tried uninstalling and re-installing TCP/IP, so I don’t know. Maybe it’s just time for the straight low-level format route. Son of a bitch.
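    For what it's worth, assuming that 169.* address falls in 169.254.0.0/16, it is the IPv4 link-local range (RFC 3927): Windows 98 and later assign such an address to themselves (Automatic Private IP Addressing) when no DHCP server answers, so the symptom points at the DHCP request never getting a reply rather than at the address setting itself. A quick Python check using the standard ipaddress module (the function name is illustrative):

```python
import ipaddress


def is_self_assigned(addr: str) -> bool:
    """True for IPv4 link-local addresses (169.254.0.0/16), which a host
    typically picks for itself when no DHCP server answers."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 4 and ip.is_link_local


print(is_self_assigned("169.254.17.5"))   # True
print(is_self_assigned("192.168.1.20"))   # False
```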

  • Search Patch

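    While I wait to find out whether my hosting provider will lower MySQL’s minimum fulltext word length (the ft_min_word_len setting, which defaults to four characters, so shorter words never make it into the fulltext index), here’s what I’ve done to deal with viable three-character search terms.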

    First, I split the search string into its component words (an array). Then I remove any stopwords (I’ve got a big list), and for each remaining word under four characters long, I add extra clauses to the SQL query I’m running.
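    The splitting step might look something like this Python sketch. The stopword list is a tiny hypothetical stand-in for the real one, the tokenizer simply keeps runs of letters and digits (an assumption), and the four-character cutoff mirrors MySQL's default ft_min_word_len of 4. Each short word keeps its 1-based position in the original search string, since the relevance weighting uses that position.

```python
import re

# Hypothetical stand-in for a much larger stopword list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "or", "the", "to"}
MIN_FULLTEXT_LEN = 4  # MySQL's default ft_min_word_len (MyISAM)


def split_search(search: str):
    """Split a search string into fulltext-sized words and short words.

    Returns (long_words, short_words), where short_words is a list of
    (position, word) pairs; position is 1-based within the search string.
    """
    # Keep runs of letters and digits only; punctuation is discarded.
    words = re.findall(r"[a-z0-9]+", search.lower())
    long_words, short_words = [], []
    for position, word in enumerate(words, start=1):
        if word in STOPWORDS:
            continue
        if len(word) >= MIN_FULLTEXT_LEN:
            long_words.append(word)
        else:
            short_words.append((position, word))
    return long_words, short_words


print(split_search("porter php"))  # (['porter'], [(2, 'php')])
```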

    Here’s the basic form of the query I’m running, say, when searching for “porter”:

    SELECT *,
    MATCH(body) AGAINST('porter') AS relevance
    FROM content
    WHERE MATCH(body) AGAINST('porter')
    AND [additional conditions]
    ORDER BY relevance DESC
    LIMIT 10

    This uses fulltext indexing to search for “porter” with weighted relevance, and returns the appropriate content and its relevance score. Pretty straightforward, and it works really well.

    Here’s what the modified query looks like when short words are present, for the search “porter php”:

    SELECT *,
    MATCH(body) AGAINST('porter') +
      (IFNULL(1 / INSTR(body, 'php'), 0) + 1 / 2)  -- 2 = position of 'php' in the search string
    AS relevance
    FROM content
    WHERE ( MATCH(body) AGAINST('porter')
      OR body REGEXP '(^|[^a-zA-Z])php([^a-zA-Z]|$)'
      )
    AND [additional conditions]
    ORDER BY relevance DESC
    LIMIT 10

    Two new things are happening. First, in the WHERE clause, I’m using both the fulltext system to find “porter” and a regular expression search to find “php.” Why REGEXP and not LIKE? Because if I write LIKE '%cow%', for instance, I’ll get not only “cow” but also “coworker” and other wrong matches. A regular expression lets me filter those cases out.
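    Here's the same distinction modeled in Python, as a rough analogue rather than MySQL itself: a LIKE '%cow%'-style substring test versus a pattern that requires a non-letter, or the start or end of the text, on each side of the word. The helper names are hypothetical. The start/end alternatives matter because a pattern like '[^a-zA-Z]php[^a-zA-Z]' needs a character on both sides, so it misses a word sitting at the very beginning or end of the body.

```python
import re


def like_match(body: str, word: str) -> bool:
    """Rough analogue of LIKE '%word%' under a case-insensitive collation."""
    return word.lower() in body.lower()


def word_match(body: str, word: str) -> bool:
    """Match word only when it is not flanked by other letters."""
    pattern = r"(^|[^a-zA-Z])" + re.escape(word) + r"([^a-zA-Z]|$)"
    return re.search(pattern, body, flags=re.IGNORECASE) is not None


print(like_match("my coworker", "cow"))   # True
print(word_match("my coworker", "cow"))   # False
print(word_match("cow tipping", "cow"))   # True (start of text)
```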

    That takes care of finding the words, but I also wanted to tie them into relevance somehow. The solution I hit upon in the SQL above is relatively simple and does the trick well enough for my tastes. Basically, the sooner the word appears in the content, the higher its relevance, which the query expresses as the inverse of how many characters “deep” into the content the word first appears (1 / INSTR(body, 'php')). I also wanted to fudge the number a bit more by weighting the keyword’s position in the search string: the earlier the keyword appears there, the higher the score it gets (hence the 1 / 2, since “php” is the second word in “porter php”).
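    To make the arithmetic concrete, here's a small Python model of the short-word part of the score, with a helper that mimics MySQL's INSTR() (1-based position of the first occurrence, 0 if absent). It's an approximation under stated assumptions: it lowercases both strings to stand in for a case-insensitive collation, and it scores a missing word's depth as 0 rather than letting MySQL's NULL from dividing by zero wipe out the whole total.

```python
def instr(haystack: str, needle: str) -> int:
    """Approximate MySQL INSTR(): 1-based position of needle, or 0 if absent."""
    return haystack.lower().find(needle.lower()) + 1


def short_word_score(body: str, word: str, position: int) -> float:
    """1 / (how deep the word first appears) + 1 / (its place in the search)."""
    depth = instr(body, word)
    depth_score = 1 / depth if depth else 0.0
    return depth_score + 1 / position


print(short_word_score("PHP is handy", "php", 2))   # 1.5
print(short_word_score("I like php", "php", 2))     # 0.625
```

    A word on the very first character scores a full 1 for depth, one that first shows up 800 characters in scores 1/800, and the 1 / position term gives earlier search words a bigger bump.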

    It’s not perfect, and I definitely wouldn’t recommend this method for a large dataset (the REGEXP condition can’t use an index, so MySQL has to scan every row), but for my short-term needs it works just fine. The only thing really missing from the relevance calculation is how many times the keyword appears in the content, but I can live without that for now.
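    Assembling the query itself is mostly string work. Here's a hedged Python sketch that builds the SQL plus a parameter list for a DB-API MySQL driver using %s placeholders (the style PyMySQL and MySQLdb use), so user input never gets pasted into the SQL text. It assumes the search string has already been split as described above, with long words in one list and short words as (position, word) pairs containing only letters and digits. The function name is hypothetical, the content table and body column come from the queries above, the [additional conditions] are left out, and each 1 / INSTR(...) is wrapped in IFNULL because MySQL returns NULL when dividing by zero, which would otherwise null out the whole relevance score for rows missing the short word.

```python
def build_search_sql(long_words, short_words):
    """Return (sql, params) for the combined fulltext/short-word search.

    long_words:  words long enough for the FULLTEXT index.
    short_words: (position, word) pairs; words are assumed to contain only
                 letters and digits, so they are safe inside a regex.
    """
    select_terms, select_params = [], []
    where_terms, where_params = [], []

    if long_words:
        against = " ".join(long_words)
        select_terms.append("MATCH(body) AGAINST(%s)")
        select_params.append(against)
        where_terms.append("MATCH(body) AGAINST(%s)")
        where_params.append(against)

    for position, word in short_words:
        # Depth in the text plus position in the search string; IFNULL turns
        # the NULL from 1 / 0 (word absent, INSTR returns 0) into 0.
        select_terms.append(f"(IFNULL(1 / INSTR(body, %s), 0) + 1 / {int(position)})")
        select_params.append(word)
        where_terms.append("body REGEXP %s")
        where_params.append(f"(^|[^a-zA-Z]){word}([^a-zA-Z]|$)")

    if not select_terms:
        raise ValueError("no search words left after filtering")

    # The post's [additional conditions] would be ANDed onto the WHERE clause.
    sql = (
        "SELECT *, " + " + ".join(select_terms) + " AS relevance "
        "FROM content "
        "WHERE (" + " OR ".join(where_terms) + ") "
        "ORDER BY relevance DESC LIMIT 10"
    )
    return sql, select_params + where_params
```

    With a PyMySQL or MySQLdb cursor, the result would be run as cursor.execute(sql, params).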