Chuggnutt!

Category: Online

SharpReader Crashed

Grrr… SharpReader, the news reader I use to read RSS feeds, just crashed on me, and lost all my feeds—data and URLs. After I’d added four of the new Amazon feeds. Shit.

Oh, well. Fortunately, I had a recent backup of the OPML for my feeds, so I was able to get them back quickly.
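
For anyone who hasn’t peeked inside an OPML backup: it’s just XML, with each feed stored as an outline element that usually carries the feed address in an xmlUrl attribute. Here’s a rough Python sketch of pulling the feed list back out of one. The sample file and the feed_urls name are made up for illustration; this is not SharpReader’s actual export, and real files may nest folders deeper or use other attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML backup; the titles and URLs are placeholders.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>My feeds</title></head>
  <body>
    <outline text="Tech">
      <outline text="Example Blog" type="rss" xmlUrl="http://example.com/rss.xml"/>
    </outline>
    <outline text="Example News" type="rss" xmlUrl="http://example.org/index.rdf"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return (title, url) pairs for every outline that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    # iter() walks nested outlines too, so folders are handled for free.
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folder outlines have no xmlUrl and are skipped
            feeds.append((outline.get("text", ""), url))
    return feeds


if __name__ == "__main__":
    for title, url in feed_urls(SAMPLE_OPML):
        print(f"{title}: {url}")
```

Having that list on hand means a crashed reader costs you the read/unread state but not the subscriptions.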

March 4, 2004
Amazon RSS

Another piece of news everyone pointed to yesterday: Amazon is now offering RSS feeds. A list of all their feeds can be found at the Amazon.com Syndicated Content page. Looks like they’re offering feeds for each top-level category in their hierarchy. The next logical step, of course, would be to offer a personalized RSS feed of your recommendations…

March 4, 2004
Spam Pounder

So the spam problem finally got to be a little overwhelming on our BendCable email account, and we opted in to use BendCable’s anti-spam software/service, Spam Pounder. But here’s the catch: you don’t actually get this anti-spam service on your regular bendcable.com email, no—instead they change your email to a bendbroadband.com address because that’s where they have the actual anti-spam software running. (In order to preserve your bendcable.com address—which you may have had for years, as we have, and don’t want it gone—they set up a forward that shunts everything from your bendcable.com address to the bendbroadband.com one.)

I mean, what the hell is that? Sure, changing your email address is a solution for spam, but that’s not the point. I don’t have a lot of confidence in an ISP that can’t even set up spam filtering software on their main mail server, fer chrissakes.

And what the hell is with that name (“Spam Pounder”) and logo?? The images I’m associating with it are not good ones…

Now, having said all that, I will concede that so far it’s doing the job: almost all of the spam is now being caught, and I’d give it a 98-99% effectiveness rating. The technology seems to work.

But why can’t BendCable integrate this into their main email server like everyone else?

February 26, 2004
Timely Wired Issue

After all the hubbub over Google the last few days, I thought it was pretty interesting when my issue of Wired came today, with “Googlemania!” on the cover. Timely.

February 21, 2004
Is Google Broken?

Elsewhere on this site I’ve stated that I love Google. That still mostly holds true, but there have been some things about Google lately that are making me pause a bit.

The first concerns Google’s apparent abandonment of RSS for (exclusively) the still-incubating Atom syndication format/API. I won’t bother rehashing the situation here; if you want the gory details, check out this wonderfully recursive-ironic Google search for “google atom”. To me this seems like a highly questionable/irresponsible move for Google to make, and frankly a rather surprising one. Hopefully they’ll come to their senses over there.

The other thing deals with their AdWords program. I think it’s broken. Here’s the deal: We’ve been toying with AdWords to run ads on a new project we’re working on, to see how the system worked and if it would be worth it to ramp it up. (Side note: very cool. You can get a nice in-depth look at Google’s internal keyword rankings without ever putting any money down.) Well, it worked for a while, we were very impressed, but then suddenly, over the weekend sometime (I think), it stopped working.

Completely. Our ad no longer shows up on the exact same searches it was showing up under before. In fact (and here’s the biggest clue that something is seriously broken), as you page through the results, the exact same ads that appeared on the first page of results appear on every subsequent page.

WTF?

This did not happen before and should not be happening now. Something is broken. Period. For at least a week. Could it have something to do with Google doubling their index to over 6 billion items (4 billion web pages)? Maybe.

Ideas?

February 19, 2004
Amazon Reviews

One of the big online stories over the past couple of days is Amazon.com’s weeklong glitch that “suddenly revealed the identities of thousands of people who had anonymously posted book reviews” (New York Times article here). Turns out a lot of what was revealed was that authors were anonymously writing glowing reviews of their own books and getting family and friends to do so too, and conversely, anonymously panning rivals’ books. This “glitch” exposed a bigger issue:

…many people say Amazon’s pages have turned into what one writer called “a rhetorical war,” where friends and family members are regularly corralled to write glowing reviews and each negative one is scrutinized for the digital fingerprints of known enemies.

Amazon called this “an unfortunate error.” Yeah, right.

Consider: these “anonymous” reviewers are not anonymous at all. Amazon clearly tracks who they really are and can, at any given time, follow exactly who is saying what about any book. Confronted with the questionable antics of these reviewers and the growing “rhetorical war,” I know what I would do to try to put a stop to it. (Here’s a hint: it’s basically the same thing that happened to Amazon.)

February 15, 2004
PHP XML Benchmark

An interesting PHP benchmark of parsing XML showed up on PHP Everywhere. In High Speed XML Parsing is Not Intuitive, John Lim tested five methods of extracting the title element from an XML RSS feed. The results were surprising: the regular expression match was by far the fastest, and I would have thought the SAX parsing (based on expat, I believe) would have scored significantly faster than the DOM or XPath parsing, but it came in last.

Of course, the regular expression matching in this case was a bit simplistic—typically if you’re going to parse XML files, you’re looking for more than one element. But it’s a good technique to keep in mind.
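
To make the comparison concrete, here’s a minimal sketch of two of the approaches: a regular expression grab versus a real XML parse. Fair warning: this is Python rather than PHP, it is not John Lim’s benchmark code, and the sample feed and function names are my own placeholders. Timings will vary by machine, so I’m not quoting any numbers.

```python
import re
import timeit
import xml.etree.ElementTree as ET

# Hypothetical, tiny RSS document used only for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <item><title>First post</title></item>
  </channel>
</rss>"""

# Non-greedy match on the first <title>...</title> pair in the text.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def title_by_regex(xml_text):
    """Grab the first title element without parsing the document."""
    match = TITLE_RE.search(xml_text)
    return match.group(1).strip() if match else None


def title_by_parser(xml_text):
    """Parse the whole document, then look up the channel title."""
    node = ET.fromstring(xml_text).find("channel/title")
    return node.text.strip() if node is not None and node.text else None


if __name__ == "__main__":
    for fn in (title_by_regex, title_by_parser):
        seconds = timeit.timeit(lambda: fn(SAMPLE_RSS), number=10000)
        print(f"{fn.__name__}: {seconds:.3f}s for 10000 runs")
```

The regex version only works because the channel title happens to be the first title in the file and is written plainly. Wrap it in CDATA, add an attribute to the tag, or go looking for a second element, and the parser starts to earn its keep.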

February 12, 2004
Oregon SWAP

From UtterlyBoring I picked up this link to Oregon SWAP, which looks like an interesting experiment.

SWAP is designed to promote reuse of materials in Central Oregon. It is a free and convenient way for individuals and businesses to exchange reusable or surplus products and prevent them from ending up in the dump.

Looks interesting, although the small Comic Sans font is making my eyes bleed. I also notice they seem to be running PHP for their database search. Booyah!

February 10, 2004
Data Mining the Web

An interesting article today on MSNBC titled “Online search engines lift cover of privacy”, and the “InfoPorn” section of February’s Wired (can’t find a link, sorry) highlighting identity theft motivated me to write about a topic I’ve been thinking about for a while now: data mining the Web.

The article talks about the absurd amount of information that is freely available on the Web, and how much of it is accessible through Google—and then calls using Google to find this data “Google hacking.” I think a more accurate term would be Google mining—there’s really no mad hacker v00d00 ski11z involved, and let’s face it, being able to run a realtime query against a massive database containing billions of pieces of information is really the essence of data mining.

What got me thinking about mining the Web? Most recently, social networking software, and the data such software collects from its users. As I’ve written before, what a useful social networking system will do (among other things) is allow you to crawl the relationships among people and be able to drill-down by varying degrees into their data/life/online platform. But you know, you can already essentially do this with nothing more than a Web browser; it all goes back to the fact that there is an absurd amount of information freely and publicly available on the Web—much of it cheerfully self-published by people who should know better.

Example? Resumes. You’ve all seen them; half the personal sites out there have an online resume page, and you can find at least 45,300 more by searching Google for “resume.doc”. On average, they contain a shocking amount of personal information: what schools you went to, and when; who employed you, and when; your address and phone number; your skills; sometimes your Social Security number. Tip of the iceberg.

You can find out a lot about someone simply by reading their blog. My own is no exception, I’m sure, but sometimes even I’m amazed at how much personal detail people will reveal online.

And did you know you can search for wishlists at Amazon.com and often a user’s wishlist will also contain their birthday and the city and state in which they live? If that doesn’t work, try finding someone’s birthday on Anybirthday.com—they boast having over 130 million entries gleaned from public records.

Here’s where it gets tricky. The MSNBC article takes an alarmist tone, and in part it’s right to do so: companies and people that leave sensitive documents published on a crawler-accessible Web page are in danger of having their privacy violated. However, a lot of the information that’s out there is already public information, or information that’s freely volunteered by people and becomes public. Google is merely a tool that aggregates this information into one source. And me? Hell, I love Google, I frankly think it’s amazing. And I’m an information junkie, I salivate over the data mining possibilities—and I’ve got ideas rolling around my head on what could be done with this data, ways it can be manipulated, and linked, and so on.

We’ve barely scratched the surface when it comes to mining the Web—I think the untapped possibilities we’re sitting on are enormous, potentially dwarfing anything we’ve previously encountered. Google is a first step.

What’s next?

February 9, 2004
Quick linking

Here’s something interesting: my blog entry on Bend WinterFest is now the number 4 result on Google when searching for “Bend winterfest”, less than two weeks after I posted it. Damn, that’s fast.

February 7, 2004