Chuggnutt!

Blog

Flu Season

Been slow to post anything the last couple of days, mostly because we have the flu in the house, and we’ve been nursing sick kids. It’s mostly run its course, but turned into an ear infection in our oldest, and we got antibiotics for that today.

And no, we didn’t get flu shots. I’ve never bothered to get a flu shot, and have never gotten the flu. If I caught it from the kids (hey, there’s a first for everything), it turned into a head cold that’s pretty much gone. Of course, I rarely get sick as it is, so maybe I just have an iron-clad immune system.

December 15, 2003
Content Management: Spokane Database Schema

As promised, here’s my proposed database schema (using MySQL) for my Spokane Personal Publishing System. It’s long and technical, so read on at your own risk. (more…)

December 12, 2003
TrackBack?
Jeremy Zawodny had a post imagining a corporate worst-case scenario involving that ubiquitous Movable Type-developed technology, TrackBack. I’d been musing over TrackBack for a while, and two things yesterday got me looking deeper into it: Zawodny’s blog entry, and the link to my site from Ensight that I detailed in my previous entry.

I’ll admit, before yesterday what I knew about TrackBack was fairly minimal: it was a way to let sites know when other sites were linking to them (by sites, I suppose it should be clarified I mean blogs)—which to me is basically the equivalent of scanning the webserver’s referrer logs. Hence, I’ve more-or-less ignored implementing it in my own software.

I’m rethinking that decision now, largely because of the Ensight link. You know how I found that link to me? Technorati. (I would’ve seen it in the Apache logs, sooner or later, but I’ve been behind on those lately.) It occurred to me, though, that if I hadn’t checked Technorati, or if the post containing the link to me had scrolled off of Ensight’s front page and off Technorati, then I might never have known that I had been linked to.

TrackBack might change that. I say “might” because I’m still on the fence, as far as it goes. I can’t deny that if I had a TrackBack implementation in place, I would have gotten a notification of linkage in this case—Ensight runs Movable Type, which of course runs TrackBack. So I looked into the TrackBack specs yesterday to educate myself.

Here’s my official “from the fence” opinion:

TrackBack is a rather ugly kludge, albeit somewhat clever.

It has its good points, and its bad points. Here are the good points:
- The concept. It’s good, I admit it. However, it took a close reading of the technical spec to get it across to me. The most important thing about the concept is that it can transcend the weblog world; done right, this could be a powerful tool for all sorts of Web applications.
- It uses plain-vanilla HTTP calls to ping other sites. Simple, easy to implement, firewall-friendly.
- The autodiscovery concept—having your client try to automagically retrieve and ping a site based on the link you give it is neat.
- Adoption. Almost all Movable Type and TypePad blogs I’ve seen use it, and a good number of other blog tools use it too. It’s got the inertia.
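
To make the "plain-vanilla HTTP" point concrete, here is a minimal sketch of a ping in Python. It assumes the TrackBack spec's basic shape: a form-encoded POST to the target's ping URL (only the `url` field is required), answered by a small XML document whose `error` element is `0` on success. The function names and the example.com addresses are hypothetical, and the code only builds the request rather than sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request
import xml.etree.ElementTree as ET


def build_ping(ping_url, entry_url, title="", excerpt="", blog_name=""):
    """Build (but do not send) a TrackBack ping as a form-encoded POST."""
    fields = {"url": entry_url, "title": title,
              "excerpt": excerpt, "blog_name": blog_name}
    # Only the url field is required; drop the optional ones left empty.
    body = urlencode({k: v for k, v in fields.items() if v}).encode("utf-8")
    return Request(
        ping_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


def parse_response(xml_text):
    """Return (ok, message) from the XML reply to a ping."""
    root = ET.fromstring(xml_text)
    # Treat a missing <error> element as a failure.
    error = (root.findtext("error", default="1") or "").strip()
    message = (root.findtext("message", default="") or "").strip()
    return error == "0", message


# Hypothetical URLs, for illustration only.
ping = build_ping("http://www.example.com/tb.cgi?id=1",
                  "http://blog.example.org/archives/42",
                  title="TrackBack?", blog_name="Example Blog")
print(ping.get_method(), ping.full_url)
print(ping.data.decode("utf-8"))
print(parse_response("<response><error>0</error></response>"))
```

Actually sending the ping would be a matter of passing the request to `urllib.request.urlopen` and feeding the reply body to `parse_response`.
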
Now, the bad:
- It’s too vague and confusing. Prior to yesterday, I only had an inkling of how it worked and what it did, and I’m pretty savvy at this stuff; I just couldn’t grok what exactly was going on when viewing sites that use it.
- Related to the previous point, the name itself doesn’t work for me: it makes me want to only look in one direction for links (back), while the spec several times emphasizes it’s a peer-to-peer technology (i.e., two-way). Too much confusion and vague imagery don’t breed a good market presence.
- The execution leaves me a bit cold. That’s tough to quantify, I know, but it just seems to me to be too Movable Type-centric, and hence too limited to be the real-world peer-to-peer communication framework it wants to be.
- The autodiscovery solution, while clever, is an ugly hack: embedding RDF into the HTML of a page? Worse, having to surround it with HTML comment tags to avoid breakage? Ick, ick, ick. Seems to me a better solution would have been to embed the autodiscovery stuff in HTML meta tags, like the RSS autodiscovery link you’ll find in many sites (including my own). Even something simple along these lines, like `<meta name="trackback" content="http://www.example.com/tb.cgi?id=1">`, would do. And it would play nicely. I’ve noticed more than once that sites with that embedded RDF cause script errors in my browser.
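
A rough sketch of how a client might consume the meta-tag idea floated in the last point, using Python's standard HTML parser. To be clear, the `trackback` meta tag is only the proposal from this post, not part of the TrackBack spec, and the class and function names here are hypothetical.

```python
from html.parser import HTMLParser


class TrackbackMetaFinder(HTMLParser):
    """Collect ping URLs from <meta name="trackback" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.ping_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None (e.g. a bare attribute), so guard first.
        if (attr.get("name") or "").lower() == "trackback" and attr.get("content"):
            self.ping_urls.append(attr["content"])


def find_ping_urls(html_text):
    """Return every proposed-style TrackBack ping URL found in a page."""
    finder = TrackbackMetaFinder()
    finder.feed(html_text)
    return finder.ping_urls


page = """<html><head>
<title>Some entry</title>
<meta name="trackback" content="http://www.example.com/tb.cgi?id=1">
</head><body>...</body></html>"""
print(find_ping_urls(page))
```

Since the tag is ordinary HTML, nothing needs to hide inside comments, which is the whole appeal over the embedded RDF.
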
So while TrackBack, conceptually, is good, its execution is kludgy and ugly. Because of this, I probably wouldn’t give serious consideration to implementing it on my site… except for the fact that it’s being highly adopted, and as a community-building tool it’s better than nothing at all. Do I want to miss the boat? I don’t know, yet.

Other thoughts? What do you all think? Is TrackBack good enough? Or could it be better?
December 10, 2003
A Little Ensight

Jeremy Wright over on Ensight has written up some good commentary on my Thoughts on Content Management post from a few days back. He’s hit on the exact points that prompted me to explore this topic: “most CMS’s are piss poorly designed” (which is exactly right; most are piss-poorly designed, I’m just as guilty of this as anybody), and “there is no need to choose how you are managing your content until it is actually time to manage it.” (Emphasis mine.) Right on.

And, here’s some kudos from Jeremy that caught me entirely off-guard:

Jon, over at Chuggnut.com, is one of my favourite writers. Balanced, fair and most importantly, intelligent.

Wow. That’s a damn nice thing to say, Jeremy—thank you! (To everyone else, sorry for the ego-stroking; I’ll try not to let it go to my head… too much.)

December 9, 2003
Support

Jake over on UtterlyBoring is having some serious back problems and could definitely use some support. So, if you can, donate to Jake, or maybe buy something from his Orty.com store. If things are tight, hey, I understand, just send him some email or link to his site. Every little bit helps!

December 8, 2003
O Tannenbaum

Busy busy weekend. Most of it was holiday-oriented, though, and you can see the fruits of a good part of that in our nice six foot Douglas fir that’s laden with ornaments there on the right. Click for the larger image in all its glory.

And of course, I couldn’t resist including a picture of the prize ornament (gotta click to see!):

December 7, 2003
Search Snafu

This article on Gadgetopia links to the content management post I made yesterday (er, today?) and brings up a drawback to my system that I forgot to include: searching.

Within the relational database world, you can do precise, structured queries against specific fields in your tables. In a properly normalized database, this is all-powerful.

However, when you bundle a bunch of content up in an XML package, and stuff that into a single field, you lose this functionality of doing atomic searches against those fields. In the example I wrote up (a geocaching XML record with latitude, longitude, etc.), there would be no way to do this type of query:

SELECT * FROM content WHERE longitude BETWEEN -122.5 AND -120.5;

So, a problem. A big problem, since searching data is a pretty fundamental concept in content management—hell, in any application. I have some ideas that address this, but they’re still percolating. More to come.
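
To show what the lost query degrades into, here is a small Python sketch with SQLite standing in for MySQL. With longitude buried in an XML body, the database can only hand back every XML row, and the filtering happens in application code. The trimmed-down table, the function names, and the sample records are assumptions for illustration; this demonstrates the cost, not a proposed fix.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db(records):
    """Create an in-memory content table holding (name, xml_body) records."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE content (name TEXT UNIQUE, mime_type TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO content (name, mime_type, body) VALUES (?, 'text/xml', ?)",
        records,
    )
    return db


def find_by_longitude(db, low, high):
    """Full scan: parse every XML body and compare longitude in Python."""
    matches = []
    rows = db.execute(
        "SELECT name, body FROM content WHERE mime_type = 'text/xml' ORDER BY name"
    )
    for name, body in rows:
        value = ET.fromstring(body).findtext("longitude")
        if value is not None and low <= float(value) <= high:
            matches.append(name)
    return matches


# Made-up sample records.
db = make_db([
    ("cache-1", '<content type="geocache"><longitude>-121.3394771</longitude></content>'),
    ("cache-2", '<content type="geocache"><longitude>-117.4260466</longitude></content>'),
])
print(find_by_longitude(db, -122.5, -120.5))
```

Every call reads and parses the whole table, so the work grows with the number of records instead of being handled by an index.
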

December 6, 2003
Thoughts on Content Management
I’ve been thinking a long time about content management systems (which isn’t surprising considering developing various types of website CMSes is what I do for a living), how they pertain to weblogs and similar types of content, how to implement them in PHP and MySQL, and what type of system I would really like to have. Now, content management is a big topic, so let me clarify and narrow down what I’m talking about before I go on.

Some definitions
A piece of content can be anything—a blog entry, a fragment of text, a photo, an MP3 file, a recipe for carrot cake, a Palm Reader ebook, a scrap of a note written on a yellow sticky pad. A lot of what defines and contextualizes the content is the metadata that goes along with it—the date it was created, the size of the file, the author, the image format, where it was created, etc. Now, granted, different types of content can have vastly different types of metadata; for instance, a JPEG image taken with a digital camera will have attributes attached to it describing its resolution, compression quality, file size, camera specs, and date and time it was taken, while a piece of GIS data will have, say, latitude and longitude attributes, elevation, and place name information (which could be any or all of street name, city name, county name, etc.).

Some requirements
After using and extending my own homebrewed blog software for over a year and a half, examining other systems like Movable Type, and getting lots of ideas from other blogs and smart folks online, I’ve decided that what I’m thinking about is what I call a Personal Publishing System (PPS?), which could be considered a subset of a CMS. The PPS should have some features of a CMS, but certainly doesn’t need all of them; allowing multiple users to manage content is okay, for instance, but a comprehensive workflow system is unnecessary. Just being able to flag a content item as a draft or final version, and perhaps an approval tag, is all that’s needed. Here’s a list of some requirements I’d like to see in my PPS:
- Web based.
- Any type of content and its metadata can be handled.
- Each piece of content has a globally unique identifier (“guid”) of some kind.
- Each piece of content can be accessed/retrieved via a URL (probably incorporating the guid).
- Content can be published in any format: HTML (browsers), RSS (syndication/aggregators), PDF, etc.
- Content can be categorized based on a hierarchical tree of categories. In fact, content can be assigned to multiple categories.
My general philosophy here is that I want to challenge my own notions about what constitutes a blog and see how far I can take it. Hubris, probably.

Database theory
A well-formed and normalized database would rightly split different types of content into their own properly modeled tables, which is the sane, efficient and right thing to do. I love data normalization, and I take a particular joy in modeling a data structure to a relational database and normalizing the hell out of its elements.

In fact, as any Web application developer using a relational database will tell you, this is critical; the database is one of the biggest bottlenecks in the entire system, and it can be Web suicide for even a moderately-loaded site to have unoptimized tables behind your code.

On the other hand, there is a drawback in trying to run a content management system this way: for every new type of content you want the system to handle, you have to create a new table (or several, depending on how normalized you want to get) and then add code into your system for handling the new table(s). (Okay, astute PHP programmers will realize you could create a master table that contains information and metadata about the new tables, and have PHP code that automagically handles the new tables based on this master table info—so you would only have to create the new tables and the system auto-populates the master table info and knows how to deal with that content in a general way. You wouldn’t have to recode for new additions. I’ve done this. It works reasonably well, considering.) Pretty soon, you’ve got so many tables handling every different case you can think of, that database performance degrades regardless of how optimized each table is. And managing potentially hundreds of tables becomes a nightmare in logistics.

Left field
So of course, in imagining a theoretical structure for my PPS, I went slightly insane and threw this stuff out the window. Here’s the gist of it:

Treat every piece of content as the same as every other, and store it all in a single table. Preposterous? Probably. But bear in mind that there will be a common set of metadata attributes that every piece of content will have (at least in this context): a unique name or identifier (the guid), a date it was created, a title, a description. And of course, there would have to be a “body” field for the content itself. Roll those into the table structure.

What about different types of content, like text versus images? Easy: include a MIME type field in the table that defines the content type, such as “text/html” or “image/jpeg.” (You could store the actual binary data of an image in a file somewhere, linked to by the guid stored in the name field.)

Let’s look at this real quick in the context of a MySQL table:
```
   content_id -> Primary key
   name -> varchar (unique key)
   title -> varchar
   description -> text (probably will be >255 characters)
   date_created -> datetime
   mime_type -> varchar (possibly enum?)
   body -> mediumtext (large data sets, up to 16MB)
```
That handles the basic metadata, and could be sufficient for something like a weblog. But what if I want to add some content that has additional metadata that the table doesn’t account for—like a geocaching record, and I want to track latitude and longitude coordinates somewhere? I can’t add more fields to the table—that’s a loser’s game for (I hope) obvious reasons. Once I had settled on the idea of a MIME type field, the answer seemed clear: XML. Bake XML into the database structure as content.

To be clearer: set the MIME type of that piece of content to “text/xml” and then populate the body field with XML data of the content in question, with the extra metadata fields rolled into it as part of its XML definition. So, you might populate the body field with something like:
```
   <content type="geocache">
      <latitude>45.6684776</latitude>
      <longitude>-121.3394771</longitude>
      <dateHidden>2003-12-05</dateHidden>
      <cache type="traditional" name="coffee can">
         <item>Spiral-bound logbook</item>
         <item>Yo-yo</item>
         <item>Deck of cards</item>
      </cache>
   </content>
```
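
A minimal round trip through that structure, sketched in Python with SQLite standing in for MySQL (so the column types are SQLite's, not the varchar/mediumtext ones listed earlier). The column names follow the table outline above; the function names, the guid-style name, and the sample values are assumptions for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE content (
    content_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL UNIQUE,
    title        TEXT,
    description  TEXT,
    date_created TEXT,
    mime_type    TEXT,
    body         TEXT
)
"""

GEOCACHE_XML = """<content type="geocache">
   <latitude>45.6684776</latitude>
   <longitude>-121.3394771</longitude>
   <dateHidden>2003-12-05</dateHidden>
   <cache type="traditional" name="coffee can">
      <item>Spiral-bound logbook</item>
   </cache>
</content>"""


def store(db, name, title, mime_type, body, created="2003-12-06 00:00:00"):
    """Insert one piece of content; every type goes into the same table."""
    db.execute(
        "INSERT INTO content (name, title, date_created, mime_type, body)"
        " VALUES (?, ?, ?, ?, ?)",
        (name, title, created, mime_type, body),
    )


def load_extra_metadata(db, name):
    """Return the simple top-level XML fields of a text/xml item as a dict."""
    row = db.execute(
        "SELECT mime_type, body FROM content WHERE name = ?", (name,)
    ).fetchone()
    if row is None or row[0] != "text/xml":
        return {}
    root = ET.fromstring(row[1])
    # Keep only leaf elements; nested ones (like <cache>) are skipped here.
    return {child.tag: child.text for child in root if len(child) == 0}


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
store(db, "geocache-0001", "Coffee can cache", "text/xml", GEOCACHE_XML)
store(db, "entry-0001", "Flu Season", "text/html", "<p>Been slow to post...</p>")
print(load_extra_metadata(db, "geocache-0001"))
print(load_extra_metadata(db, "entry-0001"))
```

The base columns are shared by both rows; only the geocache row carries extra fields, and they live entirely inside its body.
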
What I like about this idea is its object-oriented analogy: start with a basic definition for content—a “class”—and each instance of content inherits from the base class and, via XML, can extend the base class for itself.

There are limitations to account for, as well. Not all types of data can be easily shoehorned into this model, so it shouldn’t be attempted. For instance, a voting system: you need a table to store the poll topics, one to store each option/answer, and at least one more for storing user votes. There would be no sense in trying to hack this into the content table, and the system would suffer if it were. So there’s always room for specialized functionality.

And, I’ve modeled some compromises. Rather than trying to manage the category system as just another type of content (so that you’d end up with parent-child content relationships), I pulled the categories out into another table. It’s cleaner and there’s more benefit to the system this way—I can add a many-to-many lookup table to allow for multiple categorization. (Incidentally, in my PPS, I call these channels, because they might fulfill a purpose beyond that of a traditional category system.)

Another compromise is the concept of content nodes. A content node is basically a grouping that content can be classified into—another lookup table. All the content I write for my blog would be assigned to the “chuggnutt.com blog” node, for instance.

Oops, and don’t forget about a commenting system—user comments (and perhaps ratings?) are a valuable source of metadata for any given piece of content. So I’ve allowed for another table to store comments, rather than making them another type of content, because I want to stay away from the parent-child relationship situation I alluded to above.

Will it all work? I don’t know. The proof is in the pudding, though—I’m working to convert my own blog to this system, so I’ll find out firsthand just how good (or bad) my ideas are. I really don’t think this system is viable to run as a large-scale, enterprise-style content management solution—hence the reason I’m calling this a Personal Publishing System. Incidentally, the working name (or code name, if you will) in the back of my mind for this system is “Spokane.”

I’m making this an open process, too, to solicit comments on my ideas, and hopefully to give ideas to any other people out there looking to write their own systems in PHP. To that end, the next article I’ll post on this topic will move from theory to practice, and I’ll publish the MySQL database schema I’ve been developing (with comments). Exciting stuff!
December 6, 2003
NetOffice

Installed NetOffice, PHP project management software, this morning to better manage my various Web projects. Once it’s up and working, it’s a pretty slick piece of software. Had some trouble installing it and getting it to work initially, though.

First, after it’s installed, it prompts you to log in to start using the software, with a username and password. The only password I gave it was an administrator password, and the documentation I had didn’t indicate what username to log in with. I correctly guessed the username was “admin,” but then the system wouldn’t let me in; it kept giving me a “Session error” message. I was finally able to make that go away by disabling NetOffice’s custom session management routines and letting the system default to PHP’s native session handling. The files I had to modify for this were includes/library.php, general/login.php, and projects_site/index.php.

Pain in the ass, but that fixed it, and now it works pretty well.

December 5, 2003
BendBuzz

My friend/business partner and I are launching several new websites, and I’m pointing to one here: BendBuzz. He’s taking the lead on this one, but I’m totally behind it. It’s a weblog-type site devoted to Bend and Central Oregon—not necessarily news per se (which Bend.com has pretty well covered), but for anything we can think of. Kind of like Slashdot for Bend. Check it out. It’s brand new, so there’s not much content to see yet, but that’ll change quickly. Also, I encourage anyone local to Bend to head over and feel free to contribute. We’re open to just about anything.

December 5, 2003