Chuggnutt!

Tag: PHP

Rasmus is the Man

… Rasmus Lerdorf, that is, the creator and godfather of PHP. He’s got an article on the Oracle Technology Network titled “Do You PHP?” that’s definitely worth a read. Here’s a sample:

What it all boils down to is that PHP was never meant to win any beauty contests. It wasn’t designed to introduce any new revolutionary programming paradigms. It was designed to solve a single problem: the Web problem. That problem can get quite ugly, and sometimes you need an ugly tool to solve your ugly problem. Although a pretty tool may, in fact, be able to solve the problem as well, chances are that an ugly PHP solution can be implemented much quicker and with many fewer resources. That generally sums up PHP’s stubborn function-over-form approach throughout the years….

Despite what the future may hold for PHP, one thing will remain constant. We will continue to fight the complexity to which so many people seem to be addicted. The most complex solution is rarely the right one. Our single-minded direct approach to solving the Web problem is what has set PHP apart from the start, and while other solutions around us seem to get bigger and more complex, we are striving to simplify and streamline PHP and its approach to solving the Web problem.

The guy just oozes common sense. Here’s another bit about PHP that he wrote on the PHP-DEV mailing list about two years ago, one of my favorites that just sums up beautifully the philosophy of PHP:

The golden rules of PHP are to keep the WTF(*) factor low and the POTFP(**) factor high.

(*) What The Fuck
(**) Piss Off The Fewest People

No two ways about it: he’s one of my heroes.

March 4, 2004
Advanced PHP Programming

George Schlossnagle’s book Advanced PHP Programming is out. It looks like it might be pretty interesting; there’s certainly a scarcity of good PHP books that cover advanced topics. Most of them are targeted at the beginner and the basics, and don’t have anything to offer me.

(Quick disclaimer: some of the Wrox books actually look like they might be decent, but I haven’t had my hands on a Wrox PHP book since the first couple they published.)

There was a time when I wanted to write a PHP book. It was going to be an advanced book, called “PHP Secrets,” and cover all sorts of topics. I never really pursued it, though, largely because of a general disillusionment with the computer book industry: you spend a year or more writing a book on a subject, and by the time it gets published it’s obsolete.

Thinking about it now, though, maybe a better venue for such a thing would be online, like what Mark Pilgrim did with his Dive Into Python book. That might be kind of cool; a live work-in-progress that I could (theoretically) keep up-to-date. Hmmm.

March 1, 2004
CMS Ranting

Gadgetopia has a good rant on content management (CMS’s Should Manage Content, Not Display It) that I’m just getting around to posting about. Here’s a sample:

My solution was to write a function library to make raw database calls to get everything out in a nice, big, nested PHP array. I essentially built an API for the CMS to make pulling content easy, but I do all the HTML processing in PHP, abandoning completely the display side of this CMS. I still use it for administration, workflow, etc. (which it excels at), but when PHP is such a fantastic, mature language, why reinvent the wheel?

I really don’t have anything to add to this, other than that this is largely why I favor developing my own PHP software rather than using pre-built systems—I have absolute control over the way the software works and I don’t have to rely on clunky, awkward front-end architecture and programming that I disagree with. Give me the data, and let me decide what to do with it.

February 23, 2004
PHP XML Benchmark

An interesting PHP benchmark of XML parsing showed up on PHP Everywhere. In High Speed XML Parsing is Not Intuitive, John Lim tested five methods of extracting the title element from an XML RSS feed. The results were surprising: the regular expression match was by far the fastest, and although I would have expected the SAX parsing (based on expat, I believe) to score significantly faster than the DOM or XPath parsing, it came in last.

Of course, the regular expression matching in this case was a bit simplistic—typically if you’re going to parse XML files, you’re looking for more than one element. But it’s a good technique to keep in mind.
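For reference, the regular expression approach might look something like the minimal sketch below. This is not John Lim’s benchmark code: the function name extract_first_title and the sample feed are made up for illustration, and the pattern assumes a simple feed with no CDATA sections or nested markup inside the title.

```php
<?php
// Hypothetical helper: pull the first <title> out of an RSS string with a regex.
// Quick, but only safe for simple, well-behaved feeds.
function extract_first_title($xml)
{
    // The 's' modifier lets '.' span newlines; '.*?' is non-greedy,
    // so the match stops at the first closing title tag.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/s', $xml, $matches)) {
        return trim($matches[1]);
    }
    return null;
}

// Made-up sample feed for illustration.
$feed = '<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>First post</title></item>
</channel></rss>';

echo extract_first_title($feed), "\n";
```

As soon as you need more than one element, or need to tell the channel title apart from the item titles, a real parser starts to earn its overhead.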

February 12, 2004
Content Management: Bootstrapping

I’ve been bootstrapping the code for my Personal Publishing System (nicknamed “Spokane”) that I wrote about here and here, and since I had intended this to be an open process that I’d blog about, I’m writing up some of what I’m doing and my thoughts on how to do it.

February 8, 2004
Formatting changes

I love templates. I was able to make some changes to the site formatting in mere minutes thanks to templates. Change two files, and it all propagates throughout the site. Lovely.

I use a modified version of the Template class from the PHP Base Library for just about every PHP project I work on these days. I’ve looked into other, similar classes for PHP but haven’t really found anything that comes close to the PHP Base Library Template.

I’ve never gotten into using Smarty largely because from what I know of it, it doesn’t fit my needs—it’s overkill for a templating system. (Caveat emptor. I could very well be wrong here.) Here’s a hint: not everything you use a template for needs to be/should be/can be compiled into PHP, which is what Smarty does. I can use my hacked Template class to build any kind of files, like my RSS file—not just PHP and HTML. Plus it’s very easy to use and it’s not burdened down with all the additional template scripting code (yeah, code) that Smarty allows.

For my money, if you’re working with Smarty, you might as well just forgo it entirely and code in native PHP. But that’s just me.
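To make the “plain substitution, no compilation” idea concrete, here is a minimal sketch of placeholder replacement. It is not the PHP Base Library Template class or my modified version of it; the function name render_template, the {NAME} placeholder style as used here, and the example URL are assumptions for illustration.

```php
<?php
// Hypothetical stand-in for a template class: swap {NAME} placeholders for values.
function render_template($template, array $vars)
{
    $replacements = array();
    foreach ($vars as $name => $value) {
        $replacements['{' . $name . '}'] = $value;
    }
    // strtr() handles every placeholder in one pass, so substituted values
    // are never re-scanned for further placeholders.
    return strtr($template, $replacements);
}

// The same function can fill an HTML fragment or, as here, an RSS item.
$rss_item = '<item><title>{TITLE}</title><link>{LINK}</link></item>';
echo render_template($rss_item, array(
    'TITLE' => 'Formatting changes',
    'LINK'  => 'http://example.com/formatting-changes',
)), "\n";
```

Note that this sketch does no escaping; values headed for HTML or XML would need to be escaped before substitution.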

February 7, 2004
Blogarama

Here’s something I’ve been looking for for a while now: Blogarama, a directory of weblogs. I’m not sure how good it is yet, but it’s a start. And it appears to be developed in PHP, which is always a good thing.

Their categorizations could use some work, but I imagine it’s rather hard to nail down a particular blog into a particular category—especially since most blogs already have their own microcosmic taxonomy. We’ll see what comes of it.

January 31, 2004
PHP: Best of Breed

I’ve been meaning to write this article for a while now, mainly to point to some really good PHP applications and spread some kudos.

There are many good applications and classes out there, but I’m limiting this to those that I’ve had hands-on experience with. Even so, this is hardly a comprehensive list; I may do some follow-up articles highlighting more good PHP.

January 7, 2004
NetOffice Fix

This is a follow-up to last month’s post about NetOffice. I’ve gotten several emails from people wanting to know specifically what I changed to fix the session error I was running into.

First of all, these fixes apply to version 2.5.3 of NetOffice only. Other versions, you’re on your own.

In the file includes/library.php:

Comment out line 23: ini_set('session.save_handler', 'user');
Comment out lines 61-63: session_set_save_handler() stuff
Comment out line 1088, in the _sess_mysql_read() function: _sess_mysql_destroy($session_id);
…and add this line instead: session_destroy();

This all kills the custom session handling, instead letting PHP use the default (temp files).

In the file general/login.php:

Comment out line 37: _sess_mysql_destroy($session_id);
…and add this line instead: session_destroy();

In the file projects_site/index.php:

Comment out line 22: _sess_mysql_destroy($session_id);
…and add this line instead: session_destroy();

After that, you should be able to get things working.

January 5, 2004
Thoughts on Content Management
I’ve been thinking a long time about content management systems (which isn’t surprising considering developing various types of website CMSes is what I do for a living), how they pertain to weblogs and similar types of content, how to implement them in PHP and MySQL, and what type of system I would really like to have. Now, content management is a big topic, so let me clarify and narrow down what I’m talking about before I go on.

Some definitions
A piece of content can be anything—a blog entry, a fragment of text, a photo, an MP3 file, a recipe for carrot cake, a Palm Reader ebook, a scrap of a note written on a yellow sticky pad. A lot of what defines and contextualizes the content is the metadata that goes along with it—the date it was created, the size of the file, the author, the image format, where it was created, etc. Now, granted, different types of content can have vastly different types of metadata; for instance, a JPEG image taken with a digital camera will have attributes attached to it describing its resolution, compression quality, file size, camera specs, and date and time it was taken, while a piece of GIS data will have, say, latitude and longitude attributes, elevation, and place name information (which could be any or all of street name, city name, county name, etc.).

Some requirements
After using and extending my own homebrewed blog software for over a year and a half, examining other systems like Movable Type, and getting lots of ideas from other blogs and smart folks online, I’ve decided that what I’m thinking about is what I call a Personal Publishing System (PPS?), which could be considered a subset of a CMS. The PPS should have some features of a CMS, but certainly doesn’t need all of them. Allowing multiple users to manage content is okay, for instance, but a comprehensive workflow system is unnecessary; just being able to flag a content item as a draft or final version, and perhaps an approval tag, is all that’s needed. Here’s a list of some requirements I’d like to see in my PPS:
- Web based.
- Any type of content and its metadata can be handled.
- Each piece of content has a globally unique identifier (“guid”) of some kind.
- Each piece of content can be accessed/retrieved via a URL (probably incorporating the guid).
- Content can be published in any format: HTML (browsers), RSS (syndication/aggregators), PDF, etc. etc.
- Content can be categorized based on a hierarchical tree of categories. In fact, content can be assigned to multiple categories.
My general philosophy here is that I want to challenge my own notions about what constitutes a blog and see how far I can take it. Hubris, probably.

Database theory
A well-formed and normalized database would rightly split different types of content into their own properly modeled tables, which is the sane, efficient and right thing to do. I love data normalization, and I take a particular joy in modeling a data structure to a relational database and normalizing the hell out of its elements.

In fact, as any Web application developer using a relational database will tell you, this is critical; the database is one of the biggest bottlenecks in the entire system, and it can be Web suicide for even a moderately-loaded site to have unoptimized tables behind your code.

On the other hand, there is a drawback in trying to run a content management system this way: for every new type of content you want the system to handle, you have to create a new table (or several, depending on how normalized you want to get) and then add code into your system for handling the new table(s). (Okay, astute PHP programmers will realize you could create a master table that contains information and metadata about the new tables, and have PHP code that automagically handles the new tables based on this master table info—so you would only have to create the new tables and the system auto-populates the master table info and knows how to deal with that content in a general way. You wouldn’t have to recode for new additions. I’ve done this. It works reasonably well, considering.) Pretty soon, you’ve got so many tables handling every different case you can think of, that database performance degrades regardless of how optimized each table is. And managing potentially hundreds of tables becomes a nightmare in logistics.
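The master-table idea from the parenthetical above could be sketched roughly like this. The array stands in for rows of the master table, and every table name, column name, and the build_select function are hypothetical; it only shows how generic code can handle a new content type without being rewritten.

```php
<?php
// Hypothetical in-memory stand-in for the "master table": one entry per
// content table, listing the columns that generic code is allowed to touch.
$master = array(
    'recipe'   => array('table' => 'content_recipe',   'columns' => array('title', 'ingredients', 'steps')),
    'geocache' => array('table' => 'content_geocache', 'columns' => array('title', 'latitude', 'longitude')),
);

// Build a SELECT for any registered content type without type-specific code.
function build_select(array $master, $type)
{
    if (!isset($master[$type])) {
        return null; // unknown content type
    }
    $entry = $master[$type];
    return 'SELECT ' . implode(', ', $entry['columns'])
         . ' FROM ' . $entry['table']
         . ' WHERE id = ?';
}

echo build_select($master, 'geocache'), "\n";
```

Adding a new content type then means adding a table and one master entry; the query-building code stays untouched, though the table count still grows.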

Left field
So of course, in imagining a theoretical structure for my PPS, I went slightly insane and threw this stuff out the window. Here’s the gist of it:

Treat every piece of content as the same as every other, and store it all in a single table. Preposterous? Probably. But bear in mind that there will be a common set of metadata attributes that every piece of content will have (at least in this context): a unique name or identifier (the guid), a date it was created, a title, a description. And of course, there would have to be a “body” field for the content itself. Roll those into the table structure.

What about different types of content, like text versus images? Easy: include a MIME type field in the table that defines the content type, “text/html” or “image/jpeg,” for instance. (You could store the actual binary data of an image in a file somewhere, linked to by the guid stored in the name field.)

Let’s look at this real quick in the context of a MySQL table:
```
   content_id -> Primary key
   name -> varchar (unique key)
   title -> varchar
   description -> text (probably will be >255 characters)
   date_created -> datetime
   mime_type -> varchar (possibly enum?)
   body -> mediumtext (large data sets, up to 16MB)
```
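As a rough sketch, the column list above might translate into MySQL DDL along these lines, held in a PHP string ready to hand to whatever database layer you use. The table name content, the integer type for the key, the column lengths, and the NOT NULL choices are all assumptions, not a finished schema.

```php
<?php
// Assumed MySQL DDL for the column list above. Lengths, key type,
// and NOT NULL choices are guesses for illustration only.
$create_content_table = <<<'SQL'
CREATE TABLE content (
    content_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name         VARCHAR(255) NOT NULL UNIQUE,  -- the guid
    title        VARCHAR(255) NOT NULL,
    description  TEXT,
    date_created DATETIME NOT NULL,
    mime_type    VARCHAR(100) NOT NULL,         -- e.g. text/html, image/jpeg
    body         MEDIUMTEXT                     -- up to 16MB
)
SQL;

echo $create_content_table, "\n";
```

A varchar is used for mime_type here rather than an enum, on the reasoning that new content types would otherwise mean altering the table.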
That handles the basic metadata, and could be sufficient for something like a weblog. But what if I want to add some content that has additional metadata that the table doesn’t account for—like a geocaching record, and I want to track latitude and longitude coordinates somewhere? I can’t add more fields to the table—that’s a loser’s game for (I hope) obvious reasons. Once I had settled on the idea of a MIME type field, the answer seemed clear: XML. Bake XML into the database structure as content.

To be clearer: set the MIME type of that piece of content to “text/xml” and then populate the body field with XML data of the content in question, with the extra metadata fields rolled into it as part of its XML definition. So, you might populate the body field with something like:
```
   <content type="geocache">
      <latitude>45.6684776</latitude>
      <longitude>-121.3394771</longitude>
      <dateHidden>2003-12-05</dateHidden>
      <cache type="traditional" name="coffee can">
         <item>Spiral-bound logbook</item>
         <item>Yo-yo</item>
         <item>Deck of cards</item>
      </cache>
   </content>
```
What I like about this idea is its object-oriented analogy: start with a basic definition for content—a “class”—and each instance of content inherits from the base class and, via XML, can extend the base class for itself.
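Getting the extra metadata back out of such a body could look like the sketch below. It assumes a PHP version with the SimpleXML extension available, and the function name parse_geocache and the array keys it returns are made up for illustration.

```php
<?php
// The sample body from above, as it might come back from the body field.
$body = <<<'XML'
<content type="geocache">
   <latitude>45.6684776</latitude>
   <longitude>-121.3394771</longitude>
   <dateHidden>2003-12-05</dateHidden>
   <cache type="traditional" name="coffee can">
      <item>Spiral-bound logbook</item>
      <item>Yo-yo</item>
      <item>Deck of cards</item>
   </cache>
</content>
XML;

// Hypothetical helper: turn a text/xml body into a plain array of extra metadata.
function parse_geocache($xml)
{
    $content = simplexml_load_string($xml);
    if ($content === false || (string) $content['type'] !== 'geocache') {
        return null; // not parseable, or not the content type we expected
    }
    $items = array();
    foreach ($content->cache->item as $item) {
        $items[] = (string) $item;
    }
    return array(
        'latitude'  => (float) $content->latitude,
        'longitude' => (float) $content->longitude,
        'hidden'    => (string) $content->dateHidden,
        'container' => (string) $content->cache['name'],
        'items'     => $items,
    );
}

$geocache = parse_geocache($body);
echo $geocache['container'], ': ', implode(', ', $geocache['items']), "\n";
```

In a fuller system the type attribute would presumably pick which helper runs, much as the MIME type field picks how the body is treated in the first place.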

There are limitations to account for, as well. Not all types of data can be easily shoehorned into this model, and those that can’t shouldn’t be forced. For instance, a voting system: you need a table to store the poll topics, one to store each option/answer, and at least one more for storing user votes. There would be no sense in trying to hack this into the content table, and the system would suffer if it were. So there’s always room for specialized functionality.

And, I’ve modeled some compromises. Rather than trying to manage the category system as just another type of content (so that you’d end up with parent-child content relationships), I pulled the categories out into another table. It’s cleaner and there’s more benefit to the system this way—I can add a many-to-many lookup table to allow for multiple categorization. (Incidentally, in my PPS, I call these channels, because they might fulfill a purpose beyond that of a traditional category system.)
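The many-to-many lookup can be illustrated with plain arrays standing in for the two tables. The ids, channel names, and the channels_for function are all invented for this sketch; in practice this would be a join across the content, lookup, and channel tables.

```php
<?php
// Made-up rows standing in for a channels table and a content-to-channel lookup table.
$channels = array(1 => 'PHP', 2 => 'Content Management', 3 => 'Geocaching');
$content_channels = array(
    array('content_id' => 10, 'channel_id' => 1),
    array('content_id' => 10, 'channel_id' => 2),
    array('content_id' => 11, 'channel_id' => 3),
);

// Return the names of every channel assigned to one piece of content.
function channels_for($content_id, array $lookup, array $channels)
{
    $names = array();
    foreach ($lookup as $row) {
        if ($row['content_id'] === $content_id) {
            $names[] = $channels[$row['channel_id']];
        }
    }
    return $names;
}

echo implode(', ', channels_for(10, $content_channels, $channels)), "\n";
```

Because each assignment is just a row in the lookup table, putting one item in several channels needs no change to the content table itself.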

Another compromise is the concept of content nodes. A content node is basically a grouping that content can be classified into—another lookup table. All the content I write for my blog would be assigned to the “chuggnutt.com blog” node, for instance.

Oops, and don’t forget about a commenting system—user comments (and perhaps ratings?) are a valuable source of metadata for any given piece of content. So I’ve allowed for another table to store comments, rather than making them another type of content, because I want to stay away from the parent-child relationship situation I alluded to above.

Will it all work? I don’t know. The proof is in the pudding, though—I’m working to convert my own blog to this system, so I’ll find out firsthand just how good (or bad) my ideas are. I really don’t think this system is viable to run as a large-scale, enterprise-style content management solution—hence the reason I’m calling this a Personal Publishing System. Incidentally, the working name (or code name, if you will) in the back of my mind for this system is “Spokane.”

I’m making this an open process, too, to solicit comments on my ideas, and hopefully to give ideas to any other people out there looking to write their own systems in PHP. To that end, the next article I’ll post on this topic will move from theory to practice, and I’ll publish the MySQL database schema I’ve been developing (with comments). Exciting stuff!
December 6, 2003