BittyWiki revisited

A long time ago I wrote about BittyWiki, detailing my efforts to write a simple PHP wiki engine based on the Shortest Wiki Contest, which asks the question:

What’s the shortest piece of source you can write that will implement a fully-featured wiki?

Looking over that old code I’d posted, it’s pretty basic, and doesn’t adhere to the general Wiki Principles posited by the SWC. Basically, that early version simply took hand-coded HTML entered into the form, stripped all but a few of the basic tags, and called it good.

It’s not, though, and over the years I’ve updated the code to adhere to (many of) the Principles while working to reduce its size within that constraint. And I recently realized I’d never revisited this topic to share the better code!

Before I post the PHP code itself, here are the Wiki Principles in use:

  • Automatic link generation – Wikis should automatically create new links/pages from text (internal links, not external links, which can be entered as-is), and the mechanism for doing this is two-fold:
    • Any text that is camel-cased — such as ThisTextLinksHere — automatically becomes a link to a page of the same name in the wiki.
    • Any text that is surrounded by single square brackets — such as [Cyberpunk] — also becomes a link to a page in the wiki.
  • When you click that link, you’ll be taken to that page in the wiki, and if it doesn’t exist yet, it will be created when you click “Edit” and save it.
  • Content editable by all – Technically, anyone who can access the wiki in their browser can click “Edit” and edit the content. Nothing fancy here, just a straight <textarea> and “Save” button.
  • Easy text input – Here’s the rub and the main difference from my original BittyWiki code, which required HTML input. “Easy text input” means there’s no need for HTML coding; instead, content is entered using wiki syntax markup, in my case based on these text formatting rules:
    • Blank lines become new paragraphs, otherwise single line breaks are ignored.
    • Four or more hyphens (----) insert a horizontal line.
    • Headers are created by wrapping text in equals signs matching the heading level. I only implemented <h2> and <h3>, so that means wrapping in “==” and “===”. We could of course implement all six levels.
    • Lists, both unordered and ordered, are created by starting a new line with either an asterisk (*) for unordered or pound/hash sign (#) for ordered; to create nested lists, use two or more starting characters (**).
    • Indent a new line with one or more spaces at the beginning to use a monospace font (the <pre> tag in HTML).
    • Wrap text in two single quotes for italics (''), and three single quotes for bold ('''). If you use five single quotes to wrap ('''''), you’ll get both for extra emphasis.
    • However, use six single quotes ('''''') to separate or prevent camel-cased text from automatically becoming a link, e.g. “Bitty''''''Wiki”.
    • Insert external links: Any URL that starts with one of “http:”, “https:”, “ftp:”, “gopher:”, “mailto:”, or “news:” is automatically converted into an active link that will open in a new window.
    • URLs ending with .gif, .jpg, .jpeg, or .png are converted to inline images.
    • This one’s a bit esoteric, but text that starts with “ISBN:” followed by an ISBN is converted to an Amazon link (and of course my Amazon Associate ID is added!).
    • Text wrapped in two colons (::) gets converted to a <blockquote> (this is my own addition).
  • Back links – Any wiki pages that link to the current page are listed as links at the bottom of the page (“What links here”).
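
To make a few of these rules concrete, here is a minimal sketch that applies four of the patterns from the full listing further down (bracket links, camel-case links, bold, and italics) to a made-up line of text. The sample string and the variable names $text and $html are assumptions for illustration only; the real script runs many more patterns over the whole page.

```php
<?php
// Sample wiki markup (hypothetical content, for illustration only).
$text = "See HomePage or [Cyberpunk] for '''bold''' and ''italic'' text.";

// preg_replace() applies the patterns in order, each one working on the
// output of the previous one, so the order matters: bold (three quotes)
// has to run before italics (two quotes).
$html = preg_replace(
    [
        '/\[([a-z0-9_]+)\]/i',       // [Word] becomes an internal link
        '/\b(([A-Z][a-z]+){2,})/',   // CamelCase becomes an internal link
        "/'''(.+)'''/Us",            // '''text''' becomes bold
        "/''(.+)''/Us",              // ''text'' becomes italic
    ],
    [
        '<a href="?p=$1">$1</a>',
        '<a href="?p=$1">$1</a>',
        '<b>$1</b>',
        '<i>$1</i>',
    ],
    $text
);

echo $html, "\n";
// Prints:
// See <a href="?p=HomePage">HomePage</a> or <a href="?p=Cyberpunk">Cyberpunk</a> for <b>bold</b> and <i>italic</i> text.
```

Note that “See” and “Cyberpunk” are left alone by the camel-case rule because each has only one capitalized run; a word needs at least two to count as a link.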

There are no features for the more advanced topics like recent changes, revision/history, or even deleting pages, in the interest of trying to make this as small as possible, software-wise.

So, let’s get into the code! But it’s not going to be the smallest version I can make; instead, I’m posting a readable, heavily commented version to explain what’s going on. If you want a small version, you could run it through something like PHP Minify (or trim it down yourself as an exercise in PHP).

Also note: this is not production-ready code! It’s intended as a development exercise only. You can certainly try it out in development; you’ll want to make this your index.php file, and you’ll need a data directory for the page files to be stored in. But don’t run it as a production wiki (or if you do, uh, good luck with that).
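
If you’d rather not create the data directory by hand, something like the following sketch could do it. ensureDataDir() is a hypothetical helper, not part of BittyWiki, and the 0755 permissions are an assumption that may not suit every setup.

```php
<?php
// Hypothetical setup helper (not part of BittyWiki itself): makes sure the
// storage directory exists and is writable before any page is saved.
function ensureDataDir(string $dir = 'data'): bool
{
    // Create the directory (and any missing parents) if it isn't there yet.
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        return false;
    }
    // The web server user still needs write access for saves to work.
    return is_writable($dir);
}
```

For local testing, PHP’s built-in web server (php -S localhost:8000, run from the directory containing index.php) is enough.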

Now, the source code:

<?php
/**
 *  BittyWiki
 *
 *  How small can I make a functional wiki using PHP?
 *  If I attach a stylesheet, should that count against size?
 */

// "p" is the page name, passed via the URL query string ("index.php?p=HomePage").
// Remove any characters that aren't letters, numbers, underscores, or percent signs.
// ($_GET['p'] may be missing, so fall back to an empty string.)
$p = preg_replace('/[^a-z0-9_%]+/i', '', $_GET['p'] ?? '');
// If $p is empty, default to "HomePage".
$p = $p ?: 'HomePage';
// $f is our filename, based on $p, in the "data" directory.
$f = "data/$p";

// If this was submitted from the edit form, save the data and redirect back to the page.
if (!empty($_POST)) {
    // Strip out any carriage returns from the data before saving, since we only want to
    // work with Unix-style line endings.
    file_put_contents($f, str_replace("\r", '', $_POST['d']));
    header("Location: ?p=$p");
    die;
}

// If the file exists, read it in and escape any HTML entities.
if ($d = @file_get_contents($f)) {
    // Since PHP 8.1, htmlspecialchars() defaults to ENT_QUOTES, so we need to specify
    // ENT_COMPAT, because ENT_QUOTES escapes single quotes, which we don't want.
    $d = htmlspecialchars($d, ENT_COMPAT);
}

// If the "a" query string parameter is set to "edit", display the edit form.
if (($_GET['a'] ?? '') == 'edit') {
    die("<html><head><title>BittyWiki: Edit $p</title></head>
    <body><h1>Edit $p</h1><form method='post' action='?p=$p'>
    <textarea name='d' cols='80' rows='25'>$d</textarea><br>
    <input type='submit'></form></body></html>");
}

// Note: exec() is generally frowned upon in web applications, and if it's
// enabled, this really really should have some kind of input validation and
// sanity checking. Don't run this in production!!!
// What this does is, finds all the files in the "data" directory that contain
// the current page name, and then will create a list of links to those pages
// for "What links here". Results are stored in $a as an array of names.
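// Start with an empty $b so the final output never references an undefined
// variable when no pages link here.
$b = '';
exec("grep -l $p data/*", $a);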
if ($a) {
    // Store the links in $b to be displayed at the bottom of the page.
    $b = '<hr>What links here:';
    foreach ($a as $v) {
        $b .= " <a href='?p=$v'>$v</a>";
    }
    // Remove the "data/" prefix from the filenames.
    $b = str_replace('data/', '', $b);
}

// Now we get to the meat of the script, where we convert the wiki markup
// to HTML via a series of regular expressions.

// Convert any lines starting with one or more asterisks to an unordered list,
// and any lines starting with one or more pound signs to an ordered list.
// This uses a callback function to build the lists.
$d = preg_replace_callback(
    [
        '/^(\*+)(.+)$/m',
        '/^(#+)(.+)$/m'
    ],
    // "l" is the callback function, defined below.
    'l',
    $d
);
$d = preg_replace(
    [
        // Any line starting with one or more spaces will become <pre>.
        '/^ (.+)$/m',
        // Four or more hyphens in a row will become <hr>.
        '/-{4,}/',
        // Any word in square brackets will become a link to that page.
        '/\[([a-z0-9_]+)\]/i',
        // Remove any redundant end/start tags for lists, since the callback function
        // doesn't know about the surrounding <ul> or <ol> tags. Basically this
        // removes any </ul><ul> or </ol><ol> sequences to keep the lists clean.
        '~(</ul>\s*<ul>|</ol>\s*<ol>)~',
        // Convert any URLs ending in .jpg, .jpeg, .gif, or .png to an image tag.
        '/(https?:[^\s]+\.(jpe?g|gif|png))/',
        // Convert any URLs to links. This is tricky because we don't want to convert
        // URLs that are images as we've already done that above. So this uses a
        // negative lookahead assertion to skip any URL that contains an image
        // extension (the check isn't anchored to the end of the URL).
        '/((https?|ftp|gopher|mailto|news):(?![^\s]*(?:jpe?g|gif|png))[^\s]+)/',
        // Convert any WikiWords to links.
        '/\b(([A-Z][a-z]+){2,})/',
        // Convert any ISBNs to Amazon links, with my affiliate code.
        '/ISBN[: ]?([0-9X-]+)/i',
        // Convert any text wrapped with three equals signs to <h3>.
        '/===(.+)===/Us',
        // Convert any text wrapped with two equals signs to <h2>.
        '/==(.+)==/Us',
        // Convert any text wrapped with two colons to <blockquote>.
        '/::(.+)::/U',
        // Convert any blank lines to <p> (two or more newlines).
        "/\n{2,}/",
        // Remove 6 single quotes, which served as "breaks" for WikiWords.
        "/'{6}/",
        // Convert any text wrapped with three single quotes to <b>.
        "/'''(.+)'''/Us",
        // Convert any text wrapped with two single quotes to <i>.
        "/''(.+)''/Us"
    ],
    [
        '<pre>$1</pre>',
        '<hr>',
        '<a href="?p=$1">$1</a>',
        '',
        '<img src="$1" border="0"/>',
        '<a href="$1" target="_blank">$1</a>',
        '<a href="?p=$1">$1</a>',
        '<a href="https://www.amazon.com/exec/obidos/tg/detail/-/$1/chuggnutt-20" target="_blank">ISBN $1</a>',
        '<h3>$1</h3>',
        '<h2>$1</h2>',
        '<blockquote>$1</blockquote>',
        '<p>',
        '',
        '<b>$1</b>',
        '<i>$1</i>'
    ],
    $d
);
// Finally, output the HTML all together.
echo "<html><head><title>BittyWiki: $p</title></head>
<body><h1>$p</h1>$d$b<hr>
<a href='?p=$p&a=edit'>Edit this page</a></body></html>";

/**
 * The callback function for the list conversion.
 *
 * @param array $m Array of matches from the regular expression.
 * @return string
 */
function l($m)
{
    // If the first character of the match is an asterisk, create an unordered list.
    $f = $m[1][0];
    $e = $f == '*' ? 'ul' : 'ol';
    return stripslashes(
        // Replace the first character of the first match with the opening list tag.
        // If there are multiple characters, e.g., "**" or "***", they will be replaced
        // with the same number of opening list tags.
        str_replace(
            $f,
            "<$e>",
            $m[1]
        ) .
        // Wrap the rest of the line (the second match) in <li> tags.
        "<li>{$m[2]}</li>" .
        // Replace the first character of the first match with the closing list tag.
        // If there are multiple characters, e.g., "**" or "***", they will be replaced
        // with the same number of closing list tags.
        str_replace(
            $f,
            "</$e>",
            $m[1]
        )
    );
}
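
The backlink lookup above shells out to grep through exec(). As one possible alternative, here’s a sketch that does the same kind of substring search in plain PHP. findBacklinks() is a hypothetical helper, not part of the original script, and it assumes pages live as plain files in a single directory, as they do above.

```php
<?php
/**
 * Hypothetical helper: returns the names of pages in $dir whose contents
 * mention $page. Like "grep -l", this is a plain case-sensitive substring
 * match, so a page can list itself and "Cyber" also matches "Cyberpunk".
 *
 * @param string $page Page name to search for.
 * @param string $dir  Directory holding the page files (assumed: "data").
 * @return string[]    Matching page names, without the directory prefix.
 */
function findBacklinks(string $page, string $dir = 'data'): array
{
    $links = [];
    // glob() returns false on error, so fall back to an empty list.
    foreach (glob("$dir/*") ?: [] as $file) {
        if (!is_file($file)) {
            continue;
        }
        $contents = file_get_contents($file);
        if ($contents !== false && strpos($contents, $page) !== false) {
            $links[] = basename($file);
        }
    }
    return $links;
}
```

With something like this in place, the exec() line could become $a = findBacklinks($p); and, since the helper returns bare page names, the str_replace() that strips the "data/" prefix would no longer be needed. It reads every page file on each request, so it only makes sense for a small wiki.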
