The two ugly faces of HTML generation

There are two quite different reasons for implementing HTML generation on a website. The first reason is to insert dynamic content, content that comes from a database or is algorithmically generated, into pages. The second reason is templating: ensuring that standard, site-wide parts of the HTML, such as headers and footers, are pulled from a single source. The goal of the first is to have a dynamic, database-driven site. The goal of the second is to avoid editing tens, or hundreds, of HTML files when the site design changes, and to avoid copy-and-paste coding.

Most dynamic web applications solve both of these problems with a single, powerful HTML generation language. Pylons uses Mako. Ruby on Rails and PHP use templates with escapes and inline code. I’ve never liked these solutions because they seem too powerful and too error-prone. It’s very easy to leave out a closing tag or forget a critical attribute. And nothing other than good discipline and code review stops a web designer (or an attacker) from bypassing the application’s pretty MVC structure and opening a socket to the database server in viewpost.php. But people have continued to use these HTML generation languages, because they were the only solution to a tough problem.

So why is it important to make this distinction between the two kinds of HTML generation? And can this distinction point the way to eliminate the problems with these HTML generation tools?

If you are building an AJAX app, you’ll quickly find that it’s easiest to fetch all of your algorithmically generated or database data from the server and convert it to HTML in the browser with JavaScript. This eliminates the first of these two reasons for wanting dynamically generated HTML. The server code doesn’t need any dynamic templating at all anymore; all it needs to do is build a data structure, serialize it to XML (or JSON, or even better, YAML), and send it to the browser.
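To make the server side of this concrete, here is a minimal sketch in Python of "build a data structure, serialize it, send it". The function name `build_posts_payload`, the field names, and the sample rows are all made up for illustration; a real app would get its rows from a database query and hand the string to whatever web framework it uses.

```python
import json


def build_posts_payload(rows):
    """Turn rows of (id, title, body) into a JSON string for the browser.

    No HTML is produced here: the browser-side JavaScript is expected to
    turn this data into markup, and to escape it when it does so.
    """
    posts = [
        {"id": post_id, "title": title, "body": body}
        for post_id, title, body in rows
    ]
    return json.dumps({"posts": posts})


# Hypothetical rows standing in for a database query result.
sample_rows = [
    (1, "Hello", "First post"),
    (2, "Tags & <markup>", "Second post"),
]
payload = build_posts_payload(sample_rows)
```

Note that the title containing `<markup>` goes out as plain data. In this pattern the responsibility for escaping moves to the client code that inserts the data into the page.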

If the only reason left to generate HTML is to modularize static site components, then very simple, old-school solutions like server-side includes, or tools like XSLT that were only ever intended for static document templating [1], become viable again. Both of these solutions are simple enough and constrained enough to trust to designers who are not programmers. This is how I am generating the two HTML pages on Spydentify now, and I’m extremely happy with it.
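As a rough illustration of how little machinery static templating needs, the sketch below expands include directives ahead of time in Python. It is not the setup used on Spydentify, just an assumption-laden toy: it borrows the `<!--#include file="..." -->` directive syntax from server-side includes, keeps the fragments in a dictionary instead of reading files, and does not handle nested includes. The names `expand_includes` and `fragments` are hypothetical.

```python
import re

# Matches directives of the form <!--#include file="name.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(page, fragments):
    """Replace each include directive with the text of the named fragment.

    A missing fragment raises KeyError, so a typo in a page fails loudly
    at build time instead of shipping a broken page.
    """
    def replace(match):
        return fragments[match.group(1)]

    return INCLUDE_RE.sub(replace, page)


# Shared site-wide pieces, defined once (hypothetical content).
fragments = {
    "header.html": "<header>My Site</header>",
    "footer.html": "<footer>Contact</footer>",
}

page = (
    '<!--#include file="header.html" -->\n'
    '<main id="app"></main>\n'
    '<!--#include file="footer.html" -->'
)
result = expand_includes(page, fragments)
```

The only thing a page author can do here is name a fragment. There is no loop, no expression language, and no way to open a socket, which is the whole appeal.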

This pattern also greatly improves the cacheability of a site. The majority of the data that the user needs is static: CSS, JavaScript, images. And all of your HTML is pre-generated, static data too. Serving and caching static data is pretty much a solved problem. If the only dynamic data an app serves is reduced to simple, short XML, JSON, or YAML, and the servers aren’t generating any HTML, then caching dynamic data is needed less often, and is less work once it does become necessary.

Using this pattern, all HTML generation can always be safe and compartmentalized, and overly powerful, ugly, and error-prone HTML generation languages can be left behind forever.

  1. On Mosuki we have discovered, the hard way, that XSLT was never intended to generate HTML on the fly from dynamic data.