Tag Archives: HTML

Two new projects: German Grammar and Möbius

I’ve been hacking on two new projects in my spare time.

The German Grammar Explorer (mainly the German Declension Explorer) is helping me wrap my head around some of the more complex patterns in the German language. It’s also an experiment in deliberate synæsthesia; It uses a palette of eight colors plus white to color-code similar patterns and related morphosyntax. The idea is to give a general feeling for when the general patterns of the language are broken.

Möbius is a totally useless experiment in binding scroll events and doing funny stuff with them, and experimenting with some newer features of HTML 5 and CSS 3.

Hackers, it is time to rethink, redesign, or replace GNU Gettext

GNU Gettext may be the de facto solution for internationalizing software, but every time I work with it, I find myself asking the same questions:

Why, in this age of virtual machines and dynamic, interpreted languages, do I still have to compile .po files to .mo files before I can use my translations?
I can reconfigure my web application, modify its database, and clear its caches whenever I want, so why do I have to do a code push and restart the entire runtime just so that “Login” can be correctly translated to “Anmelden”? Try explaining that to a business guy.
To translate new messages in my application, I have to run a series of arcane commands before the changed text is available to be translated. Specifically, the process involves generating .pot files, then updating .po files from them. Why isn’t this automatic?
Why is it still possible for bad translations to cause a crash? Translators do the weirdest things when presented with formatting directives in their translations… I’ve seen %s translated as $s and as %S, %(domain)s translated as %(domaine)s, and ${0} translated as #0, but the most common is to just remove the weird formatting directives entirely. And they all cause string formatting code to crash.
Why isn’t there a better option for translating HTML? Translators shouldn’t be expected to understand that in Click <a href="..." class="link">Here!</a>, “Click” and “Here!” should be translated, but “class” and “link” should not be. And they certainly can’t be expected to understand that if their language swaps the order of “Click” and “Here”, the <a> tag should move along with “Here”.
Why isn’t there something better than the convention to assign the gettext function to the identifier _, and then wrap all your strings in _()? Not only is this phenomenally ugly, but one misplaced parenthesis breaks it: _("That username %s is already taken" % username)
Why is support for languages that have more than two plural forms still an awful, confusing, fragile hack? Plural support was clearly designed by someone who thought that all languages were like English in having merely singular and plural. I’ve seen too many .po files for singular/dual/plural languages, where the translator obviously did not understand that msgstr[0] is the singular, msgstr[1] the dual, and msgstr[2] the plural.
Why, in this age of distributed version control, experimental merge algorithms, and eventually consistent noSQL databases, is the task of merging several half-translated .po files from several different sources still a nightmarish manual process?
Why, if I decide I need an Oxford comma or a capital letter in my English message, do I risk breaking all of the translations for that message?

There are libraries that allow you to use .po files directly, and I’m sure you can hack up some dynamic translation reloading code. Eliminating the ugliness of _() in code, and avoiding incorrectly placed parentheses after it, could be done with a library that inspects the parse tree and monkeypatches the native string class. Checking for consistency of formatting directives is not that hard. A better HTML translation technique would take some work, but it’s not impossible. The confusion around plural forms is just a user-interface issue. Merging translated messages may not be fully automatable, but at least it could be made a lot more user-friendly. And the last point can be avoided by using message keys, but that hack shouldn’t be necessary.

Gettext is behind the times. Or is it? Half of me expects someone to tell me that all these projects I’ve worked on are just ignoring features of Gettext that would solve these problems. And the other half of me expects someone to tell me I should be using some next-generation Gettext replacement that doesn’t have enough Google juice for me to find. (Let me know on Twitter: @glyphobet.)

GNU Gettext is is based on Sun’s Gettext, whose API was designed in the early ’90s. Hackers, it’s 2012. Technology has moved forward and left Gettext behind. It is time to rethink, redesign, or replace it.

Ten ways to build an unmaintainable web application

Old-school hackers had a long tradition of ensuring job security by building applications so unmaintainable that only the original authors could work on them. But in these days of web applications, unmaintainability has fallen by the wayside. Instead, design fads like CRUD, REST, MVC, DRY, and KISS, have eliminated the average programmer’s job security.

Here are ten quick tips for achieving maximum unmaintainability in your web application. Following them will ensure that, in thirty years, a web programmer like you will be as valuable as a fifty-eight year old COBOL programmer contracting at $200/hr for a Fortune 500 company that still hasn’t migrated off of PL/1. You too will be able to live on a dairy farm in Pennsylvania, grow a beard down to your navel, and work in your underwear. And you’ll never have to learn anything new, work with anyone else, or start another new project.

Mix it up. Put some JavaScript into external files, but be sure to intersperse JavaScript into your HTML, some of it in <script> tags. Cram multiple JavaScript statements into onclick and other event attributes — the longer, the better. Do the same with CSS; put some into external files, some in <style> tags, and also put some critical CSS into complex style attributes. And remember to put most of your <script> and <style> tags in the middle of the page content, instead of in the <head>, so that they will be difficult to find.
Make everything dynamic. Generate JavaScript and CSS in your HTML templates. Think of it as another type of eval. Generate HTML server-side using templates and browser-side using JavaScript. What’s harder than working around a obscure IE layout bug with weird markup tweaks? Making sure both your server templates and your JavaScript HTML generation work around the same bug with the same HTML black magic.
Abstraction, Shmabstraction. Pass lots of data from the server to the browser, store it in hidden form fields in the page, and then pass it back, unchanged, when submitting the form. That way, when the back-end data model changes, you get to rewrite part of the interface too. Allow data-model or server implementation details to creep into the interface implementation. Is the database sharded? Is the cache dirty? Does this row use a composite key? No need to have the server abstract these details, just pass that information to the JavaScript and let it sort everything out. That way, a sysadmin or a DBA can break the UI just as easily as a web designer can.
Keep your data unstructured. Make sure all communication between the browser and the server is just a flat list of key/value parameters. Some of your parameters will be data to store, others will be modes or flags that affect the behavior of the service you’re hitting, and still others will be modifiers to display messages or affect the behavior of the UI. Keeping your data unstructured ensures these different types of parameters will collide. Often.
Commit to a platform. Don’t waste your time checking to see if your pages work in all browsers (at least not until you’re totally done). Better yet, develop only in a single browser and don’t even bother to find out whether the features you’re relying on even exist in other browsers. Nothing is more fragile than an application that’s tightly tied to a single platform.
Trust the browser. Rely soley on JavaScript input checking for some data — don’t check input on the server-side. Store sensitive data in hidden form fields. Put authorization checks in the JavaScript rather than on the server. Parameters like authorized=1 just scream out for URL hacking, and storing them in hidden form fields is only slightly harder to hack.
Trust the server. Rely soley on the server to check, store, and generate only valid data in some places. That way, a DBA can change a single column constraint or data-type, and parts of the UI start to fail.
Don’t use DOCTYPEs. That way you’ll never be sure what rendering mode different browsers are going to use to render your content.
Ignore the cascade. Don’t bother to understand what the C in CSS stands for. Just keep overriding styles until a page element looks the way you want. That way, your styles will be fragile and will break unexpectedly when an intern changes something a reasonable person would expect to be unrelated.
Don’t use classes or ids. Instead, always write JavaScript and CSS that finds nodes based on tag name, name, alt or title attributes, or by their position in the DOM. That way when anything in the page changes, the hierarchy, the attributes, or when the site is translated into another language, things break. If you do end up using class or id, be sure to make a separate class for every node in your document and assign the same id to several different nodes.

If, however, you want to write flexible code that can react to and evolve with the ever-changing needs of its users, even after you have left the project in the hands of a clever but inexperienced hacker, you should probably avoid these techniques, and read up on some of those lame new design fads instead.

Special thanks to all the programmers whose code has illuminated these techniques over the years. My job may not be as secure as yours, but at least my code, and my conscience, are clear.

I wish XML were becoming obsolete faster

XML sucks at (almost) everything it’s used for. Google has open-sourced Protocol Buffer, a typed, backwards compatible, compact, binary data-interchange format. Combined with YAML for configuration and data-persistence files that need to be human readable, there’s even less reason to use XML for any data-serialization.

XML, in the form of (x)HTML, seems ok for markup, and the XML-based (x)HTML templating in Genshi is the best templating language I’ve ever used (and I’ve used XSLT, Mako, Mighty, PHP, and a few others). I wonder if the reason that HTML (and XML) templating is so difficult, and templating language code is often so ugly, is because XML is actually a poor solution for markup too. HTML is obviously here to stay, but it would be an interesting thought experiment to design a successor markup language that is not strictly hierarchical, more human-readable, and designed with templating in mind.

Another browser-side Model-View-Controller analogy

Coding Horror presents another way to think about the browser side of web apps as underlyingly MVC in Understanding Model-View-Controller. It’s interesting, but I still prefer my analogy for primarily AJAX web apps; when the data comes in primarily through XMLHTTPRequest, it doesn’t make much sense to think of anything but the JavaScript that handles XMLHTTPRequest responses as the model.

The next big thing, part 1: Resolving the conflict between Model-View-Controller and AJAX design patterns

or, how I learned to stop worrying and love the XMLHTTPRequest…

This is the first part of what will become an ongoing series.

If you’ve built a website in the last few years, most likely you’ve adopted an architecture similar to Model-View-Controller, or MVC. If not, well, either your website is terribly simple, you haven’t had to modify it yet, or your code is spaghetti and you should be fired. Just kidding. (Or maybe you’ve come up with an even better architecture, in which case you should share your insights with us mere mortals.)

In MVC architecture, the model reads and writes data to and from a back-end data-store, and organizes the relational data in a nice, hierarchical fashion to be used by the controller. The view accepts input from the controller and generates output HTML, XML, RSS, JavaScript, SVG, PDF, or whatever you want to send to the user’s browser. And the controller accepts browser input, figures out what to query the model for, and picks which view to use and what data to send it.

figure 1: The traditional MVC architecture.

Continue reading →

Get out of the way

It’s becoming increasingly obvious that the W3C is stuck in 2001. Shape up. Quick. We don’t need to wait another five years for a grand unified theory of document presentation and mark-up. We need incremental improvements, and we needed them six years ago. If you don’t get with the program quickly, the industry is going to move on (CSS 2.2) without you (HTML 5).

glyphobet • глыфобет • γλυφοβετ

musings over a tuna fish sandwich