Tag Archives: web programming

Introducing Fragments

Today I am announcing Fragments, a project I’ve been working on for a few months. It’s on GitHub and PyPI.

Fragments uses concepts from version control to replace many uses of templating languages. Instead of a templating language, it provides diff-based templating; instead of revision control, it provides “fragmentation control”.

Fragments enables a programmer to violate the DRY (Don’t Repeat Yourself) principle; it is a Multiple Source of Truth engine.

What is diff-based templating?

Generating HTML with templating languages is difficult because templating languages often have two semi-incompatible purposes. The first purpose is managing common HTML elements and structure (headers, sidebars, and footers) across multiple templates. This is sometimes called page “inheritance”. The second purpose is to perform idiosyncratic display logic on data coming from another source. When these two purposes can be separated, templates can be much simpler.

Fragments manages this first purpose, common HTML elements and structure, with diff and merge algorithms. The actual display logic is left to your application, or to a templating language whose templates are themselves managed by Fragments.

What is fragmentation control?

The machinery to manage common and different code fragments across multiple versions of a single file already exists in modern version control systems. Fragments adapts these tools to manage common and different versions of several different files.

Each file is in effect its own “branch”, and whenever you modify a file (“branch”) you can apply (“merge”) that change into whichever other files (“branches”) you choose. In this sense Fragments is a different kind of “source control”–rather than controlling versions/revisions over time, it controls fragments across many files that all exist simultaneously. Hence the term “fragmentation control”.
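To make "apply a change into other files" concrete, here is a minimal Python sketch. It is not Fragments' actual merge algorithm; it only uses the standard library's difflib to find what changed in one file and then replays those hunks onto a sibling. The names find_block and apply_change, and the sample HTML, are invented for illustration.

```python
import difflib


def find_block(lines, block):
    """Return the index of the first occurrence of block in lines, or None."""
    if not block:
        return 0
    for pos in range(len(lines) - len(block) + 1):
        if lines[pos:pos + len(block)] == block:
            return pos
    return None


def apply_change(old, new, sibling):
    """Replay the edits that turned old into new onto sibling.

    All three arguments are lists of lines. Hunks whose original lines
    cannot be found in sibling are skipped (a naive assumption; a real
    tool would track history instead of searching by content).
    """
    result = list(sibling)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert" and i1 > 0:
            # Pure insertions have no old lines, so anchor on the line before.
            before = old[i1 - 1:i1]
            after = before + new[j1:j2]
        else:
            before = old[i1:i2]
            after = new[j1:j2]
        pos = find_block(result, before)
        if pos is not None:
            result[pos:pos + len(before)] = after
    return result


# Two hypothetical pages that share a footer but differ in content.
index_old = ["<html>", "<h1>Site</h1>", "<p>Home</p>", "<footer>2011</footer>", "</html>"]
index_new = ["<html>", "<h1>Site</h1>", "<p>Home</p>", "<footer>2012</footer>", "</html>"]
about = ["<html>", "<h1>Site</h1>", "<p>About us</p>", "<footer>2011</footer>", "</html>"]

merged_about = apply_change(index_old, index_new, about)
```

The footer edit made in the index page lands in the about page, while the about page's own paragraph is left alone. The sketch takes the first textual match for each hunk, so repeated lines would confuse it.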

As I am a linguist, I have to point out that the distinction between Synchronic and Diachronic Linguistics gave me this idea in the first place.

How does it work?

The merge algorithm is a version of Precise Codeville Merge modified to support cherry-picking. Precise Codeville Merge was chosen because it supports accidental clean merges and convergence. That is, if two files are independently modified in the same way, they merge together cleanly. This makes adding new files easy: use Fragments’ fork command to create a new file based on other files (or just cp one of your files), change it as desired, and commit it. Subsequent changes to any unmodified, common sections, in that file or in its siblings, will be applicable across the rest of the repository.

Like version control, you run Fragments on the command line each time you make a change to your HTML, not before each page render.

What is it good for?

Fragments was designed for the task of simplifying large collections of HTML or HTML templates. It could replace simpler CMS-managed websites with pure static HTML. It could also handle several different translations of an HTML website, ensuring that the same HTML structure is wrapped around each translation of the content.

But Fragments is also not HTML specific. If it’s got newlines, Fragments can manage it. That means XML, CSS, JSON, YAML, or source code from any programming language where newlines are common (sorry, Perl). Fragments is even smart enough to know not to merge totally different files together. You could use it to manage a large set of configuration files for different servers and deployment configurations, for example. Or you could use it to manage bug fixes to that mess of duplicated source files on that legacy project you wish you didn’t have to maintain.

In short, Fragments can be used anyplace where you have thought to yourself “this group of files really is violating DRY”.

Use it

Fragments is released under the BSD License. You can read more about it and get the code on GitHub and PyPI. And you can find me on Twitter as @glyphobet.

Special thanks to Ross Cohen (@carnieross) for his thoughts on the idea, and for preparing Precise Codeville Merge for use in Fragments.

Hackers, it is time to rethink, redesign, or replace GNU Gettext

GNU Gettext may be the de facto solution for internationalizing software, but every time I work with it, I find myself asking the same questions:

  • Why, in this age of virtual machines and dynamic, interpreted languages, do I still have to compile .po files to .mo files before I can use my translations?
  • I can reconfigure my web application, modify its database, and clear its caches whenever I want, so why do I have to do a code push and restart the entire runtime just so that “Login” can be correctly translated to “Anmelden”? Try explaining that to a business guy.
  • To translate new messages in my application, I have to run a series of arcane commands before the changed text is available to be translated. Specifically, the process involves generating .pot files, then updating .po files from them. Why isn’t this automatic?
  • Why is it still possible for bad translations to cause a crash? Translators do the weirdest things when presented with formatting directives in their translations… I’ve seen %s translated as $s and as %S, %(domain)s translated as %(domaine)s, and ${0} translated as #0, but the most common is to just remove the weird formatting directives entirely. And they all cause string formatting code to crash.
  • Why isn’t there a better option for translating HTML? Translators shouldn’t be expected to understand that in Click <a href="..." class="link">Here!</a>, “Click” and “Here!” should be translated, but “class” and “link” should not be. And they certainly can’t be expected to understand that if their language swaps the order of “Click” and “Here”, the <a> tag should move along with “Here”.
  • Why isn’t there something better than the convention to assign the gettext function to the identifier _, and then wrap all your strings in _()? Not only is this phenomenally ugly, but one misplaced parenthesis breaks it: _("That username %s is already taken" % username)
  • Why is support for languages that have more than two plural forms still an awful, confusing, fragile hack? Plural support was clearly designed by someone who thought that all languages were like English in having merely singular and plural. I’ve seen too many .po files for singular/dual/plural languages, where the translator obviously did not understand that msgstr[0] is the singular, msgstr[1] the dual, and msgstr[2] the plural.
  • Why, in this age of distributed version control, experimental merge algorithms, and eventually consistent noSQL databases, is the task of merging several half-translated .po files from several different sources still a nightmarish manual process?
  • Why, if I decide I need an Oxford comma or a capital letter in my English message, do I risk breaking all of the translations for that message?

There are libraries that allow you to use .po files directly, and I’m sure you can hack up some dynamic translation reloading code. Eliminating the ugliness of _() in code, and avoiding incorrectly placed parentheses after it, could be done with a library that inspects the parse tree and monkeypatches the native string class. Checking for consistency of formatting directives is not that hard. A better HTML translation technique would take some work, but it’s not impossible. The confusion around plural forms is just a user-interface issue. Merging translated messages may not be fully automatable, but at least it could be made a lot more user-friendly. And the last point can be avoided by using message keys, but that hack shouldn’t be necessary.
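As a rough illustration of how little code the formatting-directive check needs, here is a Python sketch that compares the directives in a message with those in its translation. The regular expression covers only the three styles mentioned above (%s, %(name)s, and ${0}), ignores escapes such as %%, and the function names are made up. Treat it as a starting point, not a complete validator.

```python
import re
from collections import Counter

# Assumption: only printf-style (%s, %(name)s) and ${0}-style directives matter.
DIRECTIVE = re.compile(r"%\([^)]+\)[a-zA-Z]|%[a-zA-Z]|\$\{[^}]+\}")


def directives(text):
    """Count each formatting directive found in text."""
    return Counter(DIRECTIVE.findall(text))


def is_consistent(msgid, msgstr):
    """True if the translation uses exactly the same directives as the original."""
    return directives(msgid) == directives(msgstr)
```

Running is_consistent over every msgid/msgstr pair before deployment would flag a translation such as "Hallo $s" for "Hello %s" long before it reaches string formatting code.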

Gettext is behind the times. Or is it? Half of me expects someone to tell me that all these projects I’ve worked on are just ignoring features of Gettext that would solve these problems. And the other half of me expects someone to tell me I should be using some next-generation Gettext replacement that doesn’t have enough Google juice for me to find. (Let me know on Twitter: @glyphobet.)

GNU Gettext is based on Sun’s Gettext, whose API was designed in the early ’90s. Hackers, it’s 2012. Technology has moved forward and left Gettext behind. It is time to rethink, redesign, or replace it.

 

Ten ways to build an unmaintainable web application

Old-school hackers had a long tradition of ensuring job security by building applications so unmaintainable that only the original authors could work on them. But in these days of web applications, unmaintainability has fallen by the wayside. Instead, design fads like CRUD, REST, MVC, DRY, and KISS have eliminated the average programmer’s job security.

Here are ten quick tips for achieving maximum unmaintainability in your web application. Following them will ensure that, in thirty years, a web programmer like you will be as valuable as a fifty-eight-year-old COBOL programmer contracting at $200/hr for a Fortune 500 company that still hasn’t migrated off of PL/1. You too will be able to live on a dairy farm in Pennsylvania, grow a beard down to your navel, and work in your underwear. And you’ll never have to learn anything new, work with anyone else, or start another new project.

  1. Mix it up. Put some JavaScript into external files, but be sure to intersperse JavaScript into your HTML, some of it in <script> tags. Cram multiple JavaScript statements into onclick and other event attributes — the longer, the better. Do the same with CSS; put some into external files, some in <style> tags, and also put some critical CSS into complex style attributes. And remember to put most of your <script> and <style> tags in the middle of the page content, instead of in the <head>, so that they will be difficult to find.
  2. Make everything dynamic. Generate JavaScript and CSS in your HTML templates. Think of it as another type of eval. Generate HTML server-side using templates and browser-side using JavaScript. What’s harder than working around an obscure IE layout bug with weird markup tweaks? Making sure both your server templates and your JavaScript HTML generation work around the same bug with the same HTML black magic.
  3. Abstraction, Shmabstraction. Pass lots of data from the server to the browser, store it in hidden form fields in the page, and then pass it back, unchanged, when submitting the form. That way, when the back-end data model changes, you get to rewrite part of the interface too. Allow data-model or server implementation details to creep into the interface implementation. Is the database sharded? Is the cache dirty? Does this row use a composite key? No need to have the server abstract these details, just pass that information to the JavaScript and let it sort everything out. That way, a sysadmin or a DBA can break the UI just as easily as a web designer can.
  4. Keep your data unstructured. Make sure all communication between the browser and the server is just a flat list of key/value parameters. Some of your parameters will be data to store, others will be modes or flags that affect the behavior of the service you’re hitting, and still others will be modifiers to display messages or affect the behavior of the UI. Keeping your data unstructured ensures these different types of parameters will collide. Often.
  5. Commit to a platform. Don’t waste your time checking to see if your pages work in all browsers (at least not until you’re totally done). Better yet, develop only in a single browser and don’t even bother to find out whether the features you’re relying on even exist in other browsers. Nothing is more fragile than an application that’s tightly tied to a single platform.
  6. Trust the browser. Rely solely on JavaScript input checking for some data — don’t check input on the server-side. Store sensitive data in hidden form fields. Put authorization checks in the JavaScript rather than on the server. Parameters like authorized=1 just scream out for URL hacking, and storing them in hidden form fields is only slightly harder to hack.
  7. Trust the server. Rely solely on the server to check, store, and generate only valid data in some places. That way, a DBA can change a single column constraint or data-type, and parts of the UI start to fail.
  8. Don’t use DOCTYPEs. That way you’ll never be sure what rendering mode different browsers are going to use to render your content.
  9. Ignore the cascade. Don’t bother to understand what the C in CSS stands for. Just keep overriding styles until a page element looks the way you want. That way, your styles will be fragile and will break unexpectedly when an intern changes something a reasonable person would expect to be unrelated.
  10. Don’t use classes or ids. Instead, always write JavaScript and CSS that finds nodes based on tag name, name, alt or title attributes, or by their position in the DOM. That way, when anything in the page changes (the hierarchy, the attributes, or the language the site is translated into), things break. If you do end up using class or id, be sure to make a separate class for every node in your document and assign the same id to several different nodes.

If, however, you want to write flexible code that can react to and evolve with the ever-changing needs of its users, even after you have left the project in the hands of a clever but inexperienced hacker, you should probably avoid these techniques, and read up on some of those lame new design fads instead.

Special thanks to all the programmers whose code has illuminated these techniques over the years. My job may not be as secure as yours, but at least my code, and my conscience, are clear.

A tiny fix to the jQuery hint plugin

Here’s a tiny fix to Remy Sharp’s excellent jQuery Text box hints plug-in. Without this fix, jQuery’s val function will return the hint text if the text box hasn’t been filled out by the user yet.

Here’s the patch:

@@ -20,7 +23,7 @@
       $win = $(window);

     function remove() {
-      if ($input.val() === title && $input.hasClass(blurClass)) {
+      if ($input.realval() === title && $input.hasClass(blurClass)) {
         $input.val('').removeClass(blurClass);
       }
     }
@@ -41,4 +44,17 @@
   });
 };

+
+$.fn.realval = $.fn.val;
+
+$.fn.val = function (value) {
+  var i = $(this);
+  if (value === undefined) {
+    return (i.realval() === i.attr('title')) ? '' : i.realval();
+  } else {
+    return i.realval(value);
+  }
+}
+
+
 })(jQuery);

And here’s the full plugin with the patch applied:

/**
* @author Remy Sharp
* @url http://remysharp.com/2007/01/25/jquery-tutorial-text-box-hints/
*
* better val() method added by Matt Chisholm, 2009/07/27
* http://glyphobet.net/blog/essay/878
*/

(function ($) {

$.fn.hint = function (blurClass) {
  if (!blurClass) {
    blurClass = 'blur';
  }

  return this.each(function () {
    // get jQuery version of 'this'
    var $input = $(this),

    // capture the rest of the variable to allow for reuse
      title = $input.attr('title'),
      $form = $(this.form),
      $win = $(window);

    function remove() {
      if ($input.realval() === title && $input.hasClass(blurClass)) {
        $input.val('').removeClass(blurClass);
      }
    }

    // only apply logic if the element has the attribute
    if (title) {
      // on blur, set value to title attr if text is blank
      $input.blur(function () {
        if (this.value === '') {
          $input.val(title).addClass(blurClass);
        }
      }).focus(remove).blur(); // now change all inputs to title

      // clear the pre-defined text when form is submitted
      $form.submit(remove);
      $win.unload(remove); // handles Firefox's autocomplete
    }
  });
};

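// keep a reference to jQuery's original val() so it stays reachable
$.fn.realval = $.fn.val;

// replacement val(): as a getter, report '' while the box still shows its
// hint (its title attribute); as a setter, defer to the original val()
$.fn.val = function (value) {
  var i = $(this);
  if (value === undefined) {
    return (i.realval() === i.attr('title')) ? '' : i.realval();
  } else {
    return i.realval(value);
  }
};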

})(jQuery);

Ignore your users’ needs. Call them stupid instead.

Bert Bos’s Why “variables” in CSS are harmful illustrates some all-too-common mistakes technologists make when considering feature requests from their users. It also indicates how deeply out of touch Bos (and possibly the entire W3C) is with people who actually have to read, write, debug, and use CSS on a regular basis.

It begins:

Constants have been regularly proposed and rejected over the long history of CSS…

Proposals for a feature indicate that a technology (whether it be a specification or an application) has pain points that are going un-addressed. When those requests are frequent, they indicate that the person (or organization, in this case, the W3C) in charge of the technology is out of touch with its users.

…so there is no reason why constants should be useful now when they weren’t before.

This claim that constants are not useful underlies the entire essay, but Bos fails to ever really justify it. Here he wanders around in pseudo-mathematical jargon instead:

[An implementation of constants in CSS written in PHP] proves that it is not necessary to add constants to CSS…. But the PHP implementation has the benefit of letting authors determine the usefulness for themselves, without modifying CSS on the Web.

It sounds like Bos refuses to consider an implementation of variables[1] in CSS unless someone provides him with a mathematical proof of their utility. But utility is an opinion, not something that can be proven, like Turing-completeness or the irrationality of √2.

The existence of the PHP implementation Bos mentions, and of other implementations like the wonderful CleverCSS or Reddit’s vaporous C55, argues strongly that variables are useful — so useful that many people have implemented them on top of CSS. Of course, this does not prove usefulness any more than any other opinion can be proven.

Implementation effort

Next Bos considers implementation effort:

…extending CSS makes implementing more difficult and programs bigger, which leads to fewer implementations and more bugs.

Difficulty of implementation should never be a deciding factor in whether or not to address the needs of the users. This point is important enough that it bears repeating: implementation effort is not relevant when deciding what your users need.

Why not?

Technology exists to make users’ lives easier. As a technology evolves and matures, users express needs and the authors of the technology develop features to address those needs. It is the users’ needs, not easily implemented features, that drive development of a technology.

If two features serve the same need, then picking the easier-to-implement one is perfectly reasonable. And if the only way to address the users’ needs is with a feature that’s extremely difficult or impossible to implement, then a project might find itself considering whether some or all of it is still viable. But a difficult implementation is never a justification in itself for not addressing the users’ needs.

In the case of CSS, the users ask for variables because they need some way to stop repeating themselves when they encode colors, lengths, and other values in CSS.
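To show how small a layer on top of CSS can remove that repetition, here is a Python sketch of a constants preprocessor in the spirit of the implementations mentioned earlier. The `$name: value;` definition syntax and the expand_constants function are invented for this example; they are not taken from any CSS proposal or from the tools named above.

```python
import re

# Assumed syntax: a definition is a whole line of the form "$name: value;".
DEFINITION = re.compile(r"^[ \t]*\$([\w-]+)[ \t]*:[ \t]*([^;\n]+);[ \t]*\n?", re.MULTILINE)
REFERENCE = re.compile(r"\$([\w-]+)")


def expand_constants(source):
    """Strip constant definitions from source and substitute their uses.

    Raises KeyError if a stylesheet uses a constant it never defined.
    """
    constants = {}

    def remember(match):
        constants[match.group(1)] = match.group(2).strip()
        return ""  # definitions produce no output of their own

    body = DEFINITION.sub(remember, source)
    return REFERENCE.sub(lambda match: constants[match.group(1)], body)


stylesheet = (
    "$brand: #369;\n"
    "$pad: 10px;\n"
    "a { color: $brand; padding: $pad; }\n"
    "h1 { color: $brand; }\n"
)
plain_css = expand_constants(stylesheet)
```

The color is written once and used twice, so changing the brand color becomes a one-line edit, and the output is ordinary CSS that any browser already understands.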

Refusing to serve the users’ needs because the requested feature may be difficult to implement shows a lack of understanding of those users’ needs as well as poor judgment about how to handle feature requests in general.

There’s another subtle fallacy here too. When Bos worries about ease of implementation, it sounds like he’s trying to make browser authors’ lives easier, as if they were the users that the W3C are working for. But browser authors aren’t the real users of CSS any more than, for example, the authors of a C compiler are the ultimate users of C. Web designers are the real users of CSS. They are the target audience whose needs should be considered.

Arguing from implementation effort, and talking about browser implementors instead of CSS authors, illustrates how far out of touch Bos is with real web designers doing real work.

It’s also questionable how truly difficult implementing global, un-scoped variables (or un-changing constants) in CSS would be, especially compared to other complex aspects of CSS like the cascade. But that’s a discussion for browser authors.

Maintenance of stylesheets

Next Bos argues that variables would make CSS less maintainable, not more. Bos presents two reasons that code is encapsulated behind a function in programming languages:

Dividing up a problem into smaller ones is only one reason for defining functions. Just as important is the fact that a function that fits on one screen is easier to write than one that needs scrolling.

Because CSS variables wouldn’t help divide up a problem into smaller ones or help CSS stanzas fit on the screen more easily, Bos argues, they aren’t helpful:

[Constants] would add a cost (remembering user-defined names) without a benefit (avoiding problems that are longer than one screenful).

Experienced programmers know that there’s a third benefit to encapsulating code or data behind a function, variable, or constant: not repeating yourself. This is why users keep asking for variables in CSS. Bos goes on to say that variables would be detrimental to CSS because they would increase the length of stylesheets. However, not repeating yourself is much more important than just keeping your code short[2], so this point too is moot.

This section concludes:

What remains is the cost of remembering and understanding user-defined names.

Of course, stylesheets are full of user-defined class names, and CSS authors seem to have no problem using and remembering those, so it’s hard to see how user defined variable names are going to be any more intellectually challenging for CSS authors than class names.

Reusing style sheets and Learning CSS

The next two sections, “Reusing style sheets,” and “Learning CSS” continue to conjecture that user-defined variable names would be a great hindrance to using and learning CSS. But the frequency of proposals to add variables to CSS suggests they are not difficult to understand, and including them would not significantly hinder learning CSS.

But there’s not much point in arguing over such conjectures. Arguing from the point of view of a theoretical group of users who have, and lack, certain skills, is a dangerous distraction. If you have data about your users, use it. If not, collect some before making your decisions, or base your decisions on what you know your users can already do.

For the sake of argument, assume Bos’ hypothetical group of users exists. Assume there is a subset of the CSS authoring population that can comprehend the CSS cascade, relative sizes defined in ems, hexadecimal RGB color codes, and user-defined class names, but are unable to grasp the concept of a user-defined variable in CSS. (It sounds bizarre, but that’s what he’s claiming.)

These hypothetical users could just refrain from using variables in their stylesheets whatsoever. Unlike hexadecimal colors, em units, and many other aspects of CSS, nothing about variables would force CSS authors to use them. Variables could be added to the CSS standard without increasing its complexity or the effort required to learn it.

Bos also claims that user-defined variables would break easy reusability of CSS:

CSS is fairly easy to learn to read, even if some of its effects can be quite subtle. When there is a page you like, you can look at its style sheet and see how it’s done.

Anyone who has tried to copy a CSS effect from one site to another knows how difficult it truly is.  To copy the visual appearance of a single element, you must understand not only the computed style of that element, but the computed style of all of its parent elements. You need at least a rudimentary understanding of both the CSS cascade and the structure of the HTML of the page.  To copy the look of an entire page, you have to copy all of the CSS files for that page and mimic the structure of the HTML exactly, or reverse engineer the entire thing from the ground up.

Beyond the issue of whether copying CSS effects is easy or not, however, the question is whether CSS variables would make the job more difficult.

Bos points out that in-browser debugging tools help you to copy CSS by showing you the computed style. Presumably if CSS contained variables, those debugging tools would show you the values computed using those variables, not just the variable names.

And if you were just copying a site’s HTML structure and CSS wholesale, then there’s no reason why you would even need to read the CSS or figure out what the variables mean.

It is too difficult to look in two places at once, the place where a value is used and the place where it is defined, if you don’t know why the rule is split in this way

Of course, when reverse-engineering the CSS for a site, a designer already needs to look in multiple “places at once” — they look in multiple CSS files, match class names in the CSS to names in the HTML, and consider the effects of the cascade. Is Bos really suggesting that a person capable of doing that will be incapable of finding a variable definition in the same file where that variable is used?

Rather than showing us that CSS variables would make re-using CSS more difficult, Bos asserts that a difficult, complex process is simple, and that CSS authors already performing this task are too stupid to handle a much simpler one.

Bos also claims that figuring out what variable names mean will be difficult for CSS authors, even when debugging their own code. Most CSS authors use generally descriptive class names like green-button, huge, and floatleft[3]. High traffic sites run their CSS (and their HTML and JavaScript) through compressors/obfuscators, but most of those sites use descriptive names internally too. It’s hard to see how CSS variables would be named any differently than CSS classes, so it’s hard to see how CSS variables would be any more difficult for CSS authors to reverse engineer or remember.

Summary

None of Bos’s arguments against variables in CSS hold up. He claims CSS doesn’t need variables, but fails to recognize CSS authors’ true need to avoid repeating themselves. He argues CSS variables would be too difficult to implement, but implementation difficulties are invalid grounds to justify leaving users’ needs unaddressed. He argues that CSS variables would add too much complexity to CSS and no benefit whatsoever, but overlooks a key benefit that CSS variables would provide. All his arguments about the complexity variables would allegedly add to CSS are difficult to accept given the current complexity of CSS.

A feature request is a need in disguise, and multiple, persistent feature requests indicate a serious need behind a very thin disguise. Rather than arguing against a feature, you should endeavor to understand the underlying need. Rather than arguing from implementation complexity, you should decide whether that need must be addressed. Rather than arguing from hypothetical, invented users, and speculating about complexity, you should collect real user data or look at the kinds of tasks your users already handle.

This entire article[4] calls into question Bos’ ability (and, by association, the W3C’s) to identify and address the needs of real CSS users and choose features to solve real shortcomings of CSS. I hope my analysis of this article helps other technologists learn to understand and address their users’ real needs better, and avoid poor reasoning when arguing against, or for, a specific feature.

For more on problems with CSS, see CSS Considered Unstylish.

  1. For brevity I’ve chosen to use just the term variable throughout this article, even though all the points I make apply equally well to constants.
  2. Bos’ point about the “computer screen becoming an extension of the programmer’s memory” is bizarre in the extreme. Even the best programmers or web designers will quickly end up with a program or stylesheet that’s bigger than will fit on a screen, when working on anything but the most simple projects, if for no other reason than the project being split up into multiple files.
  3. Perhaps Bos does not always use descriptive class names; the stylesheet for his article uses the class names yves and coralie.
  4. The very end of the article has a clever suggestion: that constants be implemented as an external module. I’m not sure how this would work, but if it meant that a single set of constants for a site would be accessible in the HTML, and all of the site’s stylesheets, and maybe in the JavaScript too, well, that would be pretty cool.

Another browser-side Model-View-Controller analogy

Coding Horror presents another way to think about the browser side of web apps as underlyingly MVC in Understanding Model-View-Controller. It’s interesting, but I still prefer my analogy for primarily AJAX web apps; when the data comes in primarily through XMLHttpRequest, it doesn’t make much sense to think of anything but the JavaScript that handles XMLHttpRequest responses as the model.