Category Archives: essay

Essays

Bradley Manning’s Statement

The transcript of Bradley Manning’s statement at his providence inquiry is well worth reading. It is a picture of a man who was deeply troubled by information he found in his duties as an intelligence analyst, and who decided to make that information public out of a belief that it ultimately would make the United States a better country, and the world a better place. Here’s a quote:

In attempting to conduct counter-terrorism or CT and counter-insurgency COIN operations we became obsessed with capturing and killing human targets on lists and not being suspicious of and avoiding cooperation with our Host Nation partners, and ignoring the second and third order effects of accomplishing short-term goals and missions. I believe that if the general public, especially the American public, had access to the information contained within the CIDNE-I and CIDNE-A tables this could spark a domestic debate on the role of the military and our foreign policy in general as [missed word] as it related to Iraq and Afghanistan.

I also believed the detailed analysis of the data over a long period of time by different sectors of society might cause society to reevaluate the need or even the desire to even to engage in counterterrorism and counterinsurgency operations that ignore the complex dynamics of the people living in the effected environment everyday.

His statement also shows that he understands that he broke the law and takes responsibility for his actions, and in that sense his actions should be considered civil disobedience. And that makes it all the more appalling that he’s been kept in humiliating and inhumane conditions for over a thousand days without trial.

Two new projects: German Grammar and Möbius

I’ve been hacking on two new projects in my spare time.

The German Grammar Explorer (mainly the German Declension Explorer) is helping me wrap my head around some of the more complex patterns in the German language. It’s also an experiment in deliberate synæsthesia; It uses a palette of eight colors plus white to color-code similar patterns and related morphosyntax. The idea is to give a general feeling for when the general patterns of the language are broken.

Möbius is a totally useless experiment in binding scroll events and doing funny stuff with them, and experimenting with some newer features of HTML 5 and CSS 3.

Why I don’t rely on Google

Recently several otherwise tech-savvy friends have been perplexed that I don’t just use Google for everything. They explain that I could use Google Voice for my U.S. phone number, use Google Checkout for The Mathematician’s Dice, sync my contacts and calendars through GMail, and log in to many things on the web using their OpenID service. And they wonder why I suffer a bit of spam instead of using GMail for my primary email account.

Continue reading

Introducing Fragments

Today I am announcing Fragments, a project I’ve been working on for a few months. It’s on GitHub and PyPI.

Fragments uses concepts from version control to replace many uses of templating languages. Instead of a templating language, it provides diff-based templating; instead of revision control, it provides “fragmentation control”.

Fragments enables a programmer to violate the DRY (Don’t Repeat Yourself) principle; it is a Multiple Source of Truth engine.

What is diff-based templating?

Generating HTML with templating languages is difficult because templating languages often have two semi-incompatible purposes. The first purpose is managing common HTML elements & structure: headers, sidebars, and footers; across multiple templates. This is sometimes called page “inheritance”. The second purpose is to perform idiosyncratic display logic on data coming from another source. When these two purposes can be separated, templates can be much simpler.

Fragments manages this first purpose, common HTML elements and structure, with diff and merge algorithms. The actual display logic is left to your application, or to a templating language whose templates are themselves managed by Fragments.

What is fragmentation control?

The machinery to manage common and different code fragments across multiple versions of a single file already exists in modern version control systems. Fragments adapts these tools to manage common and different versions of several different files.

Each file is in effect its own “branch”, and whenever you modify a file (“branch”) you can apply (“merge”) that change into whichever other files (“branches”) you choose. In this sense Fragments is a different kind of “source control”–rather than controlling versions/revisions over time, it controls fragments across many files that all exist simultaneously. Hence the term “fragmentation control”.

As I am a linguist, I have to point out that the distinction between Synchronic and Diachronic Linguistics gave me this idea in the first place.

How does it work?

The merge algorithm is a version of Precise Codeville Merge modified to support cherry-picking. Precise Codeville Merge was chosen because it supports accidental clean merges and convergence. That is, if two files are independently modified in the same way, they merge together cleanly. This makes adding new files easy; use Fragment’s fork command to create a new file based on other files (or just cp one of your files), change it as desired, and commit it. Subsequent changes to any un-modified, common sections, in that file or in its siblings, will be applicable across the rest of the repository.

Like version control, you run Fragments on the command line each time you make a change to your HTML, not before each page render.

What is it good for?

Fragments was designed with the task of simplifying large collections of HTML or HTML templates. It could replace simpler CMS-managed websites with pure static HTML. It could also handle several different translations of an HTML website, ensuring that the same HTML structure was wrapped around each translation of the content.

But Fragments is also not HTML specific. If it’s got newlines, Fragments can manage it. That means XML, CSS, JSON, YAML, or source code from any programming language where newlines are common (sorry, Perl). cFragments is even smart enough to know not to merge totally different files together. You could use it to manage a large set of configuration files for different servers and deployment configurations, for example. Or you could use it to manage bug fixes to that mess of duplicated source files on that legacy project you wish you didn’t have to maintain.

In short, Fragments can be used anyplace where you have thought to yourself “this group of files really is violating DRY”.

Use it

Fragments is released under the BSD License. You can read more about it and get the code on GitHub and PyPI. And you can find me on Twitter as @glyphobet.

Special thanks to Ross Cohen (@carnieross) for his thoughts on the idea, and for preparing Precise Codeville Merge for use in Fragments.

Hackers, it is time to rethink, redesign, or replace GNU Gettext

GNU Gettext may be the de facto solution for internationalizing software, but every time I work with it, I find myself asking the same questions:

  • Why, in this age of virtual machines and dynamic, interpreted languages, do I still have to compile .po files to .mo files before I can use my translations?
  • I can reconfigure my web application, modify its database, and clear its caches whenever I want, so why do I have to do a code push and restart the entire runtime just so that “Login” can be correctly translated to “Anmelden”? Try explaining that to a business guy.
  • To translate new messages in my application, I have to run a series of arcane commands before the changed text is available to be translated. Specifically, the process involves generating .pot files, then updating .po files from them. Why isn’t this automatic?
  • Why is it still possible for bad translations to cause a crash? Translators do the weirdest things when presented with formatting directives in their translations… I’ve seen %s translated as $s and as %S%(domain)s translated as %(domaine)s, and ${0} translated as #0, but the most common is to just remove the weird formatting directives entirely. And they all cause string formatting code to crash.
  • Why isn’t there a better option for translating HTML? Translators shouldn’t be expected to understand that in Click <a href="..." class="link">Here!</a>, “Click” and “Here!” should be translated, but “class” and “link” should not be. And they certainly can’t be expected to understand that if their language swaps the order of “Click” and “Here”, the <a> tag should move along with “Here”.
  • Why isn’t there something better than the convention to assign the gettext function to the identifier _, and then wrap all your strings in _()? Not only is this phenomenally ugly, but one misplaced parenthesis breaks it: _("That username %s is already taken" % username)
  • Why is support for languages that have more than two plural forms still an awful, confusing, fragile hack? Plural support was clearly designed by someone who thought that all languages were like English in having merely singular and plural. I’ve seen too many .po files for singular/dual/plural languages, where the translator obviously did not understand that msgstr[0] is the singular, msgstr[1] the dual, and msgstr[2] the plural.
  • Why, in this age of distributed version control, experimental merge algorithms, and eventually consistent noSQL databases, is the task of merging several half-translated .po files from several different sources still a nightmarish manual process?
  • Why, if I decide I need an Oxford comma or a capital letter in my English message, do I risk breaking all of the translations for that message?

There are libraries that allow you to use .po files directly, and I’m sure you can hack up some dynamic translation reloading code. Eliminating the ugliness of _() in code, and avoiding incorrectly placed parentheses after it, could be done with a library that inspects the parse tree and monkeypatches the native string class. Checking for consistency of formatting directives is not that hard. A better HTML translation technique would take some work, but it’s not impossible. The confusion around plural forms is just a user-interface issue. Merging translated messages may not be fully automatable, but at least it could be made a lot more user-friendly. And the last point can be avoided by using message keys, but that hack shouldn’t be necessary.

Gettext is behind the times. Or is it? Half of me expects someone to tell me that all these projects I’ve worked on are just ignoring features of Gettext that would solve these problems. And the other half of me expects someone to tell me I should be using some next-generation Gettext replacement that doesn’t have enough Google juice for me to find. (Let me know on Twitter: @glyphobet.)

GNU Gettext is is based on Sun’s Gettext, whose API was designed in the early ’90s. Hackers, it’s 2012. Technology has moved forward and left Gettext behind. It is time to rethink, redesign, or replace it.

 

A week with Snow Leopard and Lion

After starting a new job a few weeks ago, I found myself in the somewhat unusual position of having a new Mac OS X Lion (10.7) installation at work and a Snow Leopard (10.6) installation at home. Switching every day focused my attention on the differences between the two. Read on to hear my thoughts.

Mission Control

What used to be Dashboard, Spaces, and Exposé have been combined into Mission Control. Overall, this is a tremendous improvement.

The All Windows mode scales all the windows proportionally, like Exposé did in Leopard. If you use any application with small windows (like Stickies), you’ll know how silly Snow Leopard’s Exposé looked when it scaled a 1200×1200 browser window to the same size as a bunch of 100×100 sticky notes and threw them all in a grid. (The non-proportional scaling in Snow Leopard’s Exposé is so useless I hack the Dock after every upgrade to bring back Leopard’s scaling.) The Application Windows mode in Lion uses Snow Leopard’s non-proportional scaling, which is generally ok because multiple windows in the same app are usually similarly sized.

All Windows also groups windows together, which I find helpful, although not critical. You can drag windows into other desktops from the All Windows mode, too. But if you move your pointer even slightly while clicking on a window to activate it, Mission Control thinks you’re starting to drag it, and ends up doing nothing when you meant to activate a window. This is maddening. There should be a threshold below which any mouse movement while clicking just ends up as a click (or perhaps there is; if so, it’s too low).

Mission Control stole Ctrl-Left and Ctrl-right keystrokes to switch desktops, which I use in the terminal all the fricking time. Luckily these keystrokes can be turned off in the settings. I was a multiple desktop power-user back when I used Xwindows, but I never turned on Spaces on Snow Leopard and still haven’t used them on Lion. I’m not sure why not.

There are a few display bugs. Sometimes the blue window highlights are way bigger than the window previews. Textmate’s Find dialog doesn’t (always) show up in Mission Control, which I hope will be fixed sometime before Textmate 2 is released later this year.

The biggest problem for Mission Control is that it doesn’t let you restore or even see minimized windows in All Windows mode, only in Application Windows mode, where they are lumped in with previously opened documents. If you enable “Minimize to Application Icon”, the only way to get at your minimized windows is via the Application Windows mode.

Scroll bars

I got used to the new scroll bars in 30 seconds. I haven’t missed the up/down arrows at all. Weirdly, the default scrolling direction on the MacMini was set up out of the box for trackpads, but on the MacBook, it was set up for scroll wheels.

Migration Assistant

Earlier this year I helped my mom migrate from a vintage MacMini running Tiger (10.4) to a new one running Snow Leopard. The Migration Assistants on the two machines completely refused to cooperate with each other during the initial setup of the new Mini. It took many, many attempts after the install was complete to I get it to migrate her documents. I thought this was because I was migrating between two major releases.

I started out using Lion on an elderly MacMini while my new laptop was in the mail, so I got to experience Migration Assistant again. Before attempting to migrate, I set up the new laptop using a dummy account, and ran Software Update on both computers until they were both running 10.7.1. I hoped the up-to-date Migration Assistant would do better at migrating between two identical versions of Mac OS, but it was just as recalcitrant as before. The two computers steadfastly refused to recognize each other at the same time, even though they were both on the same wireless network just inches away from each other (and from the wireless router). I finally borrowed an ethernet cable, turned the wireless off on both, and hooked them up ethernet port to ethernet port. It felt dirty, but it seemed to help; after a few more tries my account was migrated. I think it took longer to coax Migration Assistant into cooperation than it did to actually migrate my files.

So Migration Assistant is just as cranky between two Lion installs as it was between Tiger and Snow Leopard. I can’t imagine how frustrating it must be for the average, non-technical Mac user to migrate to a new Mac.

The new Mail

I forced myself to use the new side-by-side Mail interface for the first week, hoping that I’d come to like it. I didn’t. I hate it. I switched back to classic view after a week.

I can’t really say what bugs me about Mail, but it just feels wrong. One glaring issue is that the text and icons in the message list have been made bigger, which is a pain for someone with over ten years of email in lots of different folders. Some people, including Jon Gruber, speak highly of it, so I’m holding out hope that it will grow on me.

But when I install Lion on my personal machine, I’ll make a backup of Snow Leopard’s Mail.app and see if it works on Lion. Just in case.

Autocorrect

Autocorrect is for people who aren’t fast, good typers. It’s not for me. I turned it off. In Mail and system-wide. I wonder why Mail has its own setting for autocorrect that overrides the system setting.

XCode

Seriously, Apple? You mean I have to create an iTunes account, even though this is a work computer and I won’t ever be buying software for it, and verify it, just in order to download XCode? And then, the XCode application that gets installed is actually the XCode installer? What a hassle.

Ok, ok, to be fair, this isn’t a Lion thing. But seriously, Apple?

MacPorts

Aside from mysql5-server, I got everything I needed installed fine. I was even able to beat mysql5-server into submission, with a judicious application of MacPorts-kung-fu.

Textmate

After hearing many rumors about Textmate not really working under Lion, I’m happy to report that it seems to work fine. (Aside from the aforementioned wacky interaction with Mission Control.)

Launchpad

I won’t use Launchpad. Apple is clearly trying to unify the experience of iOS and Mac OS. But I already have Spotlight, the Dock, and the Applications folder. How many views of my applications do I need?

Resume

The resume feature rocks, at least in Terminal. Quit Terminal or restart your machine, and when you relaunch Terminal, all your windows are there, in the same places, with the same tabs, and the scrollback buffers are still there.

I honestly haven’t used or noticed Resume from any other applications. Of course, all the web browsers already have something like it.

Resize from any edge

For the last twenty-odd years, Windows (since 3.1) and most Xwindows window managers have had the ability to resize windows from any edge And Mac OS, going back just as long, to Classic Mac OS, has always forced you to resize (only down and to the right) by grabbing a little, 16×16 pixel box at the lower right of the window. This has always been a minor shortcoming in Mac OS’s interface.

Leave it to Apple’s UI designers to realize you could resize windows from any edge without having an fat, ugly, click-target border all the way around every single window. Resize from any edge is a perfect illustration of Apple’s design genius: Less clutter, same power. I’ve wanted this on Mac OS for so long, I’d be willing to pay the €23 just for this feature alone.

Push to Github using Mercurial

It’s simple to push Mercurial source code repositories to GitHub (or another git server). This is useful if you can’t, or don’t want to, switch off of Mercurial to git. You can take advantage of GitHub’s superior community and features (sorry, BitBucket) without having to puzzle over ____ (anti-git screed left as an exercise to the reader).

To do this, you’ll need the hg-git plugin for Mercurial, by Scott Chacon and others. The technique is outlined on hg-git’s read-me, but it still took me some effort to get working right, so I thought this blog post might be helpful.

I assume you already have Mercurial installed, a project in a Mercurial repository you want to push, and a GitHub account. Here’s how you do it.

Install hggit

If you’re using MacPorts and Python 2.6, you should be able to do this with:

sudo port install py26-hggit

At the end of hggit’s install process, hggit prints out a line to add to your ~/.hgrc. For MacPorts & Python 2.6, this will look like:

hggit=/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/hggit

Note: Installing hggit in a virtualenv is probably not a good idea unless you also are using a version of Mercurial installed in the same virtualenv. You could probably make it work, but it’s likely more work than it’s worth.

Set up and connect to a GitHub repository

Create an empty repository on GitHub to store your Mercurial project. Copy the git URL from the new repository. Append git+ssh:// to the beginning, and change the : to a /. For example, this:

git@github.com:mercurialpoweruser/mercurialrocks.git

becomes this:

git+ssh://git@github.com/mercurialpoweruser/mercurialrocks.git

This URL then goes in the [paths] section of your Mercurial repository’s .hg/hgrc. If you want to push and pull from GitHub by default, it will look like this:

[paths]
default = git+ssh://git@github.com/mercurialpoweruser/mercurialrocks.git

Or, if you already have a remote Mercurial repository that you want to keep using, you can add the GitHub repository with a different name:

[paths]
default = ssh://hg@bitbucket.org/mercurialpoweruser/mercurialrocks
github = git+ssh://git@github.com/mercurialpoweruser/mercurialrocks.git

Next, push your repository to GitHub for the first time, with hg push (if GitHub is your default) or hg push github (if GitHub isn’t the default).

Now, go to GitHub and reload the page, and revel in the feeling that you are living in a utopian future where cars fly, cold fusion supplies our electricity, and different DVCSs interoperate with each other.

Gotchas

If you find that Mercurial mysteriously stops pushing new changes to GitHub, even though there are new changes to push and it’s pushing them to Mercurial servers fine, you might want to take a look your bookmarks. Do this with hg bookmarks. Make sure that the changeset ID for master is the changeset ID of tip: hg tip. If they’re not the same, update the master bookmark with hg bookmark -f master. This happened to me when I used an older version of Mercurial (1.3.1) without the bookmarks extension enabled. Don’t do that.

Your mileage may vary

I haven’t used this technique on repositories that have lots of branches, file renames, merge conflicts, or other “fancy” distributed version control features. I wouldn’t advise switching your entire development team and mission-critical repositories over to this technique without some more extensive testing.

The FLOSS tortoise, chasing the wrong proprietary hare. Again.

For most of the aughts, the various Linux desktop projects tried to catch up to Windows by copying features wholesale, while Mac OS X innovated like crazy and blew past both by building an amazing desktop experience.

The omission of Chrome & Safari from Paul Rouget‘s pro-Firefox Is IE9 a modern browser? reminds me of that misguided attempt. I hope Firefox doesn’t make the same mistake as desktop Linux; it would be a shame if they focused all their energy trying to catch Explorer, while missing the rapid user-interface innovation going on elsewhere.

Update: Here’s an example of a great idea from 2007, from Alex Faaborg, out of the user experience team at Mozilla, that didn’t make it into Firefox 3.

Quick update for subscribers

I am going traveling very soon, so things will get pretty quiet around here. My posts here are split up into three categories, and each has a separate feed:

There’s of course also a master feed for all posts. You can follow my posts on Digg, or follow my account on the awesome soup.io, which aggregates this blog, and my twitter, t-shirts, 3-d printing, and various other online presences. (Update 2011 April 6: Digg has removed import by RSS)

Is Git really better than X?

Most of Scott Chacon’s points in Why Git is Better than X are spot-on. The page could even be renamed “Why distributed version control systems are better than Subversion and Perforce,” since those two are the clear losers. (And yes, Bazaar is so slow I think it deserves to be listed twice in the speed section.)

I’ve used each of Git, Mercurial, and Bazaar for several months in medium sized teams; I’ve done quite a bit of branching and merging in Mercurial and Bazaar, and a fair amount in Git. Based on that experience, I am compelled to disagree here with a few of the points, specifically with regards to Mercurial.

Cheap local branching

Chacon argues that only Git offers cheap local branching, but Mercurial allows exactly the same work-flows that he outlines, and they are just as easy. Chacon claims that Git’s branching model is different:

Git will allow you to have multiple local branches that can be entirely independent of each other…

This isn’t quite correct. The real difference between Git’s branching model and the others is that Git does not assume a one-to-one relationship between a logical branch and a directory on the file-system. In Git, you can have a single directory on the file-system and switch between different branches with the git checkout command without changing directories.

Mercurial (and Bazaar) enforce default to1 a one-to-one relationship between a logical branch and directory on the file-system. You can, however, use the hg clone command to make a new, “entirely independent,” branch. You can clone a local directory, a remote directory over SSH, or a remote Mercurial repository. Cloning a repository that’s on the local file-system is the way to create cheap local branches in Mercurial. Mercurial will even create hard links to save disk space and time.

For example, in Git, you make a cheap local branch with:

git branch featurebranch
git checkout featurebranch

And in Mercurial, you make a cheap local branch with:

cd ..
hg clone master featurebranch
cd featurebranch

In Git, you switch between branches with git checkout branchname. In Mercurial, since branches are in a one-to-one correspondence with directories, you switch between branches with cd ../branchname.

In Git, you pull changes from a local branch with git merge branchname. In Mercurial, you pull changes into from a local branch with hg pull -u ../branchname.

Chacon’s four example work-flows using cheap local branches are not only completely possible in Mercurial, they are just as trivial as they are in Git. (The first three are even equally trivial in Bazaar; I’m not sure about the last one in Bazaar, though.)

Chacon says:

You can find ways to do some of this with other systems, but the work involved is much more difficult and error-prone.

Reasonable people might also disagree about the intuitiveness of the specific commands, but creating, updating, merging, and deleting them in Mercurial are one or two commands in the shell, just like in Git. None of these tasks are more difficult or more error-prone in Mercurial.

Chacon also says:

when you push to a remote repository, you do not have to push all of your branches. You can only share one of your branches and not all of them.

This is a little misleading. In Mercurial, since there’s usually a one-to-one correspondence between branches and local directories, and since you can’t be in two directories at once, you’re usually only pushing a single branch. In Mercurial, you generally don’t have to think about which branch you’re pushing, or whether you’re pushing other branches that shouldn’t be pushed, because you generally only ever push one branch at a time. I would actually turn this argument around, and claim that Git’s default of keeping multiple logical branches in a single directory forces you to worry about which branches you’re pushing, and actually makes cheap local branches harder and more error-prone in Git than in Mercurial.

Reasonable people might disagree about the intuitiveness of a branching model which does or does not assume default to a one-to-one relationship between logical branches and local directories, but that’s not why cheap local branches are so powerful in Git. Rather, it’s the ability to clone and merge between repositories locally that allows cheap local branches, and it’s just as easy in Mercurial as in Git, and nearly as easy in Bazaar.

GitHub

GitHub is unarguably the biggest community around any distributed version control system. But Chacon goes too far when he says:

This type of community is simply not available with any of the other SCMs.

BitBucket may not be as big as GitHub, but it’s a “socially-targeted” community where Mercurial users can “fork and contribute” to other Mercurial projects. It might not have as many projects or contributors as GitHub, but a sheer difference in size doesn’t translate to “simply not available.”

Bazaar’s not listed as inferior to Git on this point, I assume because of Launchpad. If Launchpad counts, why not BitBucket?

I’d also be interested to see usage numbers comparing GitHub and everyone’s favorite open-source community brontosaurus, SourceForge. Has GitHub surpassed them in projects or users or commits or downloads?

Minor haggling points

I also disagree with Chacon about the benefits of the staging area or index, but that’s purely a matter of personal preference. I never wanted anything like the staging area before I started using Git, and I don’t miss it now when I’m using Mercurial. If anything, the extra concept of a staging area makes Git slightly harder to learn; newbies coming from any other version control system have to be taught about the -a option to git commit right away, or else they wonder why their changes aren’t getting committed.

In the “Easy to Learn” section, Chacon highlights the add commands in both Mercurial and Git, indicating that they are the same; but they are pretty different. In Mercurial, add schedules previously un-tracked files to be tracked. In Git, add adds changes in a current file to the staging area, or index, including, but not limited to, previously un-tracked files. That section is also missing Mercurial’s mv command. Other potentially confusing differences include Git’s revert, which corresponds to Mercurial’s backout; and Mercurial’s revert, which can be duplicated using Git’s checkout.

Last, I’d like to see the ballyhooed Fossil included in this breakdown.

  1. Update: Thanks to David Wolever, who points out that hg branch and hg bookmark can be used to have more than one branch per directory in Mercurial. I didn’t even know about these commands when I first wrote this post. But I think my central point still holds; creating cheap local branches in Mercurial is just as easy as in Git. []