Tag Archives: software

thoughts about particular software products

Monolithic repositories versus project repositories

Recently, Hacker News got ahold of Gregory Szorc’s article on monolithic repositories, and even Wired is weighing in on how big codebases should be organized. While the discussion is interesting, it seems to focus on two extremes: on the one hand, putting all of a company’s code into a single monolithic repository, and, on the other hand, breaking a big company’s code up into many, many small repositories. Both extremes are too simplistic. The better approach is to align repository boundaries with the software’s build, deployment, and versioning boundaries. What does this mean exactly? Perhaps it is best illustrated with some examples:

  • You have a small app that knows how to deploy itself, and you rarely, if ever, need to deploy anything other than the current version of it. Keep the deployment scripts in the same repository as the app.
  • You have a few existing APIs written in the same language and built using similar libraries. You’re about to build a replacement for one of those APIs in a different, high-performance language, which will use a different storage backend. Not only will the build and deployment steps for this new code be different, but you will want to deploy new versions of both the new and old app independently. Here you have build, deployment and versioning boundaries, so it’s natural to use a different repository for the new code. Since it’s a different language, the new code won’t be relying on any existing code, so there’s no temptation to copy and paste code into the new project.
  • You need to support a complex set of deployment environments: clusters of virtual machines running different services, connecting to different backends; testing, staging, and production environments running different versions of services; and so on. Here you have clear versioning and deployment boundaries: your deployment process needs to understand and deploy different versions of the code, since you might need to roll back to an earlier version. It also needs to be able to deploy, for example, the current stable version of the site backed by a MySQL database to the production environment, while deploying the head of an experimental branch backed by PostgreSQL to the staging environment. You won’t be able to encapsulate the switch from MySQL to PostgreSQL in a single merge of a feature branch anyway, so it’s just a headache to maintain deployment scripts that know how to deploy to MySQL in the same branch where you have just removed all your MySQL-dependent code. So the deployment scripts are better kept in their own repository.
  • You have an API for internal use only, and a single consumer of that API, both running inside your private network. Here you have total control over versioning: you can change the API and its consumer in a single commit, ensure that both are updated at deploy time, and there’s no versioning boundary. If the API and its consumer are also written in the same language, there should be little to no reason to keep them in separate repositories.
  • You have an API accessible over the public internet, and Android and iOS apps which talk to that API. Here you already have significant versioning boundaries. You can’t be sure that every app out there is up-to-date, so you have to keep the old APIs up and running for a while, or make sure the latest APIs are backwards compatible. And you have significant deployment boundaries: you wouldn’t want to wait two weeks to deploy an update to the API because you are waiting for the new API support in the iOS app to make it to the App Store. Since any version of your API must support several different versions of your mobile apps, and any version of your mobile app must be able to talk to several different versions of your API, there is again no hope of rolling any change into a single (merge) commit, and there’s no added cleanliness or simplicity to gain from having these different pieces of the software in the same repository. So this code can be split into multiple repositories.

There are a few more things to remember when setting up repositories.

Splitting vs. Merging

It’s far simpler to split out part of a monolithic repository than it is to merge two independent repositories:

  • If you’re splitting, just create a new repository, push the old code there, remove everything but what you want to split out, and commit. You can then re-organize it if you like, but you don’t have to (and if you’re using git, you can even filter the history). All the files keep their history, and you don’t have to worry about file name collisions.
  • If you’re merging, you most likely have to do some top level reorganization first, then you can merge one repository into the other, but now looking at history before this merge point will show a jumbled mess of commits from both repositories.

Increasing Fragmentation

If you feel like you’re being forced into creating a third repository to store code that is needed by two other repositories, that’s probably a sign that those two repositories should be a single one. This is a common trap that projects get into; once they have split their repositories too finely, the only solution seems to be more splitting. When considering this option, ask yourself: do these two repositories really have different build, deployment, and versioning boundaries? If not, bite the bullet and merge the repositories, rather than creating a third one for the shared code.

Ease of Access

Ease of access to code is often presented as an advantage of the monolithic repository model. But this argument is unconvincing. A programmer can still have access to all the company’s code, even if that requires cloning multiple repositories. And in the vast majority of cases, a programmer is going to work on two, maybe three, different projects at the same time. It’s not as if Google is filled with programmers who work on WebM encoding for YouTube on Monday, map-reduce for Google search on Tuesday, CSS and JavaScript for Gmail on Wednesday, Java for Android on Thursday, and Chrome on Windows on Friday. Programmers like that are extremely rare, and you shouldn’t optimize your repository structure to make them happy, especially not if it means forcing everybody else to download and track changes to large amounts of code they will never touch.

In Conclusion

To sum up, both the one monolithic repository dogma, and the many small project-based repositories dogma are oversimplified to the point of being harmful. Instead, focus on splitting your code into repositories along its natural versioning, build, and deployment boundaries.

There’s no such thing as bugs or features

Unless you’ve only ever worked with technical people, you’ve run into the old “is it a bug or is it a feature” argument. Generally, a business person reports something as a bug because it’s not working properly, but the reaction from technical people is that that particular feature just isn’t built yet, or that that specific detail was not in the original specification. This can be a source of great friction because it usually involves technical people saying the product will take longer than expected to finish. Less often, business will report a problem as a new feature request, when in fact it is already-built code that is simply not functioning properly. This is less contentious because it usually means it’s less work to fix than business originally thought.

Is there a way to eliminate this debate?


Why I don’t rely on Google

Recently several otherwise tech-savvy friends have been perplexed that I don’t just use Google for everything. They explain that I could use Google Voice for my U.S. phone number, use Google Checkout for The Mathematician’s Dice, sync my contacts and calendars through GMail, and log in to many things on the web using their OpenID service. And they wonder why I suffer a bit of spam instead of using GMail for my primary email account.


Introducing Fragments

Today I am announcing Fragments, a project I’ve been working on for a few months. It’s on GitHub and PyPI.

Fragments uses concepts from version control to replace many uses of templating languages. Instead of a templating language, it provides diff-based templating; instead of revision control, it provides “fragmentation control”.

Fragments enables a programmer to violate the DRY (Don’t Repeat Yourself) principle; it is a Multiple Source of Truth engine.

What is diff-based templating?

Generating HTML with templating languages is difficult because templating languages often have two semi-incompatible purposes. The first purpose is managing common HTML elements and structure (headers, sidebars, and footers) across multiple templates. This is sometimes called page “inheritance”. The second purpose is to perform idiosyncratic display logic on data coming from another source. When these two purposes can be separated, templates can be much simpler.

Fragments manages this first purpose, common HTML elements and structure, with diff and merge algorithms. The actual display logic is left to your application, or to a templating language whose templates are themselves managed by Fragments.

What is fragmentation control?

The machinery to manage common and different code fragments across multiple versions of a single file already exists in modern version control systems. Fragments adapts these tools to manage common and different versions of several different files.

Each file is in effect its own “branch”, and whenever you modify a file (“branch”) you can apply (“merge”) that change into whichever other files (“branches”) you choose. In this sense Fragments is a different kind of “source control”: rather than controlling versions/revisions over time, it controls fragments across many files that all exist simultaneously. Hence the term “fragmentation control”.
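The idea of shared fragments across simultaneous files can be made concrete with a toy example. This is not Fragments’ actual algorithm (that is a modified Precise Codeville Merge, described below); it is just Python’s stdlib difflib, used here to show what “common fragments across several files” means:

```python
# Illustration only: find the fragments two sibling HTML files share,
# using Python's stdlib difflib (not Fragments' actual merge algorithm).
from difflib import SequenceMatcher

page_a = """<html>
<head><title>Home</title></head>
<body>
<h1>Welcome</h1>
</body>
</html>""".splitlines()

page_b = """<html>
<head><title>About</title></head>
<body>
<h1>About us</h1>
</body>
</html>""".splitlines()

matcher = SequenceMatcher(a=page_a, b=page_b)
# The shared structure: the fragments through which a change to one
# file could propagate to its sibling.
common = [
    page_a[m.a : m.a + m.size]
    for m in matcher.get_matching_blocks()
    if m.size
]
for fragment in common:
    print(fragment)
```

A change committed to one of the common fragments (say, the closing `</body></html>` block) is exactly the kind of edit that fragmentation control can apply across every sibling file at once.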

As I am a linguist, I have to point out that the distinction between Synchronic and Diachronic Linguistics gave me this idea in the first place.

How does it work?

The merge algorithm is a version of Precise Codeville Merge modified to support cherry-picking. Precise Codeville Merge was chosen because it supports accidental clean merges and convergence. That is, if two files are independently modified in the same way, they merge together cleanly. This makes adding new files easy: use Fragments’ fork command to create a new file based on other files (or just cp one of your files), change it as desired, and commit it. Subsequent changes to any unmodified, common sections, in that file or in its siblings, will be applicable across the rest of the repository.

Like version control, you run Fragments on the command line each time you make a change to your HTML, not before each page render.

What is it good for?

Fragments was designed for the task of simplifying large collections of HTML or HTML templates. It could replace simpler CMS-managed websites with pure static HTML. It could also handle several different translations of an HTML website, ensuring that the same HTML structure is wrapped around each translation of the content.

But Fragments is also not HTML-specific. If it’s got newlines, Fragments can manage it. That means XML, CSS, JSON, YAML, or source code from any programming language where newlines are common (sorry, Perl). Fragments is even smart enough to know not to merge totally different files together. You could use it to manage a large set of configuration files for different servers and deployment configurations, for example. Or you could use it to manage bug fixes to that mess of duplicated source files on that legacy project you wish you didn’t have to maintain.

In short, Fragments can be used anyplace where you have thought to yourself “this group of files really is violating DRY”.

Use it

Fragments is released under the BSD License. You can read more about it and get the code on GitHub and PyPI. And you can find me on Twitter as @glyphobet.

Special thanks to Ross Cohen (@carnieross) for his thoughts on the idea, and for preparing Precise Codeville Merge for use in Fragments.

Hackers, it is time to rethink, redesign, or replace GNU Gettext

GNU Gettext may be the de facto solution for internationalizing software, but every time I work with it, I find myself asking the same questions:

  • Why, in this age of virtual machines and dynamic, interpreted languages, do I still have to compile .po files to .mo files before I can use my translations?
  • I can reconfigure my web application, modify its database, and clear its caches whenever I want, so why do I have to do a code push and restart the entire runtime just so that “Login” can be correctly translated to “Anmelden”? Try explaining that to a business guy.
  • To translate new messages in my application, I have to run a series of arcane commands before the changed text is available to be translated. Specifically, the process involves generating .pot files, then updating .po files from them. Why isn’t this automatic?
  • Why is it still possible for bad translations to cause a crash? Translators do the weirdest things when presented with formatting directives in their translations… I’ve seen %s translated as $s and as %S, %(domain)s translated as %(domaine)s, and ${0} translated as #0, but the most common is to just remove the weird formatting directives entirely. And they all cause string formatting code to crash.
  • Why isn’t there a better option for translating HTML? Translators shouldn’t be expected to understand that in Click <a href="..." class="link">Here!</a>, “Click” and “Here!” should be translated, but “class” and “link” should not be. And they certainly can’t be expected to understand that if their language swaps the order of “Click” and “Here”, the <a> tag should move along with “Here”.
  • Why isn’t there something better than the convention to assign the gettext function to the identifier _, and then wrap all your strings in _()? Not only is this phenomenally ugly, but one misplaced parenthesis breaks it: _("That username %s is already taken" % username)
  • Why is support for languages that have more than two plural forms still an awful, confusing, fragile hack? Plural support was clearly designed by someone who thought that all languages were like English in having merely singular and plural. I’ve seen too many .po files for singular/dual/plural languages, where the translator obviously did not understand that msgstr[0] is the singular, msgstr[1] the dual, and msgstr[2] the plural.
  • Why, in this age of distributed version control, experimental merge algorithms, and eventually consistent noSQL databases, is the task of merging several half-translated .po files from several different sources still a nightmarish manual process?
  • Why, if I decide I need an Oxford comma or a capital letter in my English message, do I risk breaking all of the translations for that message?
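To make the crash scenarios above concrete, here is a small sketch (the messages and broken “translations” are made up, mirroring the slips described above) showing how each one blows up in Python’s %-formatting:

```python
# Hypothetical msgid/msgstr pairs illustrating why translator slips in
# formatting directives crash at runtime (Python %-formatting shown).
msgid = "%(domain)s is already taken"
msgstr_bad_key = "%(domaine)s is already taken"  # directive got "translated"

print(msgid % {"domain": "example.com"})  # original message works

try:
    msgstr_bad_key % {"domain": "example.com"}
except KeyError as err:  # no such key: 'domaine'
    print("crash:", err)

msgid2 = "That username %s is already taken"
msgstr_bad_directive = "That username $s is already taken"  # %s became $s

print(msgid2 % "glyphobet")  # original message works

try:
    msgstr_bad_directive % "glyphobet"
except TypeError as err:  # no %s left to consume the argument
    print("crash:", err)
```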

There are libraries that allow you to use .po files directly, and I’m sure you can hack up some dynamic translation reloading code. Eliminating the ugliness of _() in code, and avoiding incorrectly placed parentheses after it, could be done with a library that inspects the parse tree and monkeypatches the native string class. Checking for consistency of formatting directives is not that hard. A better HTML translation technique would take some work, but it’s not impossible. The confusion around plural forms is just a user-interface issue. Merging translated messages may not be fully automatable, but at least it could be made a lot more user-friendly. And the last point can be avoided by using message keys, but that hack shouldn’t be necessary.
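For instance, here is a minimal consistency check (my own sketch, not any existing Gettext tool) that compares the %-style directives in a msgid against those in its msgstr:

```python
import re

# Matches printf-style directives, including Python's %(name)s mapping
# form. A sketch, not a complete printf grammar.
DIRECTIVE = re.compile(
    r"%(?:\([^)]*\))?[#0\- +]*(?:\d+)?(?:\.\d+)?[hlL]?[diouxXeEfFgGcrs%]"
)

def directives_match(msgid: str, msgstr: str) -> bool:
    """True if the translation preserves the original's formatting directives."""
    return sorted(DIRECTIVE.findall(msgid)) == sorted(DIRECTIVE.findall(msgstr))

print(directives_match("Welcome, %s!", "Willkommen, %s!"))        # True
print(directives_match("Welcome, %s!", "Willkommen, $s!"))        # False: %s lost
print(directives_match("%(domain)s taken", "%(domaine)s taken"))  # False: key changed
```

A check like this, run automatically when translations are imported, would catch every one of the broken translations listed above before they ever reached production.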

Gettext is behind the times. Or is it? Half of me expects someone to tell me that all these projects I’ve worked on are just ignoring features of Gettext that would solve these problems. And the other half of me expects someone to tell me I should be using some next-generation Gettext replacement that doesn’t have enough Google juice for me to find. (Let me know on Twitter: @glyphobet.)

GNU Gettext is based on Sun’s Gettext, whose API was designed in the early ’90s. Hackers, it’s 2012. Technology has moved forward and left Gettext behind. It is time to rethink, redesign, or replace it.


A week with Snow Leopard and Lion

After starting a new job a few weeks ago, I found myself in the somewhat unusual position of having a new Mac OS X Lion (10.7) installation at work and a Snow Leopard (10.6) installation at home. Switching every day focused my attention on the differences between the two. Read on to hear my thoughts.

Mission Control

What used to be Dashboard, Spaces, and Exposé have been combined into Mission Control. Overall, this is a tremendous improvement.

The All Windows mode scales all the windows proportionally, like Exposé did in Leopard. If you use any application with small windows (like Stickies), you’ll know how silly Snow Leopard’s Exposé looked when it scaled a 1200×1200 browser window to the same size as a bunch of 100×100 sticky notes and threw them all in a grid. (The non-proportional scaling in Snow Leopard’s Exposé is so useless I hack the Dock after every upgrade to bring back Leopard’s scaling.) The Application Windows mode in Lion uses Snow Leopard’s non-proportional scaling, which is generally ok because multiple windows in the same app are usually similarly sized.

All Windows also groups windows together, which I find helpful, although not critical. You can drag windows into other desktops from the All Windows mode, too. But if you move your pointer even slightly while clicking on a window to activate it, Mission Control thinks you’re starting to drag it, and ends up doing nothing when you meant to activate a window. This is maddening. There should be a threshold below which any mouse movement while clicking just ends up as a click (or perhaps there is; if so, it’s too low).

Mission Control stole the Ctrl-Left and Ctrl-Right keystrokes to switch desktops, which I use in the terminal all the fricking time. Luckily these keystrokes can be turned off in the settings. I was a multiple-desktop power-user back when I used Xwindows, but I never turned on Spaces in Snow Leopard and still haven’t used multiple desktops on Lion. I’m not sure why not.

There are a few display bugs. Sometimes the blue window highlights are way bigger than the window previews. Textmate’s Find dialog doesn’t (always) show up in Mission Control, which I hope will be fixed sometime before Textmate 2 is released later this year.

The biggest problem for Mission Control is that it doesn’t let you restore or even see minimized windows in All Windows mode, only in Application Windows mode, where they are lumped in with previously opened documents. If you enable “Minimize to Application Icon”, the only way to get at your minimized windows is via the Application Windows mode.

Scroll bars

I got used to the new scroll bars in 30 seconds. I haven’t missed the up/down arrows at all. Weirdly, the default scrolling direction on the MacMini was set up out of the box for trackpads, but on the MacBook, it was set up for scroll wheels.

Migration Assistant

Earlier this year I helped my mom migrate from a vintage MacMini running Tiger (10.4) to a new one running Snow Leopard. The Migration Assistants on the two machines completely refused to cooperate with each other during the initial setup of the new Mini. It took many, many attempts after the install was complete to get it to migrate her documents. I thought this was because I was migrating between two major releases.

I started out using Lion on an elderly MacMini while my new laptop was in the mail, so I got to experience Migration Assistant again. Before attempting to migrate, I set up the new laptop using a dummy account, and ran Software Update on both computers until they were both running 10.7.1. I hoped the up-to-date Migration Assistant would do better at migrating between two identical versions of Mac OS, but it was just as recalcitrant as before. The two computers steadfastly refused to recognize each other at the same time, even though they were both on the same wireless network just inches away from each other (and from the wireless router). I finally borrowed an ethernet cable, turned the wireless off on both, and hooked them up ethernet port to ethernet port. It felt dirty, but it seemed to help; after a few more tries my account was migrated. I think it took longer to coax Migration Assistant into cooperation than it did to actually migrate my files.

So Migration Assistant is just as cranky between two Lion installs as it was between Tiger and Snow Leopard. I can’t imagine how frustrating it must be for the average, non-technical Mac user to migrate to a new Mac.

The new Mail

I forced myself to use the new side-by-side Mail interface for the first week, hoping that I’d come to like it. I didn’t. I hate it. I switched back to classic view after a week.

I can’t really say what bugs me about Mail; it just feels wrong. One glaring issue is that the text and icons in the message list have been made bigger, which is a pain for someone with over ten years of email in lots of different folders. Some people, including John Gruber, speak highly of it, so I’m holding out hope that it will grow on me.

But when I install Lion on my personal machine, I’ll make a backup of Snow Leopard’s Mail.app and see if it works on Lion. Just in case.

Autocorrect

Autocorrect is for people who aren’t fast, good typers. It’s not for me. I turned it off. In Mail and system-wide. I wonder why Mail has its own setting for autocorrect that overrides the system setting.

XCode

Seriously, Apple? You mean I have to create an iTunes account, even though this is a work computer and I won’t ever be buying software for it, and verify it, just in order to download XCode? And then, the XCode application that gets installed is actually the XCode installer? What a hassle.

Ok, ok, to be fair, this isn’t a Lion thing. But seriously, Apple?

MacPorts

Aside from mysql5-server, I got everything I needed installed fine. I was even able to beat mysql5-server into submission, with a judicious application of MacPorts-kung-fu.

Textmate

After hearing many rumors about Textmate not really working under Lion, I’m happy to report that it seems to work fine. (Aside from the aforementioned wacky interaction with Mission Control.)

Launchpad

I won’t use Launchpad. Apple is clearly trying to unify the experience of iOS and Mac OS. But I already have Spotlight, the Dock, and the Applications folder. How many views of my applications do I need?

Resume

The resume feature rocks, at least in Terminal. Quit Terminal or restart your machine, and when you relaunch Terminal, all your windows are there, in the same places, with the same tabs, and the scrollback buffers are still there.

I honestly haven’t used or noticed Resume from any other applications. Of course, all the web browsers already have something like it.

Resize from any edge

For the last twenty-odd years, Windows (since 3.1) and most Xwindows window managers have had the ability to resize windows from any edge. And Mac OS, going back just as long, to Classic Mac OS, has always forced you to resize (only down and to the right) by grabbing a little 16×16-pixel box at the lower right of the window. This has always been a minor shortcoming in Mac OS’s interface.

Leave it to Apple’s UI designers to realize you could resize windows from any edge without having a fat, ugly, click-target border all the way around every single window. Resize from any edge is a perfect illustration of Apple’s design genius: less clutter, same power. I’ve wanted this on Mac OS for so long, I’d be willing to pay the €23 just for this feature alone.

The FLOSS tortoise, chasing the wrong proprietary hare. Again.

For most of the aughts, the various Linux desktop projects tried to catch up to Windows by copying features wholesale, while Mac OS X innovated like crazy and blew past both by building an amazing desktop experience.

The omission of Chrome and Safari from Paul Rouget’s pro-Firefox “Is IE9 a modern browser?” reminds me of that misguided attempt. I hope Firefox doesn’t make the same mistake as desktop Linux; it would be a shame if they focused all their energy on trying to catch Internet Explorer, while missing the rapid user-interface innovation going on elsewhere.

Update: Here’s an example of a great idea from 2007, from Alex Faaborg, out of the user experience team at Mozilla, that didn’t make it into Firefox 3.

A two-step refactoring tactic

In the middle of a major, hairy refactoring recently, I codified a tactic for refactoring that I’d been using, unconsciously, for years. The tactic is to break down refactoring into two modes, and deliberately switch between them:

  • Clean-up mode: Clean up and reorganize the code, without changing the code’s external functionality.
  • Change mode: Make the smallest, simplest self-contained change, making sure the code is functional again when it’s completed.

You start in clean-up mode, and switch back and forth as needed. This tactic is probably unsurprising to experienced programmers, but for those of you not yet as hard-core as Donald Knuth, let me explain why it’s a good idea.

Separating clean-up and changes into discrete modes of working gives you less to think about at any one time. A refactor can be very invasive—code that was closely tied together is now completely decoupled, or vice-versa; functions, objects and modules become obsolete or nonsensical, or split apart or merge in unexpected ways; identifiers become ambiguous or collide. If you’re trying to reorganize everything at the same time you’re juggling old and new code in your head, it’s easy (for anyone) to get lost in the maze.

When you’re consciously in clean-up mode, you can focus just on tasks like making sure variables have unambiguous, correct names, object / module / function / &c. organization and boundaries are sensible, and so on. You’re not changing any functionality, so your tests (you do have tests, right?) still pass, and the application still behaves as it always has (you did test it manually, right?). And you can be liberal in your clean-up; if you end up improving code that isn’t ultimately affected by the refactor, there’s no harm in that.

Once you feel like the code is clean and ready for the changes in functionality, commit your clean-up (you are using version control, right?). I usually use a commit message like “clean-up in preparation for….”

Now switch to change mode, and start making the changes. Quite often, you’ll discover something else that needs to be cleaned up. But you’re in change mode, not clean-up mode. Since you’re only making the smallest, simplest self-contained change, this new clean-up can probably wait until later. But if it can’t wait until later, then roll back your changes (or save a patch, or store your work in the attic, or the stash, or shelve it, or whatever the kids are calling it these days). Complete the new clean-up, commit it, and then go back to change mode and your half-completed changes.

This tactic has some additional benefits that might not be immediately obvious, too:

  • Clean-up mode forces you to read over all of the code that’s going to be changed and ask, “How is this code going to be affected by the changes I’m planning?” Sometimes you discover unanticipated edge-cases or bugs in your planned changes. Sometimes you realize the whole plan is flawed, and have to go back to the drawing board. And sometimes, after a good, hearty clean-up session, you realize that you can make the required changes in a less invasive, simpler way. I find that most refactors are more clean-up than anything else.
  • If the codebase has a “master” or “main” branch, and you use a “feature” or “working” branch for your refactor, you can put the clean-up commits into the master branch (they don’t change any functionality, right?) and only commit the functional changes to the feature branch. What’s the point of that? Well, everyone else gets to benefit from your code clean-up right away, and when you do end up merging your functionality changes into the master branch, because it’s a smaller delta, there’s a smaller chance of conflicts.

The next time you’re staring down the barrel of a nasty refactor, and cursing the person who didn’t fully think through the business requirements, try this. It won’t make every refactor a piece of cake, nor will it become the hot new programming methodology acronym down in the valley (2SR, anyone? 2SRT?), but I guarantee you’ll be glad you tried it.