Tag Archives: programming

Monolithic repositories versus project repositories

Recently, Hacker News got ahold of Gregory Szorc’s article on monolithic repositories, and even Wired is weighing in on how big codebases should be organized. While the discussion is interesting, it seems to focus on two extremes: on the one hand, putting all of a company’s code into a single monolithic repository, and, on the other hand, breaking a big company’s code code up into many, many small repositories. Both extremes are too simplistic. The better approach is to align repository boundaries with the software’s build, deployment, and versioning boundaries. What does this mean exactly? Perhaps it is best illustrated with some examples:

  • You have a small app that knows how to deploy itself, and you rarely, if ever, need to deploy anything other than the current version of it. Keep the deployment scripts in the same repository as the app.
  • You have a few existing APIs written in the same language and built using similar libraries. You’re about to build a replacement for one of those APIs in a different, high-performance language, which will use a different storage backend. Not only will the build and deployment steps for this new code be different, but you will want to deploy new versions of both the new and old app independently. Here you have build, deployment and versioning boundaries, so it’s natural to use a different repository for the new code. Since it’s a different language, the new code won’t be relying on any existing code, so there’s no temptation to copy and paste code into the new project.
  • You need to support a complex set of deployment environments, clusters of virtual machines running different services, connecting to different backends, testing, staging, and production environments running different versions of services, and so on. Here we have clear versioning and deployment boundaries: your deployment process needs to support deploying different versions of the code. Your deployment process needs to understand and know about different versions of what it is deploying, since you might need to rollback and deploy an earlier version. It also needs to have the ability to deploy, for example, the current stable version of the site backed by MySQL database to the production environment, and also deploy the head of an experimental branch backed by PostgreSQL to the staging environment. You won’t be able to encapsulate the switch from MySQL to PostgreSQL in a single merge of a feature branch anyway, so it’s just a headache to maintain deployment scripts which know about how to deploy to MySQL in the same branch where you have just removed all your MySQL dependent code. So the deployment scripts are better kept in their own repository.
  • You have an API for internal use only, and a single consumer of that API, both running inside your private network. Here you have total control over versioning: you can change the API and its consumer in a single commit, ensure that both are updated at deploy time, and there’s no versioning boundary. If the API and its consumer are also written in the same language, the there should be little to no reason to keep them in separate repositories.
  • You have an API accessible over the public internet, and Android and iOS apps which talk to that API. Here you already have significant versioning boundaries. You can’t be sure that every app out there is up-to-date, so you have to keep the old APIs up and running for a while, or make sure the latest APIs are backwards compatible. And you have significant deployment boundaries: you wouldn’t want to wait two weeks to deploy an update to the API because you are waiting for the new API support in the iOS app to make it to the App Store. Since any version of your API must support several different versions of your mobile apps, and any version of your mobile app must be able to talk to several different versions of your API, there is again no hope of rolling any change into a single (merge) commit, and there’s no added cleanliness or simplicity to gain from having these different pieces of the software in the same repository. So this code can be split into multiple repositories.

There are a few more things to remember when setting up repositories.

Splitting vs. Merging

It’s far simpler to split out part of a monolithic repository than is to merge two independent repositories:

  • If you’re splitting, just create a new repository, push the old code there, remove everything but what you want to split out, and commit. You can then re-organize it if you like, but you don’t have to (and if you’re using git, you can even filter the history). All the files keep their history, and you don’t have to worry about file name collisions.
  • If you’re merging, you most likely have to do some top level reorganization first, then you can merge one repository into the other, but now looking at history before this merge point will show a jumbled mess of commits from both repositories.

Increasing Fragmentation

If you feel like you’re being forced into creating a third repository to store code that is needed by two other repositories, then that’s probably a sign that those two repositories should be a single one. This is a common trap that projects get into; once they have split their repositories too finely, then the only solution seems to be more splitting. When considering this option ask yourself: do these two repositories really have different build, deployment, and versioning boundaries? If not, bite the bullet and merge the repositories, rather than creating a third one for the shared code.

Ease of Access

Ease of access to code is often presented as an advantage to the monolithic repositories model. But this argument is unconvincing. A programmer can still have access to all the company’s code, even if that requires cloning multiple repositories. And the vast majority of cases, a programmer is going to work on two, maybe three, different projects at the same time. It’s not as if Google is filled with programmers who work on WebM encoding for YouTube on Monday, map-reduce for Google search on Tuesday, CSS and JavaScript for Gmail on Wednesday, Java for Android on Thursday, and Chrome on Windows on Friday. Programmers like that are extremely rare, and you shouldn’t optimize your repository structure to make them happy, especially not if it means forcing everybody else to download and track changes to large amounts of code they will never touch.

In Conclusion

To sum up, both the one monolithic repository dogma, and the many small project-based repositories dogma are oversimplified to the point of being harmful. Instead, focus on splitting your code into repositories along its natural versioning, build, and deployment boundaries.

Better, Fewer, Shorter Meetings

Many organizations are plagued with intolerably long, frequent, and ineffective meetings. Here’s what I’ve learned about how to have better, fewer, shorter meetings. Most of these insights come from working with Freyr Guðmundsson at my start-up (formerly GoodsCloud, now NewStore), and he tells me that many are originally from the book Death by Meeting.

Meeting types

There are three types of meetings:

  1. Strategicmeetings about what we should do.
  2. Tactical: meetings about how we are going to accomplish the strategy we already agreed on.
  3. Reflective: meetings about what went right and wrong and how we can do better next time.

These meeting types should occur in this sequence, in a loop.

Everyone should understand these meeting categories. Every meeting should fall into only one category. If a meeting category starts to change, that’s ok, but the group should be conscious of that when it happens.

Strategic meetings can change into tactical ones: “we can’t choose that strategy until we know how it will work” or reflective ones: “we tried that last time and it didn’t work”. Tactical meetings can change into strategic ones: “Well now maybe with these new thoughts we picked the wrong strategy” or reflective ones: “we tried that last time and it didn’t work”. And reflective meetings can turn into either, as you discover a better strategy or tactic and want to start exploring it.

When a strategic or tactical meeting changes to a different type, it’s a sign that previous meetings were incomplete, or skipped, an important participant was excluded, or institutional knowledge and experience has not spread to the whole team. When a reflective meeting changes to a different type, it’s a sign of the team getting ahead of itself.

When a meeting type is changing, one thing the participants can do is explicitly note that it’s now a different kind of meeting and agree to the change. But it’s often better to put the triggering issue aside and reschedule a new meeting of the new type for that issue, because frequently different people need to be in the meeting now that it’s a different type.

Controlling meeting lengths

  • Strategic meetings are the only kind that should be open-ended. It helps to loosely time-box them, but if they go over, that’s ok.
  • Tactical and reflective meetings should be strictly time-boxed and rescheduled, not continued, if they go over time.
  • Develop zero tolerance for anyone arriving late to meetings. Being five minutes late for a six person meeting wastes thirty minutes of company time. There are stories about companies that charge people $1 per minute per person that they are late to meetings. Two minutes late to an eight-person meeting? $16 off your paycheck. I’m not suggesting this (it probably is illegal in some places) but it’s good to make people aware that showing up late has a real cost.
  • Give a five minute grace time at the end of the meeting for people to take a break, go to the bathroom, get a coffee or a smoke, before their next meeting. So, schedule them for 11:00-11:25, or 11:00-11:55, not 11:00-11:30, or 11:00-12:00. This grace period will also help with tardiness.
  • Meetings tend to expand to their allotted time. Experiment with five, ten, fifteen minute meetings, don’t just make everything a half hour or an hour.

Anectdote: I once had a two-day workshop with a client who flew in three people to meet with two from our company. We spent about an hour understanding the problem, and another hour working out a solution. Then we spent the remainder of the day and a half mediating a contentious argument between two people from the client company, which was ultimately irrelevant to the solution. And we couldn’t stop it. These two people were in a power struggle and they suddenly had the free time allocated to pursue it. At the end of day two, as the client was packing up to go to the airport and get back to work, the two guys agreed to disagree on their contentious point, and we all reiterated our commitment to the solution we’d come up with in the first two hours the day before. It had been a huge waste of everyone’s time, but a great example of how people can fill up left-over meeting time for something else.

Meeting Participants

Meetings should contain only the minimum number of people that are necessary. The absolute best is to keep the meetings to no more than two or three.

If someone is sitting there, typing/tapping on his/her laptop/phone, and not really listening, that person was not necessary and shouldn’t have been invited.

Enforcing Meeting Etiquette

The most important point about meeting etiquette is to develop a culture where everyone, not just the boss or meeting leader, is aware of and advocating for the company’s meeting etiquette and rules.

  • The person who calls the meeting is responsible for publicizing the meeting type and the allotted time. If this is not done, then the rest of the team should ask for that information at the beginning of the meeting, or it should be canceled.
  • It’s everyone’s responsibility, not just the boss or meeting leader, to keep an eye on the time, to stick to the meeting type, and to call out when the time is up or when the meeting is changing type.
  • It’s also everyone’s responsibility to speak up when there is too much conflict. Often it’s the two most powerful/influential people in the meeting who end up arguing, and the others feel unable to speak up about it, and end up trapped listening to an argument. Develop a culture where anybody, even the person who just started yesterday, can speak up and say “calm down!” or ask “can you continue this discussion at a different time?”

Examples

  • I would hold strategic architecture meetings with just two or three of my devs from a nine-person team, and schedule them for an hour, and almost always end them early.
  • Our nine-person dev team’s reflective meetings were strictly time-boxed to one hour (well, 55 minutes). We frequently had more to talk about at the end, but usually they were points that only one or two people really cared about, so we would break out into little groups and talk about those remaining issues informally over the next day or so.

Conclusion

It might seem like a lot of work to stick to all of these guidelines and ideas, but actually it wasn’t that hard for us. I hope it helps you enjoy better, fewer and shorter meetings too.

Let’s code about bike locks some more

I just got sucked into Peter Norvig‘s Let’s Code About Bike Locks. If you haven’t read that yet, read it first, at least the beginning. Otherwise this will make no sense.

The strategy of starting from the first tumbler and then calculating the subsequent ones seemed like it could be improved on. What if we looked at all the four-letter English words, and chose the most common letter, on any tumbler, fixed that letter on its tumbler, and then continued on to the second most common letter, on any tumbler, and so on? How well could we do?

So I coded it up, and here’s the result, a lock that can make 1,410 words from Norvig’s list at http://norvig.com/ngrams/words4.txt, 170 more words than his best:

Lock: ABCDGLMPRST AEHILNORUY ACEILMNORST ADEKLNORSTY

I believe Norvig’s strategy of improving a lock with random permutations also would be less likely to improve on this lock. Changing any letter would, by definition, be choosing a letter that occurs in that position in fewer words. However, it’s still possible to improve; there might be some better letter choices that, while poorer overall, are still better for the specific other letters already chosen for the other tumblers.

Update 15 Jun 2015: Someone was wrong on the internet and this time it was me! Astute readers will notice that a tiny off-by-one bug in my implementation (see the fifth revision) led it to generate a lock with three tumblers with eleven letters each, and one tumbler with ten letters.

The new best lock from this implementation only generates 1,161 words, leaving Norvig’s solution the best still:

Lock: ABCDLMPRST AEHILNORUY AEILMNORST ADEKLNOSTY

There’s no such things as bugs or features

Unless you’ve only ever worked with technical people, you’ve run into the old “is it a bug or is it a feature” argument. Generally, a business person reports something as a bug because it’s not working properly, but the reaction from technical people is that that particular feature just isn’t built yet or that specific detail was not in the original specification. This can be a source of great friction because it usually involves technical people saying the product will take longer than expected to finish. Less often, business will report a problem as a new feature, but the problem is already-built code that is just not functioning properly. This is less contentious because it usually means it’s less work to fix than business originally thought.

Is there a way to eliminate this debate?

Continue reading

A tour of the differences between JavaScript and Python

Introduction

JavaScript and Python are two very important languages today. Too many programmers, however, work in both languages, but know just one of them well. This means they end up writing code in one language in the same style as the other, unaware of some of the more subtle differences between the two. If you know JavaScript or Python well, and you want to improve your skills and knowledge of the other, I wrote this article for you.

Disclaimer: I know Python (slightly) better than I know JavaScript, and I’ve not done any JavaScript outside of the browser, so I tried to keep the bits about JavaScript agnostic to the host environment, but, fair warning, there may be subtle differences in server-side JavaScript that are not mentioned here because I don’t know them.

Continue reading

Why you shouldn’t use git merge –rebase

There is a common belief that git merge --rebase is somehow preferable to normal merging. The general assertion seems to be that a linear history is somehow “cleaner”, “easier to understand“, and that normal merging introduces “extra commits” and “merge bubbles“, the latter presumably being only slightly less objectionable than economic bubbles. Some organizations even go so far as to mandate always merging with --rebase. But ask someone to give a real, technical justification—just one—for this belief, and they mumble some aesthetic vapidities and then start talking about the weather.

Let’s put aside for a moment the ridiculous assertion that a directed acyclic graph is somehow more difficult for programmers—programmers!—to understand than a linear history. I want to show you how normal merging is in fact preferable to using --rebase all the time.

Continue reading

Introducing Fragments

Today I am announcing Fragments, a project I’ve been working on for a few months. It’s on GitHub and PyPI.

Fragments uses concepts from version control to replace many uses of templating languages. Instead of a templating language, it provides diff-based templating; instead of revision control, it provides “fragmentation control”.

Fragments enables a programmer to violate the DRY (Don’t Repeat Yourself) principle; it is a Multiple Source of Truth engine.

What is diff-based templating?

Generating HTML with templating languages is difficult because templating languages often have two semi-incompatible purposes. The first purpose is managing common HTML elements & structure: headers, sidebars, and footers; across multiple templates. This is sometimes called page “inheritance”. The second purpose is to perform idiosyncratic display logic on data coming from another source. When these two purposes can be separated, templates can be much simpler.

Fragments manages this first purpose, common HTML elements and structure, with diff and merge algorithms. The actual display logic is left to your application, or to a templating language whose templates are themselves managed by Fragments.

What is fragmentation control?

The machinery to manage common and different code fragments across multiple versions of a single file already exists in modern version control systems. Fragments adapts these tools to manage common and different versions of several different files.

Each file is in effect its own “branch”, and whenever you modify a file (“branch”) you can apply (“merge”) that change into whichever other files (“branches”) you choose. In this sense Fragments is a different kind of “source control”–rather than controlling versions/revisions over time, it controls fragments across many files that all exist simultaneously. Hence the term “fragmentation control”.

As I am a linguist, I have to point out that the distinction between Synchronic and Diachronic Linguistics gave me this idea in the first place.

How does it work?

The merge algorithm is a version of Precise Codeville Merge modified to support cherry-picking. Precise Codeville Merge was chosen because it supports accidental clean merges and convergence. That is, if two files are independently modified in the same way, they merge together cleanly. This makes adding new files easy; use Fragment’s fork command to create a new file based on other files (or just cp one of your files), change it as desired, and commit it. Subsequent changes to any un-modified, common sections, in that file or in its siblings, will be applicable across the rest of the repository.

Like version control, you run Fragments on the command line each time you make a change to your HTML, not before each page render.

What is it good for?

Fragments was designed with the task of simplifying large collections of HTML or HTML templates. It could replace simpler CMS-managed websites with pure static HTML. It could also handle several different translations of an HTML website, ensuring that the same HTML structure was wrapped around each translation of the content.

But Fragments is also not HTML specific. If it’s got newlines, Fragments can manage it. That means XML, CSS, JSON, YAML, or source code from any programming language where newlines are common (sorry, Perl). cFragments is even smart enough to know not to merge totally different files together. You could use it to manage a large set of configuration files for different servers and deployment configurations, for example. Or you could use it to manage bug fixes to that mess of duplicated source files on that legacy project you wish you didn’t have to maintain.

In short, Fragments can be used anyplace where you have thought to yourself “this group of files really is violating DRY”.

Use it

Fragments is released under the BSD License. You can read more about it and get the code on GitHub and PyPI. And you can find me on Twitter as @glyphobet.

Special thanks to Ross Cohen (@carnieross) for his thoughts on the idea, and for preparing Precise Codeville Merge for use in Fragments.

Hackers, it is time to rethink, redesign, or replace GNU Gettext

GNU Gettext may be the de facto solution for internationalizing software, but every time I work with it, I find myself asking the same questions:

  • Why, in this age of virtual machines and dynamic, interpreted languages, do I still have to compile .po files to .mo files before I can use my translations?
  • I can reconfigure my web application, modify its database, and clear its caches whenever I want, so why do I have to do a code push and restart the entire runtime just so that “Login” can be correctly translated to “Anmelden”? Try explaining that to a business guy.
  • To translate new messages in my application, I have to run a series of arcane commands before the changed text is available to be translated. Specifically, the process involves generating .pot files, then updating .po files from them. Why isn’t this automatic?
  • Why is it still possible for bad translations to cause a crash? Translators do the weirdest things when presented with formatting directives in their translations… I’ve seen %s translated as $s and as %S%(domain)s translated as %(domaine)s, and ${0} translated as #0, but the most common is to just remove the weird formatting directives entirely. And they all cause string formatting code to crash.
  • Why isn’t there a better option for translating HTML? Translators shouldn’t be expected to understand that in Click <a href="..." class="link">Here!</a>, “Click” and “Here!” should be translated, but “class” and “link” should not be. And they certainly can’t be expected to understand that if their language swaps the order of “Click” and “Here”, the <a> tag should move along with “Here”.
  • Why isn’t there something better than the convention to assign the gettext function to the identifier _, and then wrap all your strings in _()? Not only is this phenomenally ugly, but one misplaced parenthesis breaks it: _("That username %s is already taken" % username)
  • Why is support for languages that have more than two plural forms still an awful, confusing, fragile hack? Plural support was clearly designed by someone who thought that all languages were like English in having merely singular and plural. I’ve seen too many .po files for singular/dual/plural languages, where the translator obviously did not understand that msgstr[0] is the singular, msgstr[1] the dual, and msgstr[2] the plural.
  • Why, in this age of distributed version control, experimental merge algorithms, and eventually consistent noSQL databases, is the task of merging several half-translated .po files from several different sources still a nightmarish manual process?
  • Why, if I decide I need an Oxford comma or a capital letter in my English message, do I risk breaking all of the translations for that message?

There are libraries that allow you to use .po files directly, and I’m sure you can hack up some dynamic translation reloading code. Eliminating the ugliness of _() in code, and avoiding incorrectly placed parentheses after it, could be done with a library that inspects the parse tree and monkeypatches the native string class. Checking for consistency of formatting directives is not that hard. A better HTML translation technique would take some work, but it’s not impossible. The confusion around plural forms is just a user-interface issue. Merging translated messages may not be fully automatable, but at least it could be made a lot more user-friendly. And the last point can be avoided by using message keys, but that hack shouldn’t be necessary.

Gettext is behind the times. Or is it? Half of me expects someone to tell me that all these projects I’ve worked on are just ignoring features of Gettext that would solve these problems. And the other half of me expects someone to tell me I should be using some next-generation Gettext replacement that doesn’t have enough Google juice for me to find. (Let me know on Twitter: @glyphobet.)

GNU Gettext is is based on Sun’s Gettext, whose API was designed in the early ’90s. Hackers, it’s 2012. Technology has moved forward and left Gettext behind. It is time to rethink, redesign, or replace it.

 

Python else in loops: survey results

As promised, here are the results of two totally unscientific surveys, one conducted at PyCon 2011 and the other over Twitter just a few days ago, about the behavior of else in Python loops. The results show that only 25% of respondents know what else in a Python loop actually does, and 55% think they know but are wrong.

Push to Github using Mercurial

It’s simple to push Mercurial source code repositories to GitHub (or another git server). This is useful if you can’t, or don’t want to, switch off of Mercurial to git. You can take advantage of GitHub’s superior community and features (sorry, BitBucket) without having to puzzle over ____ (anti-git screed left as an exercise to the reader).

To do this, you’ll need the hg-git plugin for Mercurial, by Scott Chacon and others. The technique is outlined on hg-git’s read-me, but it still took me some effort to get working right, so I thought this blog post might be helpful.

I assume you already have Mercurial installed, a project in a Mercurial repository you want to push, and a GitHub account. Here’s how you do it.

Install hggit

If you’re using MacPorts and Python 2.6, you should be able to do this with:

sudo port install py26-hggit

At the end of hggit’s install process, hggit prints out a line to add to your ~/.hgrc. For MacPorts & Python 2.6, this will look like:

hggit=/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/hggit

Note: Installing hggit in a virtualenv is probably not a good idea unless you also are using a version of Mercurial installed in the same virtualenv. You could probably make it work, but it’s likely more work than it’s worth.

Set up and connect to a GitHub repository

Create an empty repository on GitHub to store your Mercurial project. Copy the git URL from the new repository. Append git+ssh:// to the beginning, and change the : to a /. For example, this:

git@github.com:mercurialpoweruser/mercurialrocks.git

becomes this:

git+ssh://git@github.com/mercurialpoweruser/mercurialrocks.git

This URL then goes in the [paths] section of your Mercurial repository’s .hg/hgrc. If you want to push and pull from GitHub by default, it will look like this:

[paths]
default = git+ssh://git@github.com/mercurialpoweruser/mercurialrocks.git

Or, if you already have a remote Mercurial repository that you want to keep using, you can add the GitHub repository with a different name:

[paths]
default = ssh://hg@bitbucket.org/mercurialpoweruser/mercurialrocks
github = git+ssh://git@github.com/mercurialpoweruser/mercurialrocks.git

Next, push your repository to GitHub for the first time, with hg push (if GitHub is your default) or hg push github (if GitHub isn’t the default).

Now, go to GitHub and reload the page, and revel in the feeling that you are living in a utopian future where cars fly, cold fusion supplies our electricity, and different DVCSs interoperate with each other.

Gotchas

If you find that Mercurial mysteriously stops pushing new changes to GitHub, even though there are new changes to push and it’s pushing them to Mercurial servers fine, you might want to take a look your bookmarks. Do this with hg bookmarks. Make sure that the changeset ID for master is the changeset ID of tip: hg tip. If they’re not the same, update the master bookmark with hg bookmark -f master. This happened to me when I used an older version of Mercurial (1.3.1) without the bookmarks extension enabled. Don’t do that.

Your mileage may vary

I haven’t used this technique on repositories that have lots of branches, file renames, merge conflicts, or other “fancy” distributed version control features. I wouldn’t advise switching your entire development team and mission-critical repositories over to this technique without some more extensive testing.