Tag Archives: programming

Geospatial Queries in MongoDB

Here’s a little gist that I wrote to illustrate the difference between MongoDB’s use of degrees vs. radians in its non-spherical and spherical geospatial queries.
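The gist itself isn’t reproduced here, so what follows is only a minimal sketch of the distinction, not the original code. It assumes legacy coordinate pairs stored in a hypothetical `loc` field, with the flat `$near` query taking `$maxDistance` in degrees and the spherical `$nearSphere` query taking it in radians. The helper names and the Earth-radius constant are my own assumptions, and the code only builds query documents; it never talks to a server.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
# Length of one degree of arc along a great circle (roughly 111 km).
KM_PER_DEGREE = EARTH_RADIUS_KM * math.pi / 180.0


def km_to_degrees(km):
    """Distance for a flat (non-spherical) query, in coordinate units (degrees)."""
    return km / KM_PER_DEGREE


def km_to_radians(km):
    """Distance for a spherical query, in radians."""
    return km / EARTH_RADIUS_KM


def flat_near_query(lon, lat, max_km):
    # Hypothetical field name "loc"; $near treats coordinates as a flat plane.
    return {"loc": {"$near": [lon, lat], "$maxDistance": km_to_degrees(max_km)}}


def spherical_near_query(lon, lat, max_km):
    # $nearSphere measures along the surface of a sphere.
    return {"loc": {"$nearSphere": [lon, lat], "$maxDistance": km_to_radians(max_km)}}
```

Keep in mind that a degree of longitude covers less ground the further you get from the equator, so the flat conversion is only a rough approximation.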

Tasty new virtualenv-burrito

I’ve already tweeted about this, but for all you Python programmers who have gotten indigestion from the complex process of installing virtualenv and virtualenvwrapper, check out brainsik’s virtualenv-burrito. It wraps up both tools and installs them for you in a single step, no thinking required.

Git and Mercurial branching

Those who followed “Is Git really better than X?” will enjoy this interesting article by Armin Ronacher on the finer distinctions between Git and Mercurial’s branching. (via brainsik)

Is Git really better than X?

Most of Scott Chacon’s points in Why Git is Better than X are spot-on. The page could even be renamed “Why distributed version control systems are better than Subversion and Perforce,” since those two are the clear losers. (And yes, Bazaar is so slow I think it deserves to be listed twice in the speed section.)

I’ve used each of Git, Mercurial, and Bazaar for several months in medium-sized teams; I’ve done quite a bit of branching and merging in Mercurial and Bazaar, and a fair amount in Git. Based on that experience, I am compelled to disagree here with a few of the points, specifically with regard to Mercurial.

Cheap local branching

Chacon argues that only Git offers cheap local branching, but Mercurial allows exactly the same work-flows that he outlines, and they are just as easy. Chacon claims that Git’s branching model is different:

Git will allow you to have multiple local branches that can be entirely independent of each other…

This isn’t quite correct. The real difference between Git’s branching model and the others is that Git does not assume a one-to-one relationship between a logical branch and a directory on the file-system. In Git, you can have a single directory on the file-system and switch between different branches with the git checkout command without changing directories.

Mercurial (and Bazaar) default to¹ a one-to-one relationship between a logical branch and a directory on the file-system. You can, however, use the hg clone command to make a new, “entirely independent,” branch. You can clone a local directory, a remote directory over SSH, or a remote Mercurial repository. Cloning a repository that’s on the local file-system is the way to create cheap local branches in Mercurial. Mercurial will even create hard links to save disk space and time.

For example, in Git, you make a cheap local branch with:

git branch featurebranch
git checkout featurebranch

And in Mercurial, you make a cheap local branch with:

cd ..
hg clone master featurebranch
cd featurebranch

In Git, you switch between branches with git checkout branchname. In Mercurial, since branches are in a one-to-one correspondence with directories, you switch between branches with cd ../branchname.

In Git, you pull changes from a local branch with git merge branchname. In Mercurial, you pull changes in from a local branch with hg pull -u ../branchname.

Chacon’s four example work-flows using cheap local branches are not only completely possible in Mercurial, they are just as trivial as they are in Git. (The first three are even equally trivial in Bazaar; I’m not sure about the last one in Bazaar, though.)

Chacon says:

You can find ways to do some of this with other systems, but the work involved is much more difficult and error-prone.

Reasonable people might also disagree about the intuitiveness of the specific commands, but creating, updating, merging, and deleting branches in Mercurial each take one or two commands in the shell, just like in Git. None of these tasks is more difficult or more error-prone in Mercurial.

Chacon also says:

when you push to a remote repository, you do not have to push all of your branches. You can only share one of your branches and not all of them.

This is a little misleading. In Mercurial, since there’s usually a one-to-one correspondence between branches and local directories, and since you can’t be in two directories at once, you’re usually only pushing a single branch. In Mercurial, you generally don’t have to think about which branch you’re pushing, or whether you’re pushing other branches that shouldn’t be pushed, because you generally only ever push one branch at a time. I would actually turn this argument around, and claim that Git’s default of keeping multiple logical branches in a single directory forces you to worry about which branches you’re pushing, and actually makes cheap local branches harder and more error-prone in Git than in Mercurial.

Reasonable people might disagree about the intuitiveness of a branching model which does or does not default to a one-to-one relationship between logical branches and local directories, but that’s not why cheap local branches are so powerful in Git. Rather, it’s the ability to clone and merge between repositories locally that allows cheap local branches, and it’s just as easy in Mercurial as in Git, and nearly as easy in Bazaar.

GitHub

GitHub is unarguably the biggest community around any distributed version control system. But Chacon goes too far when he says:

This type of community is simply not available with any of the other SCMs.

BitBucket may not be as big as GitHub, but it’s a “socially-targeted” community where Mercurial users can “fork and contribute” to other Mercurial projects. It might not have as many projects or contributors as GitHub, but a sheer difference in size doesn’t translate to “simply not available.”

Bazaar’s not listed as inferior to Git on this point, I assume because of Launchpad. If Launchpad counts, why not BitBucket?

I’d also be interested to see usage numbers comparing GitHub and everyone’s favorite open-source community brontosaurus, SourceForge. Has GitHub surpassed them in projects or users or commits or downloads?

Minor haggling points

I also disagree with Chacon about the benefits of the staging area or index, but that’s purely a matter of personal preference. I never wanted anything like the staging area before I started using Git, and I don’t miss it now when I’m using Mercurial. If anything, the extra concept of a staging area makes Git slightly harder to learn; newbies coming from any other version control system have to be taught about the -a option to git commit right away, or else they wonder why their changes aren’t getting committed.

In the “Easy to Learn” section, Chacon highlights the add commands in both Mercurial and Git, indicating that they are the same; but they are pretty different. In Mercurial, add schedules previously un-tracked files to be tracked. In Git, add adds changes in a current file to the staging area, or index, including, but not limited to, previously un-tracked files. That section is also missing Mercurial’s mv command. Other potentially confusing differences include Git’s revert, which corresponds to Mercurial’s backout; and Mercurial’s revert, which can be duplicated using Git’s checkout.

Last, I’d like to see the ballyhooed Fossil included in this breakdown.

¹ Update: Thanks to David Wolever, who points out that hg branch and hg bookmark can be used to have more than one branch per directory in Mercurial. I didn’t even know about these commands when I first wrote this post. But I think my central point still holds; creating cheap local branches in Mercurial is just as easy as in Git.

A two-step refactoring tactic

In the middle of a major, hairy refactoring recently, I codified a tactic for refactoring that I’d been using, unconsciously, for years. The tactic is to break down refactoring into two modes, and deliberately switch between them:

Clean-up mode: Clean up and reorganize the code, without changing the code’s external functionality.
Change mode: Make the smallest, simplest self-contained change, making sure the code is functional again when it’s completed.

You start in clean-up mode, and switch back and forth as needed. This tactic is probably unsurprising to experienced programmers, but for those of you not yet as hard-core as Donald Knuth, let me explain why it’s a good idea.

Separating clean-up and changes into discrete modes of working gives you less to think about at any one time. A refactor can be very invasive—code that was closely tied together is now completely decoupled, or vice-versa; functions, objects and modules become obsolete or nonsensical, or split apart or merge in unexpected ways; identifiers become ambiguous or collide. If you’re trying to reorganize everything at the same time you’re juggling old and new code in your head, it’s easy (for anyone) to get lost in the maze.

When you’re consciously in clean-up mode, you can focus just on tasks like making sure variables have unambiguous, correct names, object / module / function / &c. organization and boundaries are sensible, and so on. You’re not changing any functionality, so your tests (you do have tests, right?) still pass, and the application still behaves as it always has (you did test it manually, right?). And you can be liberal in your clean-up; if you end up improving code that isn’t ultimately affected by the refactor, there’s no harm in that.

Once you feel like the code is clean and ready for the changes in functionality, commit your clean-up (you are using version control, right?). I usually use a commit message like “clean-up in preparation for….”

Now switch to change mode, and start making the changes. Quite often, you’ll discover something else that needs to be cleaned up. But you’re in change mode, not clean-up mode. Since you’re only making the smallest, simplest self-contained change, this new clean-up can probably wait until later. But if it can’t wait until later, then roll back your changes (or save a patch, or store your work in the attic, or the stash, or shelve it, or whatever the kids are calling it these days). Complete the new clean-up, commit it, and then go back to change mode and your half-completed changes.
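As a small illustration of the two modes, here is a sketch in Python. The function names and the pricing rule are hypothetical, invented purely for this example. The clean-up step renames and restructures without changing any results, so it can be checked against the original and committed on its own; only then does the change step alter behavior.

```python
# Original code, before any refactoring (hypothetical example).
def calc(l, t):
    s = 0
    for i in l:
        s = s + i[0] * i[1]
    return s + s * t


# Clean-up mode: clearer names and structure, identical behavior.
def order_total(line_items, tax_rate):
    subtotal = sum(price * quantity for price, quantity in line_items)
    return subtotal + subtotal * tax_rate


# Change mode: one small, self-contained functional change (a discount).
# With the default discount of 0.0 the result matches the cleaned-up version.
def order_total_with_discount(line_items, tax_rate, discount=0.0):
    subtotal = sum(price * quantity for price, quantity in line_items)
    discounted = subtotal * (1 - discount)
    return discounted + discounted * tax_rate


items = [(10.0, 2), (5.0, 1)]  # (price, quantity) pairs
# The clean-up commit is safe because old and new code agree.
assert calc(items, 0.25) == order_total(items, 0.25)
```

Because the clean-up is verified and committed separately, anything that breaks after the second step can only have come from the discount change.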

This tactic has some additional benefits that might not be immediately obvious, too:

Clean-up mode forces you to read over all of the code that’s going to be changed and ask, “How is this code going to be affected by the changes I’m planning?” Sometimes you discover unanticipated edge-cases or bugs in your planned changes. Sometimes you realize the whole plan is flawed, and have to go back to the drawing board. And sometimes, after a good, hearty clean-up session, you realize that you can make the required changes in a less invasive, simpler way. I find that most refactors are more clean-up than anything else.
If the codebase has a “master” or “main” branch, and you use a “feature” or “working” branch for your refactor, you can put the clean-up commits into the master branch (they don’t change any functionality, right?) and only commit the functional changes to the feature branch. What’s the point of that? Well, everyone else gets to benefit from your code clean-up right away, and when you do end up merging your functionality changes into the master branch, because it’s a smaller delta, there’s a smaller chance of conflicts.

The next time you’re staring down the barrel of a nasty refactor, and cursing the person who didn’t fully think through the business requirements, try this. It won’t make every refactor a piece of cake, nor will it become the hot new programming methodology acronym down in the valley (2SR, anyone? 2SRT?), but I guarantee you’ll be glad you tried it.

Thought-provoking JavaScript articles

Prototype.js developer Kangax has some good JavaScript articles on his blog, including one about delete in JavaScript and another about subclassing Arrays.

Another skill to omit from résumés

Ryan W is done building Facebook apps:

Clients don’t care that it was Facebook (not you) who broke the feature that was working yesterday, and they don’t care that what you said you could do two months ago can no longer be done because Facebook decided to change the platform (again).

I built a (very simple) Facebook app for a client back in March, and it left exactly the same bitter taste in my mouth.

Grasping the nuclear fourth rail of Python syntax with both hands and holding on for dear life

In Python vs. Ruby: A Battle to The Death, Gary Bernhardt wishes for Ruby-style blocks in Python.

The BDFL has already weighed in on anonymous blocks in Python:

If you want anonymous blocks, by all means use Ruby. Python has first-class callables,¹ Ruby has anonymous blocks. Pick your favorite, but don’t whine that neither language has both. It ain’t gonna happen.

This seems to imply that first-class callables and anonymous blocks are mutually exclusive language features, but that’s wrong: JavaScript has the ability to pass callables around like anything else, and it has anonymous functions, which can be used just like Ruby’s anonymous blocks. Does that mean JavaScript is better than Python or Ruby? My feelings about Ruby are indifferent with a chance of rain, but I love Python, so I’ve got to ask: are you going to take this lying down, Python?

I’m not sure Python needs full-blown, Ruby-style anonymous blocks. But it might be good enough to be able to use function definitions as r-values, like JavaScript can. (If you’re not already wearing your tinfoil hat, now might be a good time to put it on.)

This would allow asynchronous code to be written in conceptual order (just like in JavaScript):

do_something_asynchronous(param1, param2, (def callback(result):
    do_something_with_result(result)
))
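
For contrast, here is a sketch of how the same call has to be written in Python as it exists today, where def is a statement rather than an expression. The callback must be defined before the call that uses it, so the code reads out of conceptual order. The do_something_asynchronous below is a hypothetical stand-in that invokes its callback immediately instead of truly asynchronously.

```python
results = []


def do_something_asynchronous(param1, param2, callback):
    # Hypothetical stand-in: a real implementation would schedule the
    # work and invoke the callback later.
    callback(param1 + param2)


def do_something_with_result(result):
    results.append(result)


# The callback has to be defined first...
def callback(result):
    do_something_with_result(result)


# ...and only then can it be passed to the call it belongs to.
do_something_asynchronous(1, 2, callback)
```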

And it would allow encapsulation of lexical scope inside specific iterations of a loop to be used later when an asynchronous call returns (just like in JavaScript):

for item in results:
    (def single_iteration(item):
        do_something_asynchronous(param1, item, (def callback(result):
            do_something_with_result_and_item(item, result)
        ))
    )(item)
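
To show why the per-iteration scope matters, here is a sketch in current Python. A callback defined directly in a loop body sees whatever value the loop variable has when the callback finally runs, so today you need a separate factory function to capture each item. Everything here (simulate, make_callback, the pending queue standing in for an event loop, the multiplication standing in for real work) is hypothetical.

```python
def simulate(results, bind_per_iteration):
    pending = []  # queued callbacks, standing in for an event loop
    output = []

    def do_something_asynchronous(param1, item, callback):
        # Hypothetical stand-in: queue the callback to fire later.
        pending.append(lambda: callback(param1 * item))

    def make_callback(item):
        # Each call creates a fresh scope holding its own `item`.
        def callback(result):
            output.append((item, result))
        return callback

    for item in results:
        if bind_per_iteration:
            callback = make_callback(item)
        else:
            # Closes over the loop variable itself, not its current value.
            def callback(result):
                output.append((item, result))
        do_something_asynchronous(10, item, callback)

    # The "asynchronous" calls return only after the loop has finished.
    for fire in pending:
        fire()
    return output


naive = simulate([1, 2, 3], bind_per_iteration=False)
fixed = simulate([1, 2, 3], bind_per_iteration=True)
```

In the naive run every callback reports the last item; in the fixed run each callback reports the item from its own iteration.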

I’ve even had occasion to want that other Python namespace, class, to operate as an r-value:

class MyConverterClass(BaseConverterClass):
    my_converter_type = (class MyConverterType(float):
        def __str__(self): # a custom __str__ method
            return '%0.2f' % self
    )
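
For comparison, a sketch of the two closest things current Python offers: a nested class statement followed by an assignment, or the three-argument form of the built-in type(). BaseConverterClass is a hypothetical stand-in for whatever base class the real code uses, and MyOtherConverterClass exists only to show the second spelling.

```python
class BaseConverterClass:
    """Hypothetical stand-in for the base class in the example above."""


class MyConverterClass(BaseConverterClass):
    # Option 1: a nested class statement, then bind it to the attribute.
    class MyConverterType(float):
        def __str__(self):  # a custom __str__ method
            return '%0.2f' % self

    my_converter_type = MyConverterType


class MyOtherConverterClass(BaseConverterClass):
    # Option 2: build the class as an expression with type(name, bases, dict).
    my_converter_type = type(
        'MyConverterType',
        (float,),
        {'__str__': lambda self: '%0.2f' % self},
    )
```

Neither spelling is as direct as the inline class above: the first needs the name twice, and the second pushes method bodies into lambdas.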

In these examples I’ve wrapped their inline definitions in Python’s great syntactical equalizer, the parenthesis. It would be even nicer to be able to leave them off, but I’m sure that this syntax would run headlong into Python’s whitespace-based block definitions, and it would be even more of a train-wreck without parentheses.

I’ve also named the defs and classes. It would also be nice to be able to omit the function or class names if they were unneeded (just like in JavaScript). But anonymous functions, a.k.a. lambdas, are the electric third rail of Python syntax, so anonymous classes would be… I dunno, the nuclear fourth rail?

¹ Thanks to Steve for explaining that by “first-class callable,” GvR means functions that can be passed around and assigned to other variables without being executed. He also pointed out that the reason Ruby’s callables aren’t first-class is because optional parentheses in function calls leave no way to refer to a function without calling it, not because of the existence of anonymous blocks.

Rant about Ruby 1.9’s strings

Rant about strings in Ruby 1.9:

What other language requires you to understand this level of complexity just to work with strings?!

(discussion on HN).

The economics of contributing to open-source projects

This adaptation of Elinor Ostrom’s work on the emergence of self-governance to open-source projects can explain my decision to stop reporting bugs to Ubuntu. If this formula holds true, then an open-source project will thrive:

benefit of contributing > benefit of not contributing + cost of contributing

In my experience with Ubuntu, this formula does not hold true. The benefit of contributing is often zero, as patches are not accepted and bugs are not fixed, or close to zero, as it can take years for a bug to be fixed. And the benefit of not contributing is similarly zero. And of course, the cost of contributing, in terms of time spent filing bugs, is greater than zero. The cost of contributing is often very high, requiring arguing for the validity of a bug, re-reporting the same bug multiple times, or attempting to recreate a bug from several releases prior.
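
To make the arithmetic concrete, here is the inequality as a tiny Python function. The function name and the numbers are made up for illustration, in arbitrary units; only their signs and relative sizes reflect the situation described above.

```python
def contribution_pays_off(benefit_of_contributing,
                          benefit_of_not_contributing,
                          cost_of_contributing):
    """Evaluate: benefit of contributing > benefit of not contributing + cost."""
    return benefit_of_contributing > (
        benefit_of_not_contributing + cost_of_contributing
    )


# Illustrative values only: bugs rarely get fixed (benefit near zero),
# staying silent gains nothing, and filing a report costs time.
ubuntu_case = contribution_pays_off(
    benefit_of_contributing=0,
    benefit_of_not_contributing=0,
    cost_of_contributing=3,
)
```

With any positive cost and a benefit of zero, the left side can never exceed the right, which is the whole problem.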
