Tag Archives: Python

Eight Python warts

I love Python, but a few things still bug me about it. I’ve bashed on several other technologies; here’s some Python bashing. In no particular order:

Update: This has started a pretty good discussion on Reddit. Many people correctly guessed that I’m using singleton in the mathematical sense, not in the sense of the programming pattern. The comments from Cairnarvon and tghw are particularly worth reading.

HttpOnly cookies in Python & Pylons

Thanks to Jeff Atwood for posting about the benefits of the HttpOnly flag on cookies. Support for HttpOnly cookies has now been added to Python 2.6’s Cookie module, and Paste’s WSGIResponse. Pylons applications can now use the HttpOnly flag to protect cookies, significantly raising the bar against XSS attacks on users of those applications.

Latest versions of Firefox, Opera, and Internet Explorer already support HttpOnly. Now all that’s left is for Apple to fix CFNetwork to support HttpOnly and then WebKit/Safari will be able to support it too.

Ruby’s not ready: comments, corrections, and clarifications

Some good discussion on this one. It’s nice to see Ruby people saying things like this (5th message from the top, from Song Ma):

Interesting. But what I am thinking about is not the attitude of the author, but the points he was trying to make. The deep review and discussion will benefit the language insights.

Or this one (from Trans, on the same forum):

Why is everyone getting so worked up? It’s a critique. Biased it may be, but that in itself does not make it worthless. In fact, it can be very constructive b/c it uncovers “attack points” with the language. With each point we can ask ourselves objectively is this a misconception or a fair point? In either case we have an opportunity, to address misconceptions in our Ruby evangelizing blogs and to work to improve Ruby where a point has merit.

Bias can work both ways. But I think the Ruby community can rise above it, and Ruby will be all the better for it.

And from Peter Cooper at Ruby Inside:

As it is, I think he’s missing the point a lot of the time (he tends to think Python’s better because he likes its conventions more than Ruby’s – not a compelling argument), but it’s an interesting read none the less. Anything that keeps our minds open to the fact that Ruby != perfection is worth a look.

And a comment on the same post:

Let’s take his best points and incorporate them into future versions of Ruby.

Sounds like a plan.

I saw a few counterarguments like this:

Everything he’s saying is well known.

Just because a problem is well known inside a community doesn’t make it any less of a problem.

Everybody who mentioned documentation, even those who disagreed strongly with the rest of my post, agreed that Ruby’s documentation is seriously lacking. In fact, a lot of the mistakes in my original post are due to me not being able to easily find an explanation of something on the various Ruby doc sites. Which leads me to…


Ruby’s not ready

Introduction

A few weeks ago, I learned Ruby and Ruby on Rails to compare them head-to-head against Python and Pylons, in preparation for a new project. When I began, I knew nothing about Ruby or Ruby on Rails. I have tried to be as objective as possible: before beginning this project, I wrote in email on March 5th:

I promise we’ll be as objective as humanly possible; if Ruby and Ruby on Rails truly is better, we’ll happily use RoR and never look back. I want to know that I’m using the absolute best tool for the job.

Since then, I have reimplemented one complex nine-hundred-line Python library, PottyMouth, in Ruby. Another team member has also reimplemented parts of the Pylons web application Spydentify in Ruby on Rails.

The best tool for the job is Python & Pylons. While Rails and Pylons are similar, shortcomings in Ruby compared to Python make Python & Pylons the clear choice. I make three basic arguments against using Ruby:

  • The language and its implementation are incomplete and immature. Immature implementations breed performance issues. A project loses time when it must implement missing or incomplete functionality.
  • The language is inconsistent and needlessly complex. Inconsistency and complexity confuse people, and confusion breeds bugs.
  • The documentation is incomplete. Incomplete documentation breeds bugs as you might misuse a feature. And a project slows down while you read the language or library source code, or ask the community for help with undocumented features.

I believe Ruby would fare poorly against other languages, not just Python, on these angles as well.

Why have Ruby and Ruby on Rails gained so much traction, despite these issues? Aside from the Rails hype, it’s because they are not insurmountable issues. It is possible to build a large application in Ruby; many people have. But any programmer building a large application in Ruby will have to deal with the issues listed here at some point. These are all issues that do not appear right away. A project doesn’t face them until a website reaches maturity, develops lots of features, fields traffic from lots of users, or until a project hires programmers who aren’t Ruby experts or experienced enough to anticipate these issues.

My point is simply that Python (and other languages) allow you to handle most of these issues more elegantly, or avoid them completely.

Subscribe here if you’d like to be notified of any follow-up posts (for an article this long, I’m sure there will be a few), or if you’d like to read my critiques, positive and negative, of other things technological.

Contents

  1. Unicode and encodings
  2. Regular Expressions
  3. Documentation
  4. Migration to Ruby 1.9/2.0
  5. Performance
  6. Scoping
    1. One nice thing about Ruby’s scoping
  7. There’s more than one way to do it
    1. String conversion
    2. print, p and puts
    3. Ranges and slices
    4. require and load
    5. Raising exceptions, throwing symbols
    6. do and then are extraneous
    7. length and size, update and merge
  8. Object model
  9. Faking keyword arguments
  10. Libraries
    1. SAP support
    2. DateTime support
  11. Debugging
  12. Rails & Pylons
  13. Cool things about Ruby
  14. Conclusion
  15. Further reading

Unicode and encodings

If you’re already familiar with Ruby’s problems with Unicode, feel free to skip this section. Ruby did not have any support for Unicode character strings when it was originally released in 1996. This is only slightly silly for a language that was invented after Unicode 1.0 was released in 1992. It is inexcusably shortsighted that Ruby has not added Unicode objects over the last twelve years.

A third-party Ruby library for conversion ties into the Unix iconv program, allowing conversion between two different encodings. However, converted strings are still sequences of bytes. This means that using most of the string methods (slice, reverse, size, index, downcase, upcase, strip) and indexing into the string with [] notation do not work in non-ASCII encoded strings. You can get the desired results out of these methods by first accessing the .chars attribute of non-ASCII strings. This is less desirable because the programmer must remember to use .chars whenever he or she is working with non-ASCII strings.

A better solution would be to support first-class Unicode objects, as strings of Unicode characters, natively in the language.

There is a third-party Unicode support library that replaces Ruby’s String class and adds Unicode support, but it is acknowledged to be hackish, potentially dangerous, and makes Ruby somewhat slower.

Unicode support may or may not be forthcoming in Ruby 2.0. There are certainly members of the community advocating it.

This means, among other things, that there is no built-in support in Rails’ HTML generation[1] for converting Unicode characters to HTML entities. This page details how to hack around this problem; but this is something that should be automatic and built-in, not hacked around.

Python’s built-in Unicode and encodings support, which is a first-class, native Unicode object and a full suite of built-in encodings, was introduced in Python 1.6 in 2000. It has evolved into an extremely reliable, secure and versatile Unicode implementation. It is also extremely simple to use.

Python supports all of the encodings that Ruby supports via iconv, and a number that it doesn’t, including Quoted-Printable, the encoding used for the vast majority of email messages, and MBCS, the encoding used by Windows FAT32 and NTFS file-systems.

Because Python’s built-in Unicode support is so robust, the vast majority of Python libraries all convert to Unicode when accepting input, and convert to the proper encoding when producing output. Multi-language support and correct encoding handling is usually a non-issue when building a Python application. For example, non-ASCII input to, and output from, a Pylons web application Just Works™.
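To make the bytes-versus-characters distinction concrete, here is a minimal sketch. It assumes Python 3, where str is the native Unicode type (the role the unicode object plays in the Python 2 releases discussed here); the sample string and variable names are purely illustrative.

```python
# Python 3 sketch: str is a sequence of Unicode characters, bytes is raw data.
text = "na\u00efve caf\u00e9"          # "naïve café", 10 characters

utf8_bytes = text.encode("utf-8")      # the two accented letters take 2 bytes each
latin1_bytes = text.encode("latin-1")  # one byte per character in this encoding

char_count = len(text)                 # counts characters, not bytes
byte_count = len(utf8_bytes)           # counts bytes
reversed_text = text[::-1]             # slicing operates on characters
round_trip = utf8_bytes.decode("utf-8")

# Non-ASCII characters become numeric HTML character references.
html_safe = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

String methods such as slicing, reversing and len() operate on characters here, so there is nothing extra to remember when the text is not ASCII.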

Regular Expressions

For a language that borrows so heavily from Perl, the regular expression support in Ruby is pretty disappointing. Regular expressions might not seem like a very important part of a language, but it’s an interesting litmus test because Python, Perl, and JavaScript all support essentially the same regular expression syntax. Ruby’s regular expressions, however, were so broken that I switched to Ruby 1.9 to finish porting PottyMouth.

Ruby’s Regexp::MULTILINE flag doesn’t behave the way multiline does in other languages. In other languages, the multiline flag is off by default, and when enabled, it makes ^ and $ match right after, and right before, every newline. Whether . matches newlines is controlled by a separate flag:

In Perl:

if ( "foo\nbar\nbaz" =~ /^bar/m ) { print "yes\n"; } else { print "no\n"; }
yes
if ( "foo\nbar\nbaz" =~ /^bar/ ) { print "yes\n"; } else { print "no\n"; }
no

In Python:

>>> import re
# This matches
>>> re.search('^baz', "foo\nbar\nbaz", re.MULTILINE)
<_sre.SRE_Match object at 0xb7c4bf38>

# This does not match
>>> re.search('^baz', "foo\nbar\nbaz")

However, in Ruby, the Regexp::MULTILINE flag appears to only affect the interpretation of ., not ^ and $, making it more like Python’s re.DOTALL or Perl’s /s switch.

in Ruby:

irb(main):001:0> /^baz/.match("foo\nbar\nbaz")
=> #<MatchData:0xb7cd42b0>
irb(main):002:0> /^baz/m.match("foo\nbar\nbaz")
=> #<MatchData:0xb7cdb740>

irb(main):003:0> Regexp.new('^baz').match("foo\nbar\nbaz")
=> #<MatchData:0xb7cc7df8>
irb(main):004:0> Regexp.new('^baz', Regexp::MULTILINE).match("foo\nbar\nbaz")
=> #<MatchData:0xb7cb59dc>

There is no documentation whatsoever of the actual semantics of Regexp::MULTILINE, so it’s not clear whether this is an accident, a bug, or an intentional departure from the standard. Either way, it makes the language more difficult to learn and less predictable to use.
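For comparison, Python keeps these two behaviors under separate flags. The sketch below uses only the standard re module; the sample text and variable names are illustrative.

```python
import re

text = "foo\nbar\nbaz"

# Without flags, ^ anchors only at the very start of the string.
anchored_default = re.search(r"^baz", text)

# re.MULTILINE makes ^ and $ match at every line boundary.
anchored_multiline = re.search(r"^baz", text, re.MULTILINE)

# Without flags, . does not match a newline.
dot_default = re.search(r"foo.bar", text)

# re.DOTALL makes . match newlines too; it does not change ^ or $.
dot_dotall = re.search(r"foo.bar", text, re.DOTALL)
still_anchored = re.search(r"^baz", text, re.DOTALL)
```

The last search shows that DOTALL alone leaves the anchors untouched, which is the behavior Ruby’s MULTILINE flag appears to have for everything except the default meaning of ^.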

There’s also no documentation whatsoever of the actual semantics of Regexp::EXTENDED. The eregex.rb file in the Ruby source just adds support for & and | logical operators, and the only documentation is the message “This is just a proof of concept toy.” As best I can tell, regular expressions in Ruby always behave like extended regular expressions, supporting ?, +, | and N, regardless of whether you use the extended flag or not. What does the extended flag actually do? I don’t know.

Ruby’s regular expressions also match only ASCII and a small set of encodings, including UTF-8 and the Japanese encodings EUC and SJIS. Want to write a regular expression that matches UTF-16, Latin-1, or raw Unicode? You’ll have to use the third-party Oniguruma package or a different programming language. You can’t use pure Ruby.

Lastly, positive and negative look-behind aren’t supported in Ruby 1.8. I only noticed this because the code I was porting used negative look-behind expressions. Ruby 1.9 adds look-behind. The options for Ruby 1.8 users are to install 1.9 or the third-party Oniguruma package, which also supports many more encodings (but still not raw Unicode).

Both Python and Perl support positive and negative look-behind and Unicode regular expressions natively.
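As a small illustration of what that native support looks like, here is a sketch using Python 3’s re module. The patterns and sample strings are made up for the example and are not taken from PottyMouth.

```python
import re

# Negative look-behind: "bar" only when it is NOT preceded by "foo".
negative = re.search(r"(?<!foo)bar", "foobar")

# Positive look-behind: "bar" only when it IS preceded by "foo".
positive = re.search(r"(?<=foo)bar", "foobar")

# On a Python 3 str, \w is Unicode-aware, so accented letters count as word characters.
word = re.search(r"\w+", "caf\u00e9 au lait")
```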

In general, it’s a bad sign when a third-party reimplements a large chunk of functionality in an existing piece of software. It means that the existing functionality was just plain not good enough. And, for open source projects, it means that the existing project was unable, or unwilling, to solve the problem, or let others contribute patches to solve the problem, within the project. The fact that this happened for both Ruby’s encoding and regular expression support is disturbing.

Documentation

The Standard Library Documentation for Ruby is woefully incomplete. For example:

  • There is no documentation whatsoever for:
    • the digest library, which contains the SHA1 and MD5 check-sum tools. These tools are critical for generating secure cookies and storing user passwords securely. Without documentation, you have to go read the Ruby source code to know that your application is secure.
    • Racc, a LALR(1) parser generator for Ruby
  • The documentation for gdbm only includes a list of constants it defines. No methods, descriptions, or anything else.
  • The documentation for the syslog module is useless. It lists one method, close. A useful syslog library would have to have at least open and write functionality.
  • The link to the tcltklib module documentation returns an error page.
  • The Profiler documentation is extremely limited. It looks like it would be possible to use the profiler, but there’s no information about how it works, which is critical when you are profiling an application.
  • As noted above, the regular expression documentation doesn’t cover MULTILINE or EXTENDED.

In general, the majority of modules listed have no description page. None of the pages specifically state which version of Ruby they were written for.

Ruby’s development and documentation writing appear to be two disconnected endeavors, and the documentation is acknowledged to be incomplete. In fact, there is no single rallying point for Ruby material. The official Ruby documentation page lists a variety of documentation, tutorials, examples, etc. spread across many different websites with varying levels of completeness and relevancy to the current version of Ruby. There is no single tutorial or language overview which is complete for the current version of Ruby 1.8.x (over four years old).

This doesn’t inspire confidence. Are there any libraries that aren’t listed at all? And how many of the existing libraries have documentation that is incomplete, out-of-date, or incorrect?

By contrast, Python’s standard library documentation is complete, versioned and dated.

Migration to Ruby 1.9/2.0

It’s not clear what’s happening with regard to Ruby 1.9 and/or 2.0. Ruby 1.9 has been under development since (at least) 2006. (It may have been under development longer than that; the lack of any official documentation about it makes it hard to know for sure. This podcast claims it’s been around longer than Perl 6, which would make Ruby 2.0 almost as old as Ruby itself.) An experimental/development version, 1.9.0, was released in December 2007. I tested against the version of Ruby 1.9 in Ubuntu 7.10: 1.9.0+20070830-2ubuntu1.

Quite a few things that are allegedly new in Ruby 1.9 actually exist in Ruby 1.8, and it’s not clear whether they’ve been back-ported or whether their behavior has only subtly changed.

For example, return value unpacking, % string formatting, and newlines inside the ternary operator are supposedly new in 1.9, but work exactly the same in 1.8. Other things that are supposed to be introduced in Ruby 1.9, like multiple splats, don’t (yet) work at all in 1.9.

Other improvements in Ruby 1.9 include literal hash syntax, block-local variables, and, as already noted, better encoding and regular expression support.

Some documents indicate that Ruby 2.0 is going to be different in ways that will break existing Ruby 1.x programs severely. They also contain disturbing statements like this one about the new garbage collector: “It will be (mostly) thread safe.” Being (mostly) thread safe is like being mostly pregnant. You either are, or you aren’t.

There are two explanations for this lack of clear plan for Ruby 2.0. Either Ruby 2.0 is so far off that no such document would be useful yet, or nobody in the Ruby community has thought about these issues yet. Both of these would be bad signs. Either way, it’s a total mystery how difficult it will be to move to Ruby 2.0, or when that move might have to happen.

Python, on the other hand, has been in the 2.x series for a long time. Planning for Python 2.0 began while Python 1.5 was the current version. In September 2000, as Python 1.6 was released, there was a complete outline available of what to expect from Python 2.0. Python 2.0 was released in October 2000. Programs written eight years ago for Python 2.0 will still run unmodified under Python 2.5. Many Python 1.x programs will also run under 2.5.

Python 2.6 and Python 3.0 are slated for release this summer. The Python 3.0 process has been going on for about a year. There is a clear outline of exactly what’s changing between 2 and 3, and guidelines for how to write Python code that will run equally well under 2.6 and 3.0. The Python developers are also providing a conversion program that will automatically translate between 2.6 and 3.0 code, and warn programmers about code it was not able to translate.

Python proves that a programming language can evolve safely, easily, and largely free of hassles. Future versions should not be a potential wild-card (or worse, a complete clusterfuck, as with PHP).

For a piece of software that’s going to be the core of your business for as long as you are in business — hopefully many, many years — why choose anything other than a language with a migration process like Python’s?

Performance

It’s difficult to precisely evaluate the difference in execution time between different languages. However, The Computer Language Benchmark Game gives a pretty strong indication that Ruby is slow. On its tests, Python is 3×-4× as fast as Ruby. Ruby is slower than TCL, a language that is twenty years old. Ruby is about the same speed as JavaScript (in Mozilla’s SpiderMonkey interpreter). The only thing slower than Ruby is Prolog.

The notorious Rails is a Ghetto article outlines performance problems in Ruby and Rails that, disturbingly, went unaddressed for long periods of time. In the worst one, the author reported serious performance issues to the Rails community, which largely ignored the problem or denied its existence. Meanwhile, the problem had been identified and patched by someone else, but the Ruby core developers ignored the patch for a year.

In another incident in the same article, the original Rails author admits that the original Rails code required about four hundred restarts a day, or six to seven restarts per thread per day. Four hundred restarts a day means four-hundred chances for a database transaction to fail, four hundred chances for a verification email to be sent by the system without the corresponding data being stored in the database, four hundred chances for the user’s browser to not receive all the data it needs to correctly render a page or display data.

Even for a project for which performance is not the primary concern, these trends should be cause for concern. Serious performance issues mean buying more RAM, and upgrading servers sooner.

Scoping

Ruby’s scoping rules are complex:

  1. Files, modules, classes, defs and blocks create new scopes.
  2. Local variables have no sigil and begin with a lowercase letter. They are available only in the scope they are defined in.
  3. “Constants” have no sigil and begin with an uppercase letter. They are available in the scope they are defined in and in all enclosed scopes.
    1. “Constants” are not constant; they can be reassigned whenever you like, just like everything else.
  4. Globals begin with the $ sigil and are global.
  5. Instance attributes begin with the @ sigil and are, by default, protected, or available only inside the class.
    1. Instance attributes can be made available outside the class with attr_accessor or attr_reader.
  6. Class attributes begin with the @@ sigil and are available only inside the class.
    1. Unlike instance attributes, class attributes cannot be accessed outside the class with attr_accessor or friends.
  7. Methods are, by default, public.
  8. Methods can be made private or protected with the private or protected keywords.
    1. protected doesn’t mean what you think it means. Both private and protected methods are available within the class and within all containing subclasses.

The terminology used to refer to non-constant “constants” is extremely unfortunate.

Why isn’t the full range of public/protected/private scopes available to attributes as well as methods? Why is a totally different convention used to scope attributes? Why can’t class attributes be accessed outside the class like instance attributes?

What is the point of the subtle, weird difference between protected and private? What problem does it solve? Why don’t protected and private work the way they do in Java and PHP?

Clear, consistent, simple scoping makes it easy to keep track of what variables are available where. Complex scoping rules mean there’s more to remember, there are more mistakes to make and more ways to get confused. Mistakes and confusion cause bugs.

Python uses just a naming convention to convey whether a variable should be thought of as private or protected. Both Python and Ruby can be monkeypatched to modify private and protected attributes or methods, so it’s best to think of private and protected as purely advisory in either language. Experienced Pythonistas learn that someobj.__private__ is a red flag; the fact that you must always monkeypatch to do this in Ruby might provide an additional disincentive to doing it, but it also makes it easier to do it by accident.
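A minimal sketch of that naming convention follows. The Account class and its attributes are hypothetical, invented only to show the two underscore styles; nothing here is enforced by the interpreter beyond name mangling.

```python
class Account:
    """Hypothetical class used only to illustrate the underscore conventions."""

    def __init__(self):
        self.owner = "alice"   # public: no underscore
        self._balance = 100    # one leading underscore: internal by convention only
        self.__pin = "1234"    # two leading underscores: name-mangled by the compiler

    def check_pin(self, guess):
        # Inside the class body, self.__pin is rewritten to self._Account__pin.
        return guess == self.__pin


acct = Account()
balance = acct._balance                  # allowed, but the underscore is a warning sign
has_plain_pin = hasattr(acct, "__pin")   # the unmangled name does not exist on the instance
mangled_pin = acct._Account__pin         # still reachable: privacy is advisory
```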

One nice thing about Ruby’s scoping

There’s one place where Ruby’s scope behavior is better than Python’s. Default argument values in function definitions are (re)evaluated each time a method is called in Ruby. In Python, default argument values get evaluated in the containing scope when the function is defined. This can get you into trouble in Python, if a default value is a mutable type. If you’re modifying the value, it’ll persist across subsequent calls to foo:

def foo(arg=[]):
    # some code

In Python, you end up having to do this:

def foo(arg=None):
    if arg is None:
        arg = []
    # some code

This is definitely less clear than in Ruby, where you can simply say what you mean:

def foo(arg=[])
  # some code
end
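To make the Python pitfall concrete, here is a short sketch contrasting the two definitions. The function names append_shared and append_fresh are invented for this illustration.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the def statement runs,
    # and is then shared by every call that omits bucket.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # The None sentinel lets each call build its own new list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)   # same list object as first, now holding both items

fresh_one = append_fresh(1)
fresh_two = append_fresh(2)  # an independent list
```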

There’s more than one way to do it

We can thank Larry Wall and Perl for There’s more than one way to do it. TMTOWTDI is bad, because to really know a language, you must know each of several ways to do similar, but different things, and each synonym. If there is only one way to do it, you only have to remember that one way, instead of many. Programmers spend more time reading code than writing it, and often, they’re reading other people’s code, so they can’t get away with remembering only their favorite way to do it. The more you have to remember, the more likely you are to forget, make a mistake, or have to stop to check the documentation. Mistakes breed bugs, and checking the documentation takes time. While not nearly as bad as Perl on this front, Ruby commits some serious TMTOWTDI.

String conversion

Some Ruby objects have an extra stringification method, .to_str, as well as the standard .to_s. .to_s is an explicit cast, used whenever you need a string representation of an object. .to_str is an implicit cast, which gets called when you are using a string-like object in a context that requires a string. (This illustrates a philosophical difference between Python on the one hand and Ruby and Perl on the other; Python never does context-sensitive implicit conversion.)

The naming of these methods is atrocious — they are radically semantically different, yet the name of one is an abbreviation of the name of another. What happens if you write code that critically relies on this distinction, go work in another language for six months, and then get called in to fix a critical production bug in that code? Would you remember which is which, and what the difference was, exactly? I wouldn’t. And the presence of both has confused people other than me. .to_str should be named something like .stringcontext.

What is the use case for to_s’s concatenation of arrays and hashes? It just runs keys, values, and items together in a string, making it impossible to tell whether it was a number, a string, a hash or an array that you just stringified:


irb(main):001:0> h = {1=>2}
=> {1=>2}
irb(main):002:0> a = [1,2]
=> [1, 2]
irb(main):003:0> h.to_s
=> "12"
irb(main):004:0> a.to_s
=> "12"
irb(main):005:0> a.to_s == h.to_s
=> true

When is this useful? It’s not human-readable, and it’s not computer-readable. It’s just mangled garbage. It’s even worse when you call to_s on more complex data structures:


irb(main):001:0> c = [1, :two, {:foo=>['bar', 'baz'], :key=>'val'}, [7,8], 9]
=> [1, :two, {:key=>"val", :foo=>["bar", "baz"]}, [7, 8], 9]
irb(main):002:0> c.to_s
=> "1twokeyvalfoobarbaz789"

As a side note, Python has two stringification methods as well, str() and repr(). str() provides a string representation, and repr() provides a string that can be eval()-ed or pasted into a Python interpreter. Ruby appears to have no equivalent of repr(), aside from p, which leads to the next topic….
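Here is a brief sketch of the str()/repr() distinction, assuming Python 3. It uses ast.literal_eval as a safer stand-in for eval() when turning a repr() string back into data; the sample values are illustrative.

```python
import ast

data = [1, "two", {"key": "val"}, [7, 8], 9]

as_str = str(data)     # for containers, str() keeps the structure visible
as_repr = repr(data)   # a string that can be parsed back into an equal object
rebuilt = ast.literal_eval(as_repr)

greeting = "hi\n"
shown = str(greeting)    # the text itself, newline included
quoted = repr(greeting)  # a quoted, escaped literal
```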

print, p and puts

What is the difference between print, p, and puts? This isn’t documented. p is a sort of poor-man’s repr(), printing each argument in a form that could be pasted into Ruby source code on a separate line. Strangely, there’s no way to capture the output of p and store that string for later. print prints each of its arguments without any space between them. puts prints each of its arguments, or each item in each collection argument, on a separate line. Why does Ruby need all three? Are you going to be able to remember which one behaves each way, and use the right one at the right time?

Not only does Python get away with one, but the behavior of print in Python is exactly what I’ve wanted out of print, or printf, in every programming language I’ve ever used — print the str()-ification of every argument, separated by spaces, with a newline at the end. If you want all the arguments concatenated, or on separate lines, you can join on empty string, or "\n". If you don’t want a trailing newline, use a trailing comma.
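A sketch of those variations follows. Note the assumption: it uses Python 3’s print() function, where the sep and end arguments take over the jobs of joining and the trailing comma described above. Output is captured in a StringIO buffer only so the result can be inspected.

```python
import io

buf = io.StringIO()

# Default behavior: str() of each argument, space-separated, newline at the end.
print(1, "two", [3, 4], file=buf)

# Concatenated with no separator.
print("a", "b", "c", sep="", file=buf)

# One item per line.
print(*["x", "y"], sep="\n", file=buf)

# No trailing newline.
print("done", end="", file=buf)

output = buf.getvalue()
```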

The behavior of puts is even weirder; it seems to stop descending into collections at some point. Note that the hash item is just stringified, but the items in the array inside the array are printed on separate lines:


irb(main):001:0> c = [1, :two, {:foo=>['bar', 'baz'], :key=>'val'}, [7,8], 9]
=> [1, :two, {:key=>"val", :foo=>["bar", "baz"]}, [7, 8], 9]
irb(main):002:0> puts c
1
two
keyvalfoobarbaz
7
8
9
=> nil

Ranges and slices

Ruby has two range operators, .. and .... Why? The only difference is that the shorter one, with two periods, returns a longer range, and the longer one returns a shorter range, not including the endpoint. Can this possibly get any more confusing?[2] The language should have one, not two. It is not hard to add or subtract one when you want a range with, or without, its endpoints.

Python’s range() and xrange() built-in, and its slice syntax, are less confusing, as they never include the endpoint. They are also more powerful, because they allow a third “step” argument. Want every other, or every third, element in a list, or a range that steps by 2 or 3? Try l[::2] or l[::3], range(start, stop, 2) or range(start, stop, 3). Want the list in reverse? [::-1] Want a range in reverse? range(stop, start, -1). Python’s syntax is simpler, easier to remember, and more powerful to boot.
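The slice and range forms mentioned above can be sketched as follows. This assumes Python 3, where range() is lazy in the way xrange() is in Python 2; the sample list is illustrative.

```python
letters = list("abcdef")

every_other = letters[::2]    # step of 2
every_third = letters[::3]    # step of 3
backwards = letters[::-1]     # reversed copy

by_threes = list(range(0, 10, 3))     # the stop value is never included
countdown = list(range(5, 0, -1))     # counts down and stops before 0
```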

Ruby’s range operators are also used to see if a value is in a particular range, like this:

irb(main):001:0> (0..2**1000) === 2**999
=> true

This has a disturbingly clever ring to it. The range operator doesn’t actually walk through every element in the 2**1000 element range and compare it to 2**999 — if it did, this code wouldn’t execute instantaneously. It’s doing something like this underneath: 2**999 >= 0 and 2**999 <= 2**1000. The only reason to use a range operator like this, when it’s about as much typing to just say what you mean directly, is when you have a range that you’re passing around as a variable.

In Python, the corresponding idiom is 0 <= 2**999 <= 2**1000, but the chained comparison syntax doesn’t work in Ruby, so you have to write 0 <= 2**999 and 2**999 <= 2**1000 in Ruby. Python’s xrange() can also be passed around like a variable, but you test for membership with value in myrange instead of ===.
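Both Python idioms can be sketched like this. The sketch assumes Python 3, whose range objects accept arbitrarily large integers and answer integer membership tests arithmetically rather than by iterating; the variable names are illustrative.

```python
value = 2 ** 999
upper = 2 ** 1000

# Chained comparison: reads like the mathematical notation.
chained = 0 <= value <= upper

# A range can be stored and passed around, then tested with "in".
# The stop value is exclusive, so add one to include upper itself.
bounds = range(0, upper + 1)
member = value in bounds
at_endpoint = upper in bounds
outside = (upper + 1) in bounds
```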

Now imagine you’re someone who hasn’t seen Ruby before, or who has been working in some other language for months, who is now tasked with fixing a critical bug which relies on this strange, non-obvious idiom. Are you going to know, or remember, that === combined with .. has special semantics? Compare that to how difficult of a time you will have understanding lower < value < upper, or value in myrange, in Python code. Simplicity and straightforward syntax have a significant long-term benefit.

require and load

Ruby has two ways to handle code in other files: require and load. The difference is that require loads the code only once per application, and load loads it each time the interpreter sees load. Yet again, there’s more to remember to be fluent in Ruby. This distinction is of dubious value; if you have code that you want to run more than once, put it in a method and call the method. Don’t make the interpreter re-load, re-parse, and re-run the file.

And there’s more. The Ruby interpreter requires/loads the file corresponding to the path specified by a string. Unlike Python, Ruby has no concept of a set of paths to search for modules by name, so you see recipes like this to establish a file’s location and find modules installed on a system.

And because Ruby loads strings as paths instead of modules by name, you can trick the interpreter into accidentally requiring a file twice. Oops! Python provides reload() if you need to reload a module, but by default, it only loads modules once per application.
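A small sketch of Python’s once-per-application loading follows, assuming Python 3, where the explicit reload hook is importlib.reload. The json module is used purely as a convenient standard-library example.

```python
import importlib
import sys

import json
import json as json_again   # a second import does not re-execute the module

same_object = json is json_again
cached = sys.modules["json"] is json   # imports are looked up by module name here

# Reloading has to be requested explicitly; it re-executes the module's code
# and returns the same module object.
reloaded = importlib.reload(json)
still_same = reloaded is json
```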

And finally, since require and load just pull the contents of another file into your local namespace, there’s no simple way to pull in just a single class or variable from a module. And there’s no way to ensure that classes in the file you’re importing don’t clobber classes in the file you’re importing it into. Want to attack lots of Ruby applications? Just write a helpful library with obfuscated code that overrides a common class, in something like HTTP or cookie authentication code, and adds a back door.
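By way of contrast, here is a sketch of how Python’s module namespaces keep imported names from clobbering local ones. The local sqrt function is deliberately named to collide with math.sqrt and is invented for this illustration.

```python
import math             # everything stays behind the "math." prefix
from math import pi     # or pull in exactly one name


def sqrt(value):
    """A local function that happens to share a name with math.sqrt."""
    return "local sqrt called with %r" % (value,)


module_result = math.sqrt(16)   # the library function, untouched
local_result = sqrt(16)         # the local function, also untouched
```

Neither definition replaces the other, because each lives in its own namespace.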

Raising exceptions, throwing symbols

Exception handling in Ruby is handled with raise/begin/rescue. Python uses raise/try/except, and Java & JavaScript use throw/try/catch to perform essentially the same exception handling. But Ruby also has throw/catch, which is unrelated to exception handling. It is normally used as a way to achieve labeled break.

Now, labeled break is a feature that I’d very much like to see in Python, but this feature in Ruby is essentially goto — and it’s even more powerful than goto in C, since it is not confined to single functions. Rather than debating the merits of goto, I’ll just ask this: does Ruby have to use terms that are commonly associated with exception handling, for a feature that is totally unrelated to exception handling?
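Since Python has no labeled break, one common workaround (a sketch of the general idiom, not a language feature) is to wrap the nested loops in a function and use return to leave all of them at once. The find_cell helper below is hypothetical.

```python
def find_cell(grid, target):
    """Return (row, column) of the first cell equal to target, or None."""
    for row_index, row in enumerate(grid):
        for col_index, value in enumerate(row):
            if value == target:
                # return exits both loops at once, like a labeled break would
                return (row_index, col_index)
    return None


grid = [[1, 2], [3, 4]]
found = find_cell(grid, 4)
missing = find_cell(grid, 9)
```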

do and then are extraneous

Ruby’s while and if statements can optionally have do and then keywords following them:

while condition do
  # some code
end

if condition then
  # some code
end

This is just one more extra variation that Ruby programmers have to remember to be able to read other people’s code.

length and size, update and merge

What is the difference between the length and size methods on String, Array, and Hash? There is none. Hashes have update and merge methods. What’s the difference? None.

These are particularly atrocious synonyms, because the English words they are based on aren’t synonymous. What if you have a class representing a geometric object, and you want length and size to return different measurements? What if you have a class representing a wiki page or source code repository, and you want update and merge to perform radically different operations? When someone else is reading your code, and they’ve been trained that these two methods are synonymous in Ruby, they might forget that the methods aren’t synonymous in this particular code.

Object model

Ruby doesn’t require self to be explicitly passed in to methods. Python has explicit self, and for good reason.

Rather than using self to get at class and instance attributes, Ruby uses @ and @@. You can get at self, to pass it to a method in another object, by calling self. And you can get at the superclass’s method of the same name by calling super. Arguably, if you need to get at a different method on the superclass, rather than that different method on self, then your object’s inheritance is broken. This is different from Python, but still fine.

You can delegate to another method on that class by simply calling that method. And here’s the problem with Ruby’s object model: because you don’t need to use @ to access methods, it’s too easy to accidentally shadow a method with a local variable.

There’s at least one case that requires self as an explicit receiver: when calling an attribute writer. Otherwise you’re just shadowing the attribute writer method locally. It’s not clear whether there are other rare cases that require self as an explicit receiver too. This seems dangerous; in Python, self is always required. In Ruby, you almost always don’t need self, except in the rare case where you do. This feels like an accident, or an overly clever solution. Clever solutions make me suspicious, and inconsistency breeds bugs. Simple solutions, like Python’s strict reliance on explicit self, make me confident I’m writing reliable code.
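To make the Python half of that comparison concrete, here is a minimal sketch with a hypothetical Account class. Because attribute access always goes through self, a bare assignment is visibly a local variable and can never be mistaken for an attribute write.

```python
class Account:
    def __init__(self):
        self.balance = 0

    def deposit_forgot_self(self, amount):
        # Without "self.", this binds a local name that vanishes on return;
        # the instance attribute is untouched.
        balance = amount
        return balance

    def deposit(self, amount):
        # Attribute reads and writes are always spelled with self.
        self.balance = self.balance + amount
        return self.balance


account = Account()
account.deposit_forgot_self(50)
print(account.balance)
account.deposit(50)
print(account.balance)
```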

Faking keyword arguments

Ruby doesn’t support keyword arguments. The common idiom to “fake” keyword arguments lacks the expressiveness and versatility of Python.

In Python, you can have a function definition like this:

def HTMLTag(tagname, parent=None, *children, **attributes):

And you can call this function in many different ways:

HTMLTag("br")
HTMLTag("div", parent=bodytag)
HTMLTag("div", bodytag, p1, p2, p3, width="100%")
HTMLTag("a", p1, href="http://google.com", *["google"])
HTMLTag("a", p1, "google", href="http://google.com",)
HTMLTag("hr", width=77, parent=div, height=4, color="#000")
HTMLTag("hr", **{'parent':1, 'width':77, 'height':4, 'class':'ruler'})

Keyword arguments with default values may not seem like a very critical feature to be missing. But it’s one of the most powerful idioms in Python, because there are a lot of cases where arguments act like configuration, modifying a function’s behavior. If you can leave these modifiers off in the common case, code is faster to write and easier to read; you don’t have to remember the common modifier values; and you’re less likely to use the wrong modifier.
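To see how those calls bind, here is a runnable sketch of the same signature, assuming Python 3. The body, which just returns the bindings as a dict, is my own illustration, and the string stand-ins for bodytag and p1 are hypothetical.

```python
def HTMLTag(tagname, parent=None, *children, **attributes):
    # Return the bindings so each call style can be inspected.
    return {
        "tagname": tagname,
        "parent": parent,
        "children": children,
        "attributes": attributes,
    }


bodytag, p1, p2 = "body", "p1", "p2"  # hypothetical stand-ins for tag objects

print(HTMLTag("br"))
print(HTMLTag("div", bodytag, p1, p2, width="100%"))
print(HTMLTag("hr", width=77, parent="div", height=4))
print(HTMLTag("hr", **{"parent": 1, "class": "ruler"}))
```

Extra positional arguments land in children, unrecognized keywords land in attributes, and parent can be given either by position or by name.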

Ruby does support the * expansion and collection of Arrays, similar to Python. And it does support default values for optional arguments:

def tallandskinny(height=100, width=1)
print height, " tall ", width, " wide"
end

tallandskinny()
# prints "100 tall 1 wide"
tallandskinny(1)
# prints "1 tall 1 wide"
tallandskinny(1, 100)
# prints "1 tall 100 wide"

But the optional arguments can’t be passed in as keywords in a different order. Ruby collects any key-value pairs in an argument list into a Hash, but that hash takes the place of a single argument position; it has nothing to do with the parameter names in the method definition. Here, the hash gets assigned to height and then stringified:

tallandskinny(:width=>100, :height=>1)
# prints "width100height1 tall 1 wide"

To duplicate Python’s keyword argument behavior, you have to write something significantly more complicated:

def tallandskinny(kwargs={})
defaults = {:width=>1, :height=>100}

kwargs = defaults.update(kwargs)
print kwargs[:height], " tall ", kwargs[:width], " wide"
end

People on the #ruby-lang IRC channel were quick to point me to snippets of code like this and say “Ruby can fake Python-style keyword arguments easily.” And they’re right, you can fake it. But users (programmers, in this case) shouldn’t have to resort to tricks to get a piece of software (a programming language, in this case) to work the way they want it to work. If users are doing this, it means the software has failed to provide the features its users need.
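For comparison, the Python counterpart of tallandskinny needs no hash and no merging of defaults. This is a sketch assuming Python 3; the function name tall_and_skinny is mine, and it returns the string instead of printing it so the result is easy to check.

```python
def tall_and_skinny(height=100, width=1):
    return f"{height} tall {width} wide"


print(tall_and_skinny())                     # defaults for both
print(tall_and_skinny(1))                    # positional height
print(tall_and_skinny(width=100, height=1))  # keywords, in any order
```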

It looks like keyword arguments are at least under consideration for a future version of Ruby.

Libraries

SAP support

SAP support is critical to the application that I’ll be working on. Ruby’s SAP support is alpha, version 0.06, and hasn’t been updated in over a year. Python’s SAP support is at version 1.0, has been around for four years, and has documentation written by a SAP developer.

DateTime support

Ruby supports Date and DateTime objects natively, but there’s no duration or timedelta support built-in. There’s only a third party Duration library written by the Rails people, no doubt to support the SQL duration type. It’s unacceptable that a duration/timedelta isn’t built in. Why not? Because if it were built-in, subtracting two dates could return a timedelta, instead of a Rational, as it does in Ruby:

require 'date'
puts Date.new(2008, 3, 29) - DateTime.new(2008, 3, 28, 22, 8)
7/90

It’s not helpful to know that the time delta between 2008-3-28 22:08 and 2008-3-29 is 7/90ths (of a day). What would be helpful is to know that it’s 1:52:00, like in Python:

>>> from datetime import datetime
>>> dur = datetime(2008, 3, 29) - datetime(2008, 3, 28, 22, 8 )
>>> print dur, type(dur)
1:52:00 <type 'datetime.timedelta'>
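The two answers describe the same interval, which a few lines of arithmetic confirm. A sketch assuming Python 3, with the Fraction standing in for the Rational that Ruby returns:

```python
from datetime import datetime, timedelta
from fractions import Fraction

duration = datetime(2008, 3, 29) - datetime(2008, 3, 28, 22, 8)

# Ruby's answer, 7/90 of a day, converted to whole seconds.
seconds = Fraction(7, 90) * 24 * 60 * 60
from_fraction = timedelta(seconds=int(seconds))

print(duration)                   # 1:52:00
print(duration == from_fraction)  # True
print(duration.total_seconds())   # 6720.0
```

The timedelta carries its own unit, so nobody has to remember that the bare fraction is measured in days.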

Debugging

Ruby tracebacks don’t print the line of code on which the error occurred. Compare these two tracebacks, each in programs that divide by zero three function calls deep:

hack.rb:10:in `/': divided by 0 (ZeroDivisionError)
        from hack.rb:10:in `baz'
        from hack.rb:6:in `bar'
        from hack.rb:2:in `foo'
        from hack.rb:13

Traceback (most recent call last):
  File "hack.py", line 10, in <module>
    foo(0)
  File "hack.py", line 2, in foo
    bar(arg)
  File "hack.py", line 5, in bar
    baz(arg)
  File "hack.py", line 8, in baz
    8/arg
ZeroDivisionError: integer division or modulo by zero

Often you can see exactly what’s going wrong just from a Python traceback, because you can see the line of code that was a problem. Debugging Ruby is slower and more difficult because it doesn’t provide this information.
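The same per-frame information is available programmatically through the standard traceback module. This is a sketch assuming Python 3; foo, bar, and baz mirror the functions in the traceback above, and failing_frames is a hypothetical helper. The source line is only available when the code is run from a file.

```python
import traceback


def foo(arg):
    bar(arg)


def bar(arg):
    baz(arg)


def baz(arg):
    return 8 / arg


def failing_frames():
    """Return (function name, line number, source line) for each frame."""
    try:
        foo(0)
    except ZeroDivisionError as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        return [(frame.name, frame.lineno, frame.line) for frame in frames]


for name, lineno, line in failing_frames():
    print(name, lineno, line)
```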

By the way, Perl’s even worse than Ruby at providing useful information when there’s an error:

Illegal division by zero at hack.pl line 10.

Rails & Pylons

The Ruby on Rails and Pylons web frameworks are more or less comparable. A good chunk of Rails’ core has been ported to the Python webhelpers package, which is used by Pylons (and other Python web frameworks). There don’t seem to be any major features in one web framework and not the other. Pylons has in-browser debugging (off by default in production code) and, since it relies on existing, and pluggable, templating, ORM, and other modules, may be slightly more mature and flexible. Rails’ DB migration is more mature than SQLAlchemy’s.

Cool things about Ruby

Ruby’s block arguments have interesting potential, especially if you were writing a heavily thread-based or event-based application. Of course, a traditional MVC web application doesn’t really need threads or events (unless you’re writing a HTTP server too). Most places where I’ve used, or seen examples of, block arguments in Ruby are places where I would have used a list comprehension in Python. In other words, block arguments are far more powerful than their common use case.
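A typical case looks like this. The Ruby line in the comment is my own illustration of the block style, and the Python below it is the list comprehension I would reach for instead:

```python
numbers = [1, 2, 3, 4, 5, 6]

# Ruby, with blocks:  numbers.select { |n| n.even? }.map { |n| n * n }
even_squares = [n * n for n in numbers if n % 2 == 0]

print(even_squares)  # [4, 16, 36]
```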

Metaprogramming with Ruby clearly takes less code than in Python. I don’t think there’s anything that Ruby does that Python cannot, or vice versa, with regard to metaprogramming. I’ve needed real metaprogramming in Python extremely rarely, and I don’t know if I’d use the metaprogramming in Ruby any more frequently. The examples of metaprogramming with Ruby that I’ve seen (The Poignant Guide’s chapter, or the way ActiveRecord works) would have been doable, in Python at least, by inheriting from a base class and using class attributes on the derived class as configuration variables. So, whatever win that Ruby gets from easier metaprogramming is minor.
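Here is roughly what I mean by a base class configured through class attributes. It is a toy sketch assuming Python 3, not a port of ActiveRecord; Record, User, table, fields, and insert_sql are all hypothetical names.

```python
class Record:
    # Subclasses configure themselves by overriding these class attributes.
    table = None
    fields = ()

    def __init__(self, **values):
        unknown = set(values) - set(self.fields)
        if unknown:
            raise TypeError(f"unknown fields: {sorted(unknown)}")
        for name in self.fields:
            setattr(self, name, values.get(name))

    def insert_sql(self):
        columns = ", ".join(self.fields)
        placeholders = ", ".join("?" for _ in self.fields)
        return f"INSERT INTO {self.table} ({columns}) VALUES ({placeholders})"


class User(Record):
    table = "users"
    fields = ("name", "email")


user = User(name="alice")
print(user.insert_sql())
```

No metaclasses or runtime method generation are involved; the derived class is three lines of plain configuration.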

Conclusion

Ruby has standard libraries that are so poor the community has provided drop-in replacements. The documentation about the current and future versions of the language is extremely lacking. The core implementation of the language is not competitive with other interpreted languages. And the language itself is full of idiosyncrasies and inconsistencies that are neither useful nor lend themselves to cleaner, simpler code. The language is not without promise or potential, but in its current state there is no reason to choose it over a mature, robust language like Python.

I’d like to thank Jeremy Avnet, Steve Hazel, Greg Hazel, and Ross Cohen for their comments and corrections on drafts of this article. Nonetheless, all inflammatory opinions and any inaccuracies are my responsibility. Subscribe here to read any follow-ups to this article.

Read the follow-up: Ruby’s not ready: comments, corrections, and clarifications

  1. Even the Rails documentation at noobkit.org, the official documentation for the official Rails IRC channel, can’t seem to get Unicode support working (scroll down to “3. Go to localhost:3000/ and get ‘Welcome aboard: You’re riding the Rails!’”).
  2. Perhaps Ruby could also use ...., which would return an even shorter range, not including the beginning or end points.

PottyMouth ported to Ruby

I’ve ported PottyMouth 1.0.2, my library for transforming completely unstructured and untrusted text to valid, nice-looking, completely safe XHTML, from Python to Ruby 1.9. If you’re a Ruby user or fan, let me know what you think. This is part of a larger project to learn and evaluate Ruby. I’ll be posting my findings soon, so subscribe if you’re curious why I used Ruby 1.9, or if you’re interested in reading my thoughts on Ruby.

If programming were like building a house…

I’m always coming up with metaphors to explain to non-technical people what I do. The point of this one was to explain to people why I prefer Python:

  • Programming in C is like building a four-story mansion out of 1×2 Lego bricks.
  • Programming in Python is like building a house out of Lego Technic parts.
  • Programming in Perl is like building a house with duct tape, a flat of cinder-blocks, some left-over lumber, pipe cleaners, crazy glue, fishing line, and chicken wire. You also get a bunch of re-bar that you can bend into whatever shape you want, and a truck full of spray-on concrete.
  • Programming in PHP is like building a house with just chicken wire, coat hangers, and aluminum foil. Luckily, if you use enough aluminum foil, it will shield your brain from the alien transmissions from outer space, so you don’t have to wear your aluminum foil hat while you’re at home.
  • Programming in Java is like buying a one-piece, hyper-modern, injection-molded plastic house unit, and then having your lawyer write a letter to the house manufacturer asking for permission to cut a one-meter by two-meter hole in the living room wall into which you’ll install the front door, since the house doesn’t come with one. Your lawyer promises in the letter not to sue the house manufacturer for problems with the door.
  • Programming in Ruby[1] is like building a house out of two different brands of cheap knock-off Legos, made of flimsy, low-quality plastic, which don’t fit together quite snugly enough. You also get a handful of puke-green pipe-cleaners left over from Perl.
  • Programming in JavaScript is like building a house out of Jell-O that has to stand up on three different lots, one flat, one downhill, and one uphill. Just when you’re finished building the house, Bill Cosby rides up on a stallion, ready to start filming a Jell-O commercial.
  • Programming in XSLT is like hiring an architect who speaks only Icelandic, an engineer who speaks only Bantu, and a bunch of Nepalese sherpas as the construction team. The architect thinks you’re building a supermarket, and the engineer thinks you’re inventing a new kind of refrigerator, and the sherpas think you’re doing performance art. The engineer builds a catapult to fling 2x4s into the air while the sherpas fire high-powered nail-guns at them, and when it’s all done you’ve got an ordinary two bedroom suburban house that’s completely upside-down.

That’s all! Happy April Fools!

  1. I’ll be posting a larger article about my recent experiences learning Ruby in a few days. Subscribe here if you want to read it when it’s posted.

Optical kerning demo

I’ve finally found the time to get the RoboFab libraries for Python working for me, and I’ve coded the core of the optical kerning algorithm that’s been rattling around in my head for a few years. My test input font consists of six characters meant to imitate A, V, H, O, t, X and a diamond shape. I was only expecting the algorithm to generate approximate kernings that would need to be tweaked by hand, but surprise, surprise; it’s almost perfect:

[Image: optical_kerning.png]

The only problem really is the tXt kerning, and that’s more an artifact of the too-regular, sans-serif shape of the glyphs.

The algorithm takes 1.7 seconds of wall time (previously 2.6) to generate kern pairs for the 49 (previously 36) combinations of these seven (previously six) glyphs, including the time to read the font off disk, convert it to UFO, and write that back to disk. That works out to faster than 0.036 (previously 0.072) seconds per glyph pair. Implementing the algorithm in C and caching the most common digraphs/kerning pairs might make it fast enough to use in a text-editing or layout program. (The “previously” figures are from before I added the diamond glyph and re-wrote some parts of the algorithm to be cleverer.)

Next step is to clean it up into a real application and run it on a full set of glyphs from a serif font, and compare the result with the font’s hand-kerning.

A good, fast optical kerning algorithm would even let you kern together different faces and different sizes. Wow, this is exciting.

Proposal for labeled break and continue in Python

I’ve created and submitted a new PEP proposing support for labels in Python’s break and continue statements. Georg Brandl has graciously added it to the PEP list as PEP 3136. Yay!
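For context, this is the kind of code a labeled break would simplify. It is a sketch of today's workaround using the for/else idiom, not the syntax proposed in the PEP; the function name first_match is hypothetical.

```python
def first_match(rows, target):
    found = None
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            if value == target:
                found = (row_index, col_index)
                break
        else:
            # The inner loop finished without a break: try the next row.
            continue
        # The inner loop hit break, so leave the outer loop as well.
        break
    return found


print(first_match([[1, 2], [3, 4]], 3))
```

The else/continue/break dance exists only to propagate the inner break outward, which is exactly what a label on the outer loop would express directly.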

For added weirdness, read the alternative specifications… I came up with a few quite bizarre ways to implement loop-specific break and continue.