Tag Archives: MVC

Ten ways to build an unmaintainable web application

Old-school hackers had a long tradition of ensuring job security by building applications so unmaintainable that only the original authors could work on them. But in these days of web applications, unmaintainability has fallen by the wayside. Instead, design fads like CRUD, REST, MVC, DRY, and KISS have eliminated the average programmer’s job security.

Here are ten quick tips for achieving maximum unmaintainability in your web application. Following them will ensure that, in thirty years, a web programmer like you will be as valuable as a fifty-eight-year-old COBOL programmer contracting at $200/hr for a Fortune 500 company that still hasn’t migrated off of PL/1. You too will be able to live on a dairy farm in Pennsylvania, grow a beard down to your navel, and work in your underwear. And you’ll never have to learn anything new, work with anyone else, or start another new project.

  1. Mix it up. Put some JavaScript into external files, but be sure to intersperse JavaScript into your HTML, some of it in <script> tags. Cram multiple JavaScript statements into onclick and other event attributes — the longer, the better. Do the same with CSS; put some into external files, some in <style> tags, and also put some critical CSS into complex style attributes. And remember to put most of your <script> and <style> tags in the middle of the page content, instead of in the <head>, so that they will be difficult to find.
  2. Make everything dynamic. Generate JavaScript and CSS in your HTML templates. Think of it as another type of eval. Generate HTML server-side using templates and browser-side using JavaScript. What’s harder than working around an obscure IE layout bug with weird markup tweaks? Making sure both your server templates and your JavaScript HTML generation work around the same bug with the same HTML black magic.
  3. Abstraction, Shmabstraction. Pass lots of data from the server to the browser, store it in hidden form fields in the page, and then pass it back, unchanged, when submitting the form. That way, when the back-end data model changes, you get to rewrite part of the interface too. Allow data-model or server implementation details to creep into the interface implementation. Is the database sharded? Is the cache dirty? Does this row use a composite key? No need to have the server abstract these details, just pass that information to the JavaScript and let it sort everything out. That way, a sysadmin or a DBA can break the UI just as easily as a web designer can.
  4. Keep your data unstructured. Make sure all communication between the browser and the server is just a flat list of key/value parameters. Some of your parameters will be data to store, others will be modes or flags that affect the behavior of the service you’re hitting, and still others will be modifiers to display messages or affect the behavior of the UI. Keeping your data unstructured ensures these different types of parameters will collide. Often.
  5. Commit to a platform. Don’t waste your time checking to see if your pages work in all browsers (at least not until you’re totally done). Better yet, develop only in a single browser and don’t even bother to find out whether the features you’re relying on even exist in other browsers. Nothing is more fragile than an application that’s tightly tied to a single platform.
  6. Trust the browser. Rely solely on JavaScript input checking for some data; don’t check input on the server side. Store sensitive data in hidden form fields. Put authorization checks in the JavaScript rather than on the server. Parameters like authorized=1 just scream out for URL hacking, and storing them in hidden form fields is only slightly harder to hack.
  7. Trust the server. Rely solely on the server to check, store, and generate only valid data in some places. That way, a DBA can change a single column constraint or data-type, and parts of the UI start to fail.
  8. Don’t use DOCTYPEs. That way you’ll never be sure what rendering mode different browsers are going to use to render your content.
  9. Ignore the cascade. Don’t bother to understand what the C in CSS stands for. Just keep overriding styles until a page element looks the way you want. That way, your styles will be fragile and will break unexpectedly when an intern changes something a reasonable person would expect to be unrelated.
  10. Don’t use classes or ids. Instead, always write JavaScript and CSS that finds nodes based on tag name, name, alt or title attributes, or by their position in the DOM. That way, when anything in the page changes (the hierarchy, the attributes, or the language the site is translated into), things break. If you do end up using class or id, be sure to make a separate class for every node in your document and assign the same id to several different nodes.

If, however, you want to write flexible code that can react to and evolve with the ever-changing needs of its users, even after you have left the project in the hands of a clever but inexperienced hacker, you should probably avoid these techniques, and read up on some of those lame new design fads instead.
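As a rough sketch of doing the opposite of tip 6, here is a server-side handler that ignores a client-supplied authorized flag and re-checks input itself. Everything in it (the SESSIONS and ADMINS tables, the Form fields, the handle_delete name, the 1 to 100 range) is hypothetical and only there for illustration.

```python
from dataclasses import dataclass

# Hypothetical server-side records; a real application would use a session store.
SESSIONS = {"token-abc": "alice"}
ADMINS = {"alice"}


@dataclass
class Form:
    session_token: str
    quantity: str
    authorized: str = "0"  # client-supplied flag; deliberately ignored below


def handle_delete(form: Form) -> str:
    """Authorize and validate on the server, whatever the browser claims."""
    user = SESSIONS.get(form.session_token)
    if user is None or user not in ADMINS:
        return "403 Forbidden"
    # Re-check the input even if JavaScript already checked it.
    try:
        quantity = int(form.quantity)
    except ValueError:
        return "400 Bad Request"
    if not 1 <= quantity <= 100:
        return "400 Bad Request"
    return "200 OK"
```

A forged authorized=1 makes no difference here, because the decision depends only on state the server owns.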

Special thanks to all the programmers whose code has illuminated these techniques over the years. My job may not be as secure as yours, but at least my code, and my conscience, are clear.

The next big thing, part 3: Taking the relational out of relational databases

Part of an ongoing series.

The relational database is an extremely powerful tool. But sometimes data isn’t very relational, and sometimes transactional, relational integrity is not as important as it is for, say, a bank. This is one reason why so many sites can get away with MySQL backed by MyISAM tables: they’re fine if you’re read-heavy and data integrity is not mission-critical.

Some new projects have sprung up which provide key-value stores or simpler kinds of databases without all the overhead and inflexibility of a relational database.

On the other hand, sometimes data is way more interrelated than a traditional relational database is prepared to handle. Sometimes different kinds of items (i.e. rows) in a database can be related to many other kinds of items in that database, and sometimes end users can create not just new items or new relationships, but new kinds of relationships between items. This type of database is called a graph database, and there are also projects pushing the boundaries of relational in this completely opposite direction.
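To make the idea of user-defined kinds of relationships concrete, here is a toy in-memory sketch. It is not how any particular graph database works; the TinyGraph class and the relation names are made up for illustration. Relationships are stored as (subject, relation, object) triples, so a new kind of relationship is just a new string rather than a schema change.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory graph: edges are (subject, relation, object) triples."""

    def __init__(self):
        # Maps (subject, relation) to the set of related objects.
        self._edges = defaultdict(set)

    def relate(self, subject, relation, obj):
        # Any string works as a relation, so users can invent new kinds.
        self._edges[(subject, relation)].add(obj)

    def related(self, subject, relation):
        # .get() avoids creating empty entries for unknown keys.
        return sorted(self._edges.get((subject, relation), ()))


graph = TinyGraph()
graph.relate("alice", "friend_of", "bob")
graph.relate("alice", "attended", "pycon")
graph.relate("alice", "mentored_by", "carol")  # a brand-new relationship kind
```

In a strictly relational schema, each of those relation names would typically need its own join table or column, which is the inflexibility described above.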

Pretty much everywhere I interviewed back in February 2008 was either building their own graph database, working on an existing one, or repurposing a relational database (or, in one case, a search backend) to kinda, sorta behave like one. The W3C, not one to be left behind when there’s a specification to be written, is even working on a SQL-inspired query language intended to search them[1].

Most applications have some combination of totally un-relational data that can go in a key-value store, some strictly relational data that belongs in a SQL database, and some flexible, highly relational data that belongs in a graph database.

What will happen when these alternative databases start giving traditional relational databases a run for their money? Well, sharding, caching, and normalization all start to sound a lot more complex when the data is in a few different kinds of databases, but then again, maybe optimization won’t be as necessary if a single SQL database isn’t doing all the heavy lifting. Object-relational mappers (and the web frameworks that use them) might need to talk to, and abstract away from, different kinds of databases[2].

And the different types of data won’t always be easily separated along table boundaries. Maybe these different types of databases will talk to each other, or maybe they will mature into über-databases that understand lots of different types of data relationships.

But the monolithic, strictly relational, master SQL database is eventually going to go the way of Cobol[3].

  1. Of course, if it’s anything like other technologies designed by the W3C, it’s a steaming pile.
  2. Some can already handle talking to multiple SQL databases, and of course there’s two-phase commit.
  3. Or Kobol.

The next big thing, part 2: Taking the web out of web applications

Part of an ongoing series.

A web application is just a stateless[1] application that responds to various requests by performing actions and providing resources. There’s no fundamental reason an application must only communicate over HTTP. Web applications are going to start adding alternative methods of interaction, and I think the first common one will be email.

Perhaps an example will best illustrate this:

Like many web forums, posts to Mosuki’s discussion forums get mailed out in email. But, unlike any other web forums I know of, they also behave like mailing lists. All the emails have a reply-to header with an email address that identifies the message, the recipient, and the action to be taken if that email address is used. In this case, the contents of a reply email are posted to the forum exactly as if a reply had been posted via the website.

In other words, the action “post a message” can be accessed via a web page and a browser or via a reply-to header and your mail client.
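I don’t know how Mosuki actually builds its addresses, but one way such a reply-to address could work is sketched below. The address layout, the SECRET and DOMAIN values, and the function names are all assumptions. The local part carries the action, message and recipient, plus a truncated HMAC so that the server can reject addresses it did not issue.

```python
import hashlib
import hmac

SECRET = b"change-me"          # hypothetical server-side secret
DOMAIN = "forum.example.com"   # hypothetical mail domain


def _sign(payload: str) -> str:
    # Truncated HMAC-SHA256 over the payload, as hex.
    digest = hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def make_reply_to(action: str, message_id: int, user_id: int) -> str:
    payload = f"{action}.{message_id}.{user_id}"
    return f"{payload}.{_sign(payload)}@{DOMAIN}"


def parse_reply_to(address: str):
    """Return (action, message_id, user_id), or None if the address is not ours."""
    local_part = address.split("@", 1)[0]
    payload, _, signature = local_part.rpartition(".")
    expected = _sign(payload)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return None
    action, message_id, user_id = payload.split(".")
    return action, int(message_id), int(user_id)
```

The signature only shows that the address was issued by the server. It says nothing about who actually sent the reply, which is the forgeability problem discussed further down.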

There are other examples of this separation between input/output channels and the application logic. The most obvious is Twitter, which of course can be interacted with via HTTP or SMS[2]. And the Son of Sam project intends to let you “use modern concepts like handlers, requests, responses, state machines” to interact with email.

Confirm a Facebook friend request, RSVP to an Evite, revert a Wikipedia edit, or reassign a bug report, just by replying to an email or sending an SMS.

There are a number of technical issues inherent in a system like this. An application’s framework has to handle multiple input channels, and massage email bodies, HTTP requests, and other input into a least common denominator “request.” Authenticating a user via email, an intrinsically forgeable medium, and protecting against spam, are non-trivial challenges. And a suite of templates suddenly gets a lot more complex when it has to provide views for multiple types of interfaces[3].
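A minimal sketch of that least common denominator “request” might look like the following. The Request fields, the two adapter functions, and the convention that the local part of the To: address names the action are all assumptions made for the example, not any framework’s API.

```python
from dataclasses import dataclass
from email import message_from_string
from urllib.parse import parse_qs


@dataclass
class Request:
    channel: str  # where the input came from: "http" or "email"
    action: str   # what the user wants to do
    sender: str   # who is asking (unauthenticated in this sketch)
    body: str     # the content of the request


def from_http(path: str, form_body: str, user: str) -> Request:
    # The URL path names the action; the form carries the content.
    fields = parse_qs(form_body)
    return Request("http", path.strip("/"), user, fields.get("body", [""])[0])


def from_email(raw: str) -> Request:
    msg = message_from_string(raw)
    # Hypothetical convention: the local part of To: names the action.
    action = msg["To"].split("@", 1)[0]
    return Request("email", action, msg["From"], msg.get_payload().strip())
```

Once both channels produce the same Request, the application logic behind “post a message” does not need to know which one was used.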

This blurring of the line between email, HTTP, SMS, and other communications is not new, strictly speaking. But I think it will become commonplace and even expected. Rather than writing a modern (MVC, stateless, REST-ful, &c.) web application, people will be writing modern (MVC, stateless, REST-ful, blah, blah, blah) applications that have web interfaces, email interfaces, and whatever other interfaces they need.

Stay tuned for the next installment of The next big thing: Taking the relational out of relational databases.

  1. More or less stateless, that is, authentication tokens like cookies notwithstanding.
  2. As well as more standalone apps than you can shake a stick at.
  3. Generating text and HTML responses for email that look good and work well in the top 75% of desktop and web email clients is a lot harder than testing a site’s HTML in Firefox, IE, Safari and Opera.

Another browser-side Model-View-Controller analogy

Coding Horror presents another way to think about the browser side of web apps as fundamentally MVC in Understanding Model-View-Controller. It’s interesting, but I still prefer my analogy for primarily AJAX web apps; when the data comes in primarily through XMLHttpRequest, it doesn’t make much sense to think of anything but the JavaScript that handles XMLHttpRequest responses as the model.

Ruby’s not ready: comments, corrections, and clarifications

Some good discussion on this one. It’s nice to see Ruby people saying things like this (5th message from the top, from Song Ma):

Interesting. But what I am thinking about is not the attitude of the author, but the points he was trying to make. The deep review and discussion will benefit the language insights.

Or this one (from Trans, on the same forum):

Why is everyone getting so worked up? It’s a critique. Biased it may be, but that in itself does not make it worthless. In fact, it can be very constructive b/c it uncovers “attack points” with the language. With each point we can ask ourselves objectively is this a misconception or a fair point? In either case we have an opportunity, to address misconceptions in our Ruby evangelizing blogs and to work to improve Ruby where a point has merit.

Bias can work both ways. But I think the Ruby community can rise above it, and Ruby will be all the better for it.

And from Peter Cooper at Ruby Inside:

As it is, I think he’s missing the point a lot of the time (he tends to think Python’s better because he likes its conventions more than Ruby’s – not a compelling argument), but it’s an interesting read none the less. Anything that keeps our minds open to the fact that Ruby != perfection is worth a look.

And a comment on the same post:

Let’s take his best points and incorporate them into future versions of Ruby.

Sounds like a plan.

I saw a few counterarguments like this:

Everything he’s saying is well known.

Just because a problem is well known inside a community doesn’t make it any less of a problem.

Everybody who mentioned documentation, even those who disagreed strongly with the rest of my post, agreed that Ruby’s documentation is seriously lacking. In fact, a lot of the mistakes in my original post are due to me not being able to easily find an explanation of something on the various Ruby doc sites. Which leads me to…


Ruby’s not ready

Introduction

A few weeks ago, I learned Ruby and Ruby on Rails to compare them head-to-head against Python and Pylons, in preparation for a new project. When I began, I knew nothing about Ruby or Ruby on Rails. I have tried to be as objective as possible: before beginning this project, I wrote in email on March 5th:

I promise we’ll be as objective as humanly possible; if Ruby and Ruby on Rails truly is better, we’ll happily use RoR and never look back. I want to know that I’m using the absolute best tool for the job.

Since then, I have reimplemented one complex nine-hundred line Python library, PottyMouth, in Ruby. Another team member has also reimplemented parts of the Pylons web application Spydentify in Ruby on Rails.

The best tool for the job is Python & Pylons. While Rails and Pylons are similar, shortcomings in Ruby compared to Python make Python & Pylons the clear choice. I make three basic arguments against using Ruby:

  • The language and its implementation are incomplete and immature. Immature implementations breed performance issues. A project loses time when it must implement missing or incomplete functionality.
  • The language is inconsistent and needlessly complex. Inconsistency and complexity confuse people, and confusion breeds bugs.
  • The documentation is incomplete. Incomplete documentation breeds bugs as you might misuse a feature. And a project slows down while you read the language or library source code, or ask the community for help with undocumented features.

I believe Ruby would fare poorly against other languages, not just Python, on these angles as well.

Why have Ruby and Ruby on Rails gained so much traction, despite these issues? Aside from the Rails hype, it’s because they are not insurmountable issues. It is possible to build a large application in Ruby; many people have. But any programmer building a large application in Ruby will have to deal with the issues listed here at some point. These are all issues that do not appear right away. A project doesn’t face them until a website reaches maturity, develops lots of features, fields traffic from lots of users, or until a project hires programmers who aren’t Ruby experts or experienced enough to anticipate these issues.

My point is simply that Python (and other languages) allow you to handle most of these issues more elegantly, or avoid them completely.

Subscribe here if you’d like to be notified of any follow-up posts (for an article this long, I’m sure there will be a few), or if you’d like to read my critiques, positive and negative, of other things technological.

Contents

  1. Unicode and encodings
  2. Regular Expressions
  3. Documentation
  4. Migration to Ruby 1.9/2.0
  5. Performance
  6. Scoping
    1. One nice thing about Ruby’s scoping
  7. There’s more than one way to do it
    1. String conversion
    2. print, p and puts
    3. Ranges and slices
    4. require and load
    5. Raising exceptions, throwing strings
    6. do and then are extraneous
    7. length and size, update and merge
  8. Object model
  9. Faking keyword arguments
  10. Libraries
    1. SAP support
    2. DateTime support
  11. Debugging
  12. Rails & Pylons
  13. Cool things about Ruby
  14. Conclusion
  15. Further reading

Unicode and encodings

If you’re already familiar with Ruby’s problems with Unicode, feel free to skip this section. Ruby did not have any support for Unicode character strings when it was originally released in 1996. This is only slightly silly for a language that was invented after Unicode 1.0 was released in 1992. It is inexcusably shortsighted that Ruby has not added Unicode objects over the last twelve years.

A third-party Ruby library for conversion ties into the Unix iconv program, allowing conversion between two different encodings. However, converted strings are still sequences of bytes. This means that using most of the string methods (slice, reverse, size, index, downcase, upcase, strip) and indexing into the string with [] notation do not work in non-ASCII encoded strings. You can get the desired results out of these methods by first accessing the .chars attribute of non-ASCII strings. This is less desirable because the programmer must remember to use .chars whenever he or she is working with non-ASCII strings.

A better solution would be to support first-class Unicode objects, as strings of Unicode characters, natively in the language.
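To illustrate the difference between a string of bytes and a string of characters, here is a small sketch. It assumes Python 3, where str is already a Unicode string (at the time of writing, Python 2 used a separate unicode type for this). The sample text is arbitrary.

```python
text = "na\u00efve caf\u00e9"   # "naïve café" as a string of Unicode characters
raw = text.encode("utf-8")      # the same text as a sequence of bytes

char_count = len(text)          # counts characters
byte_count = len(raw)           # counts bytes; each accented letter takes two

reversed_text = text[::-1]      # reversing characters gives readable text
reversed_bytes = raw[::-1]      # reversing bytes splits the two-byte characters

try:
    reversed_bytes.decode("utf-8")
    bytes_still_valid = True
except UnicodeDecodeError:
    bytes_still_valid = False   # the reversed byte string is no longer valid UTF-8
```

Byte-oriented length, slicing and reversal give wrong answers for non-ASCII text in exactly this way, which is why operating on characters by default matters.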

There is a third-party Unicode support library that replaces Ruby’s String class and adds Unicode support, but it is acknowledged to be hackish, potentially dangerous, and makes Ruby somewhat slower.

Unicode support may or may not be forthcoming in Ruby 2.0. There are certainly members of the community advocating it.

This means, among other things, that there is no built-in support in Rails’ HTML generation[1] for converting Unicode characters to HTML entities. This page details how to hack around this problem; but this is something that should be automatic and built-in, not hacked around.

Python’s built-in Unicode and encodings support, which is a first-class, native Unicode object and a full suite of built-in encodings, was introduced in Python 1.6 in 2000. It has evolved into an extremely reliable, secure and versatile Unicode implementation. It is also extremely simple to use.

Python supports all of the encodings that Ruby supports via iconv, and a number that it doesn’t, including Quoted-Printable, the encoding used for the vast majority of email messages, and MBCS, the encoding used by Windows FAT32 and NTFS file-systems.

Because Python’s built-in Unicode support is so robust, the vast majority of Python libraries all convert to Unicode when accepting input, and convert to the proper encoding when producing output. Multi-language support and correct encoding handling is usually a non-issue when building a Python application. For example, non-ASCII input to, and output from, a Pylons web application Just Works™.
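The decode-on-input, encode-on-output pattern can be sketched as follows. This assumes Python 3 syntax, and the handle function and its latin-1 default are invented for the example. The xmlcharrefreplace error handler is a standard one that turns characters the target encoding cannot represent into numeric HTML character references.

```python
def handle(raw_input: bytes, input_encoding: str = "latin-1") -> bytes:
    # Decode bytes into Unicode text at the input boundary.
    text = raw_input.decode(input_encoding)
    # Work on characters, not bytes.
    shouted = text.upper()
    # Encode at the output boundary; non-ASCII characters become
    # numeric character references such as &#201;.
    return shouted.encode("ascii", "xmlcharrefreplace")
```

The same function works whether the input arrives as Latin-1 or UTF-8, because the encoding only matters at the two edges.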

Regular Expressions

For a language that borrows so heavily from Perl, the regular expression support in Ruby is pretty disappointing. Regular expressions might not seem like a very important part of a language, but it’s an interesting litmus test because Python, Perl, and JavaScript all support essentially the same regular expression syntax. Ruby’s regular expressions, however, were so broken that I switched to Ruby 1.9 to finish porting PottyMouth.

Ruby’s Regexp::MULTILINE flag doesn’t behave the way multiline does in other languages. In other languages, the multiline flag is off by default, and when enabled, it makes ^ and $ match right after, and right before, every newline (whether . matches newlines is controlled by a separate flag):

In Perl:

if ( "foo\nbar\nbaz" =~ /^bar/m ) { print "yes\n"; } else { print "no\n"; }
yes
if ( "foo\nbar\nbaz" =~ /^bar/ ) { print "yes\n"; } else { print "no\n"; }
no

In Python:

>>> import re
# This matches
>>> re.search('^baz', "foo\nbar\nbaz", re.MULTILINE)
<_sre.SRE_Match object at 0xb7c4bf38>

# This does not match
>>> re.search('^baz', "foo\nbar\nbaz")

However, in Ruby, the Regexp::MULTILINE flag appears to only affect the interpretation of ., not ^ and $, making it more like Python’s re.DOTALL or Perl’s /s switch.

In Ruby:

irb(main):001:0> /^baz/.match("foo\nbar\nbaz")
=> #<MatchData:0xb7cd42b0>
irb(main):002:0> /^baz/m.match("foo\nbar\nbaz")
=> #<MatchData:0xb7cdb740>

irb(main):003:0> Regexp.new('^baz').match("foo\nbar\nbaz")
=> #<MatchData:0xb7cc7df8>
irb(main):004:0> Regexp.new('^baz', Regexp::MULTILINE).match("foo\nbar\nbaz")
=> #<MatchData:0xb7cb59dc>

There is no documentation whatsoever of the actual semantics of Regexp::MULTILINE, so it’s not clear whether this is an accident, a bug, or an intentional departure from the standard. Either way, it makes the language more difficult to learn and less predictable to use.
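For comparison, Python controls the two behaviours with separate flags: re.MULTILINE for the anchors and re.DOTALL for the dot. A short sketch (the variable names are mine):

```python
import re

text = "foo\nbar\nbaz"

# Anchors: ^ only matches at the start of the string unless MULTILINE is set.
anchor_default = re.search(r"^baz", text)
anchor_multiline = re.search(r"^baz", text, re.MULTILINE)

# Dot: . does not match a newline unless DOTALL is set; MULTILINE has no effect on it.
dot_default = re.search(r"foo.bar", text)
dot_multiline = re.search(r"foo.bar", text, re.MULTILINE)
dot_dotall = re.search(r"foo.bar", text, re.DOTALL)
```

Here anchor_default, dot_default and dot_multiline are all None, while anchor_multiline and dot_dotall are match objects.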

There’s also no documentation whatsoever of the actual semantics of Regexp::EXTENDED. The eregex.rb file in the Ruby source just adds support for & and | logical operators, and the only documentation is the message “This is just a proof of concept toy.” As best I can tell, regular expressions in Ruby always behave like extended regular expressions, supporting ?, +, | and N, regardless of whether you use the extended flag or not. What does the extended flag actually do? I don’t know.

Ruby’s regular expressions also match only ASCII and a small set of encodings, including UTF-8 and the Japanese encodings EUC and SJIS. Want to write a regular expression that matches UTF-16, Latin-1, or raw Unicode? You’ll have to use the third-party Oniguruma package or a different programming language. You can’t use pure Ruby.

Lastly, positive and negative look-behind aren’t supported in Ruby 1.8. I only noticed this because the code I was porting used negative look-behind expressions. Ruby 1.9 adds look-behind. The options for Ruby 1.8 users are to install 1.9 or the third-party Oniguruma package, which also supports many more encodings (but still not raw Unicode).

Both Python and Perl support positive and negative look-behind and Unicode regular expressions natively.
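As an illustration of what that looks like in Python, here is a short sketch. It assumes Python 3, where str patterns match Unicode characters by default, and the patterns and sample strings are made up.

```python
import re

# Negative look-behind: match "bar" only when it is not preceded by "foo".
not_after_foo = re.compile(r"(?<!foo)bar")

# Positive look-behind: match digits only when preceded by a dollar sign.
price = re.compile(r"(?<=\$)\d+")

# \w matches non-ASCII letters as characters, not bytes.
word = re.compile(r"\w+")

no_match = not_after_foo.search("foobar")          # None
match_at_3 = not_after_foo.search("bazbar")        # matches the "bar" at index 3
prices = price.findall("cost $200 or 300")         # only the amount after "$"
words = word.findall("na\u00efve caf\u00e9")       # two whole words
```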

In general, it’s a bad sign when a third party reimplements a large chunk of functionality in an existing piece of software. It means that the existing functionality was just plain not good enough. And, for open source projects, it means that the existing project was unable, or unwilling, to solve the problem, or let others contribute patches to solve the problem, within the project. The fact that this happened for both Ruby’s encoding and regular expression support is disturbing.

Documentation

The Standard Library Documentation for Ruby is woefully incomplete. For example:

  • There is no documentation whatsoever for:
    • the digest library, which contains the SHA1 and MD5 check-sum tools. These tools are critical for generating secure cookies and storing user passwords securely. Without documentation, you have to go read the Ruby source code to know that your application is secure.
    • Racc, a LALR(1) parser generator for Ruby
  • The documentation for gdbm only includes a list of constants it defines. No methods, descriptions, or anything else.
  • The documentation for the syslog module is useless. It lists one method, close. A useful syslog library would have to have at least open and write functionality.
  • The link to the tcltklib module documentation returns an error page.
  • The Profiler documentation is extremely limited. It looks like it would be possible to use the profiler, but there’s no information about how it works, which is critical when you are profiling an application.
  • As noted above, the regular expression documentation doesn’t cover MULTILINE or EXTENDED.

In general, the majority of modules listed have no description page. None of the pages specifically state which version of Ruby they were written for.

Ruby’s development and documentation writing appear to be two disconnected endeavors, and the documentation is acknowledged to be incomplete. In fact, there is no single rallying point for Ruby material. A visit to the official Ruby documentation page lists a variety of documentation, tutorials, examples, etc. spread across many different websites with varying levels of completeness and relevancy to the current version of Ruby. There is no single tutorial or language overview which is complete for the current version of Ruby 1.8.x (over four years old).

This doesn’t inspire confidence. Are there any libraries that aren’t listed at all? And how many of the existing libraries have documentation that is incomplete, out-of-date, or incorrect?

By contrast, Python’s standard library documentation is complete, versioned and dated.

Migration to Ruby 1.9/2.0

It’s not clear what’s happening with regard to Ruby 1.9 and/or 2.0. Ruby 1.9 has been under development since (at least) 2006. (It may have been under development longer than that; the lack of any official documentation about it makes it hard to know for sure. This podcast claims it’s been around longer than Perl 6, which would make Ruby 2.0 almost as old as Ruby itself.) An experimental/development version, 1.9.0, was released in December 2007. I tested against the version of Ruby 1.9 in Ubuntu 7.10: 1.9.0+20070830-2ubuntu1.

Quite a few things that are allegedly new in Ruby 1.9 actually exist in Ruby 1.8, and it’s not clear whether they’ve been back-ported or whether their behavior has only subtly changed.

For example, return value unpacking, % string formatting, and newlines inside the ternary operator are supposedly new in 1.9, but work exactly the same in 1.8. Other things that are supposed to be introduced in Ruby 1.9, like multiple splats, don’t (yet) work at all in 1.9.

Other improvements in Ruby 1.9 include literal hash syntax, block-local variables, and, as already noted, better encoding and regular expression support.

Some documents indicate that Ruby 2.0 is going to be different in ways that will break existing Ruby 1.x programs severely. They also contain disturbing statements like this one about the new garbage collector: “It will be (mostly) thread safe.” Being (mostly) thread safe is like being mostly pregnant. You either are, or you aren’t.

There are two explanations for this lack of a clear plan for Ruby 2.0. Either Ruby 2.0 is so far off that no such document would be useful yet, or nobody in the Ruby community has thought about these issues yet. Both of these would be bad signs. Either way, it’s a total mystery how difficult it will be to move to Ruby 2.0, or when that move might have to happen.

Python, on the other hand, has been in the 2.x series for a long time. Planning for Python 2.0 began while Python 1.5 was the current version. In September 2000, as Python 1.6 was released, there was a complete outline available of what to expect from Python 2.0. Python 2.0 was released in October 2000. Programs written eight years ago for Python 2.0 will still run, unmodified, under Python 2.5. Many Python 1.x programs will also run under 2.5.

Python 2.6 and Python 3.0 are slated for release this summer. The Python 3.0 process has been going on for about a year. There is a clear outline of exactly what’s changing between 2 and 3, and guidelines for how to write Python code that will run equally well under 2.6 and 3.0. The Python developers are also providing a conversion program that will automatically translate between 2.6 and 3.0 code, and warn programmers about code it was not able to translate.

Python proves that a programming language can evolve safely, easily, and largely free of hassles. Future versions should not be a potential wild-card (or worse, a complete clusterfuck, as with PHP).

For a piece of software that’s going to be the core of your business for as long as you are in business — hopefully many, many years — why choose anything other than a language with a migration process like Python’s?

Performance

It’s difficult to precisely evaluate the difference in execution time between different languages. However, The Computer Language Benchmark Game gives a pretty strong indication that Ruby is slow. On its tests, Python is 3×-4× as fast as Ruby. Ruby is slower than TCL, a language that is twenty years old. Ruby is about the same speed as JavaScript (in Mozilla’s SpiderMonkey interpreter). The only thing slower than Ruby is Prolog.

The notorious Rails is a Ghetto article outlines performance problems in Ruby and Rails that, disturbingly, went unaddressed for long periods of time. In the worst one, the author reported serious performance issues to the Rails community, which largely ignored the problem or denied its existence. Meanwhile, the problem had been identified and patched by someone else, but the Ruby core developers ignored the patch for a year.

In another incident in the same article, the original Rails author admits that the original Rails code required about four hundred restarts a day, or six to seven restarts per thread per day. Four hundred restarts a day means four hundred chances for a database transaction to fail, four hundred chances for a verification email to be sent by the system without the corresponding data being stored in the database, four hundred chances for the user’s browser to not receive all the data it needs to correctly render a page or display data.

Even for a project for which performance is not the primary concern, these trends should be cause for concern. Serious performance issues mean buying more RAM, and upgrading servers sooner.

Scoping

Ruby’s scoping rules are complex:

  1. Files, modules, classes, defs and blocks create new scopes.
  2. Local variables have no sigil and begin with a lowercase letter. They are available only in the scope they are defined in.
  3. “Constants” have no sigil and begin with an uppercase letter. They are available in the scope they are defined in and in all enclosed scopes.
    1. “Constants” are not constant; they can be reassigned whenever you like, just like everything else.
  4. Globals begin with the $ sigil and are global.
  5. Instance attributes begin with the @ sigil and are, by default, protected, or available only inside the class.
    1. Instance attributes can be made available outside the class with attr_accessor or attr_reader.
  6. Class attributes begin with the @@ sigil and are available only inside the class.
    1. Unlike instance attributes, class attributes cannot be accessed outside the class with attr_accessor or friends.
  7. Methods are, by default, public.
  8. Methods can be made private or protected with the private or protected keywords.
    1. protected doesn’t mean what you think it means. Both private and protected methods are available within the class and within all containing subclasses.

The terminology used to refer to non-constant “constants” is extremely unfortunate.

Why isn’t the full range of public/protected/private scopes available to attributes as well as methods? Why is a totally different convention used to scope attributes? Why can’t class attributes be accessed outside the class like instance attributes?

What is the point of the subtle, weird difference between protected and private? What problem does it solve? Why don’t protected and private work the way they do in Java and PHP?

Clear, consistent, simple scoping makes it easy to keep track of what variables are available where. Complex scoping rules mean there’s more to remember, there are more mistakes to make and more ways to get confused. Mistakes and confusion cause bugs.

Python uses just a naming convention to convey whether a variable should be thought of as private or protected. Both Python and Ruby can be monkeypatched to modify private and protected attributes or methods, so it’s best to think of private and protected as purely advisory in either language. Experienced Pythonistas learn that someobj.__private__ is a red flag; the fact that you must always monkeypatch to do this in Ruby might provide an additional disincentive to doing it, but it also makes it easier to do it by accident.
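
A minimal sketch of the Python naming convention mentioned above, assuming a hypothetical Account class (the class and attribute names are illustrative only). A single leading underscore is purely advisory; a double leading underscore triggers name mangling, which discourages outside access but does not prevent it.

```python
class Account:
    """Hypothetical class used only to illustrate the naming conventions."""

    def __init__(self, balance, pin):
        self._balance = balance  # single underscore: "protected" by convention only
        self.__pin = pin         # double underscore: stored as _Account__pin

    def check_pin(self, guess):
        # Inside the class body, self.__pin is rewritten to self._Account__pin
        return guess == self.__pin


acct = Account(100, "1234")
advisory_access = acct._balance      # works; the underscore is only a hint
mangled_access = acct._Account__pin  # works too, but it is an obvious red flag
```

Nothing here is enforced by the interpreter, which is why the convention is best read as advice to the caller.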

One nice thing about Ruby’s scoping

There’s one place where Ruby’s scope behavior is better than Python’s. Default argument values in function definitions are (re)evaluated each time a method is called in Ruby. In Python, default argument values get evaluated in the containing scope when the function is defined. This can get you into trouble in Python, if a default value is a mutable type. If you’re modifying the value, it’ll persist across subsequent calls to foo:

def foo(arg=[]):
    arg.append(1)  # the change persists into the next call

In Python, you end up having to do this:

def foo(arg=None):
    if arg is None:
        arg = []
    # some code

This is definitely less clear than in Ruby, where you can simply say what you mean:

def foo(arg=[])
  # some code
end
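
To see the Python pitfall described above in action, here is a minimal sketch. The function names append_shared and append_fresh are made up for illustration; the first uses a mutable default, the second uses the None idiom.

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the def statement is executed,
    # so every call that omits bucket mutates the same list object.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # A new list is created on each call that omits bucket.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)   # same list as first, now holding both items

fresh_one = append_fresh(1)
fresh_two = append_fresh(2)  # independent of fresh_one
```

Note that first and second are the same object, so the second call also changes what the first call returned.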

There’s more than one way to do it

We can thank Larry Wall and Perl for There’s more than one way to do it. TMTOWTDI is bad, because to really know a language, you must know every one of the several ways to do similar but subtly different things, plus every synonym. If there is only one way to do it, you only have to remember that one way, instead of many. Programmers spend more time reading code than writing it, and often, they’re reading other people’s code, so they can’t get away with remembering only their favorite way to do it. The more you have to remember, the more likely you are to forget, make a mistake, or have to stop to check the documentation. Mistakes breed bugs, and checking the documentation takes time. While not nearly as bad as Perl on this front, Ruby commits some serious TMTOWTDI.

String conversion

Some Ruby objects have an extra stringification method, .to_str, as well as the standard .to_s. .to_s is an explicit cast, used whenever you need a string representation of an object. .to_str is an implicit cast, which gets called when you are using a string-like object in a context that requires a string. (This illustrates a philosophical difference between Python on the one hand and Ruby and Perl on the other; Python never does context-sensitive implicit conversion.)

The naming of these methods is atrocious — they are radically semantically different, yet the name of one is an abbreviation of the name of another. What happens if you write code that critically relies on this distinction, go work in another language for six months, and then get called in to fix a critical production bug in that code? Would you remember which is which, and what the difference was, exactly? I wouldn’t. And the presence of both has confused people other than me. .to_str should be named something like .stringcontext.

What is the use case for to_s’s concatenation of arrays and hashes? It just runs keys, values, and items together in a string, making it impossible to tell whether it was a number, a string, a hash or an array that you just stringified:


irb(main):001:0> h = {1=>2}
=> {1=>2}
irb(main):002:0> a = [1,2]
=> [1, 2]
irb(main):003:0> h.to_s
=> "12"
irb(main):004:0> a.to_s
=> "12"
irb(main):005:0> a.to_s == h.to_s
=> true

When is this useful? It’s not human-readable, and it’s not computer-readable. It’s just mangled garbage. It’s even worse when you call to_s on more complex data structures:


irb(main):001:0> c = [1, :two, {:foo=>['bar', 'baz'], :key=>'val'}, [7,8], 9]
=> [1, :two, {:key=>"val", :foo=>["bar", "baz"]}, [7, 8], 9]
irb(main):002:0> c.to_s
=> "1twokeyvalfoobarbaz789"

As a side note, Python has two stringification methods as well, str() and repr(). str() provides a string representation, and repr() provides a string that can be eval()-ed or pasted into a Python interpreter. Ruby appears to have no equivalent of repr(), aside from p, which leads to the next topic….
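
A small sketch of the str()/repr() distinction, written for Python 3. The list c is a Python rendering of the Ruby structure above, with strings standing in for the symbols. It uses ast.literal_eval rather than eval() for the round trip, which is a swapped-in, safer choice and not something the original text prescribes.

```python
import ast

c = [1, "two", {"foo": ["bar", "baz"], "key": "val"}, [7, 8], 9]

as_str = str("two")    # plain text form of the string
as_repr = repr("two")  # quoted form that can be pasted back into source code

# For built-in containers, repr() keeps brackets, braces and quotes,
# so the structure survives a round trip through text.
restored = ast.literal_eval(repr(c))
```

Unlike the to_s output shown earlier, the text form still says which parts were lists, dicts, strings and numbers.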

print, p and puts

What is the difference between print, p, and puts? This isn’t documented. p is a sort of poor-man’s repr(), printing each argument in a form that could be pasted into Ruby source code on a separate line. Strangely, there’s no way to capture the output of p and store that string for later. print prints each of its arguments without any space between them. puts prints each of its arguments, or each item in each collection argument, on a separate line. Why does Ruby need all three? Are you going to be able to remember which one behaves each way, and use the right one at the right time?

Not only does Python get away with one, but the behavior of print in Python is exactly what I’ve wanted out of print, or printf, in every programming language I’ve ever used: print the str()-ification of every argument, separated by spaces, with a newline at the end. If you want all the arguments concatenated, or on separate lines, you can join on the empty string, or "\n". If you don’t want a trailing newline, use a trailing comma.
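
The variations above can be sketched as follows. This assumes Python 3, where print is a function and end="" replaces the trailing comma of the Python 2 print statement the paragraph refers to. The output is written to an in-memory buffer only so that it can be inspected.

```python
import io

args = [1, "two", 3.0]
buf = io.StringIO()  # stands in for stdout so the result can be inspected

print(*args, file=buf)                            # space-separated, newline at end
print("".join(str(a) for a in args), file=buf)    # concatenated
print("\n".join(str(a) for a in args), file=buf)  # one item per line
print(*args, end="", file=buf)                    # no trailing newline

output = buf.getvalue()
```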

The behavior of puts is even weirder; it seems to stop descending into collections at some point. Note that the hash item is just stringified, but the items in the array inside the array are printed on separate lines:


irb(main):001:0> c = [1, :two, {:foo=>['bar', 'baz'], :key=>'val'}, [7,8], 9]
=> [1, :two, {:key=>"val", :foo=>["bar", "baz"]}, [7, 8], 9]
irb(main):002:0> puts c
1
two
keyvalfoobarbaz
7
8
9
=> nil

Ranges and slices

Ruby has two range operators, .. and ... (two dots and three dots). Why? The only difference is that the shorter one, with two periods, returns a longer range, and the longer one returns a shorter range, not including the endpoint. Can this possibly get any more confusing? [2] The language should have one, not two. It is not hard to add or subtract one when you want a range with, or without, its endpoints.

Python’s range() and xrange() built-ins, and its slice syntax, are less confusing, as they never include the endpoint. They are also more powerful, because they allow a third “step” argument. Want every other, or every third, element in a list, or a range that steps by 2 or 3? Try l[::2] or l[::3], range(start, stop, 2) or range(start, stop, 3). Want the list in reverse? l[::-1]. Want a range in reverse? range(stop, start, -1). Python’s syntax is simpler, easier to remember, and more powerful to boot.
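
A short sketch of the step argument, assuming Python 3 (where range() behaves like the xrange() mentioned above). The variable names are illustrative.

```python
letters = list("abcdefg")

every_other = letters[::2]         # items at indexes 0, 2, 4, 6
every_third = letters[::3]         # items at indexes 0, 3, 6
reversed_letters = letters[::-1]   # whole list, back to front

evens = list(range(0, 10, 2))      # the stop value 10 is never included
countdown = list(range(5, 0, -1))  # counts down and stops before 0
```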

Ruby’s range operators are also used to see if a value is in a particular range, like this:

irb(main):001:0> (0..2**1000) === 2**999
=> true

This has a disturbingly clever ring to it. The range operator doesn’t actually walk through every element of the range from 0 to 2**1000 and compare it to 2**999; if it did, this code wouldn’t execute instantaneously. It’s doing something like this underneath: 2**999 >= 0 and 2**999 <= 2**1000. The only reason to use a range operator like this, when it’s about as much typing to just say what you mean directly, is when you have a range that you’re passing around as a variable.

In Python, the corresponding idiom is 0 <= 2**999 <= 2**1000, but this chained comparison syntax doesn’t work in Ruby, so you have to write 0 <= 2**999 and 2**999 <= 2**1000 in Ruby. Python’s xrange() can also be passed around like a variable, but you test for membership with value in myrange instead of ===.
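
Both Python idioms can be sketched like this. It assumes Python 3, whose range() accepts arbitrarily large integers and answers integer membership tests arithmetically instead of iterating. The upper bound is written as upper + 1 to mirror Ruby’s inclusive .. operator.

```python
lower, upper = 0, 2**1000
value = 2**999

# Chained comparison: says directly what is meant.
chained = lower <= value <= upper

# A range object can be passed around like any other value;
# membership is tested with "in".
myrange = range(lower, upper + 1)
in_range = value in myrange
outside = (upper + 1) in myrange
```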

Now imagine you’re someone who hasn’t seen Ruby before, or who has been working in some other language for months, and who is now tasked with fixing a critical bug in code that relies on this strange, non-obvious idiom. Are you going to know, or remember, that === combined with .. has special semantics? Compare that to how little trouble you will have understanding lower < value < upper, or value in myrange, in Python code. Simplicity and straightforward syntax have a significant long-term benefit.

require and load

Ruby has two ways to handle code in other files: require and load. The difference is that require loads the code only once per application, and load loads it each time the interpreter sees load. Yet again, there’s more to remember to be fluent in Ruby. This distinction is of dubious value; if you have code that you want to run more than once, put it in a method and call the method. Don’t make the interpreter re-load, re-parse, and re-run the file.

And there’s more. The Ruby interpreter requires/loads the file corresponding to the path specified by a string. Unlike Python, Ruby has no concept of a set of paths to search for modules by name, so you see recipes like this to establish a file’s location and find modules installed on a system.

And because Ruby loads strings as paths instead of modules by name, you can trick the interpreter into accidentally requiring a file twice. Oops! Python provides reload() if you need to reload a module, but by default, it only loads modules once per application.

And finally, since require and load just pull the contents of another file into your local namespace, there’s no simple way to pull in just a single class or variable from a module. And there’s no way to ensure that classes in the file you’re importing don’t clobber classes in the file you’re importing it into. Want to attack lots of Ruby applications? Just write a helpful library with obfuscated code that overrides a common class, in something like HTTP or cookie authentication code, and adds a back door.
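
For comparison, a minimal sketch of the Python behavior referred to in this section, assuming Python 3 (where reloading lives in importlib.reload). The standard json module is used purely as a convenient example.

```python
import importlib
import json
import sys
from os.path import join  # pull in a single name, nothing else from the module

search_path = sys.path  # list of directories searched for modules by name

import json as json_again  # a second import reuses the cached module object
same_module = json_again is sys.modules["json"]

reloaded = importlib.reload(json)  # re-execution happens only when asked for
```

Because each module keeps its own namespace, names defined in json cannot silently overwrite names in the importing file.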

Raising exceptions, throwing symbols

Exception handling in Ruby is handled with raise/begin/rescue. Python uses raise/try/except, and Java & JavaScript use throw/try/catch to perform essentially the same exception handling. But Ruby also has throw/catch, which is unrelated to exception handling. It is normally used as a way to achieve labeled break.

Now, labeled break is a feature that I’d very much like to see in Python, but this feature in Ruby is essentially goto — and it’s even more powerful than goto in C, since it is not confined to single functions. Rather than debating the merits of goto, I’ll just ask this: does Ruby have to use terms that are commonly associated with exception handling, for a feature that is totally unrelated to exception handling?
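
Since Python has no labeled break, one common workaround is to wrap the nested loops in a function and leave them with return. This is a sketch with a hypothetical find_cell helper, not a language feature.

```python
def find_cell(rows, target):
    """Return (row_index, col_index) of the first cell equal to target, else None."""
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell == target:
                return i, j  # leaves both loops at once
    return None


grid = [[1, 2], [3, 4]]
found = find_cell(grid, 3)
missing = find_cell(grid, 9)
```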

do and then are extraneous

Ruby’s while and if statements can optionally have do and then keywords following them:

while condition do
  # some code
end

if condition then
  # some code
end

This is just one more extra variation that Ruby programmers have to remember to be able to read other people’s code.

length and size, update and merge

What is the difference between the length and size methods on String, Array, and Hash? There is none. Hashes have update and merge methods. What’s the difference? None.

These are particularly atrocious synonyms, because the English words they are based on aren’t synonymous. What if you have a class representing a geometric object, and you want length and size to return different measurements? What if you have a class representing a wiki page or source code repository, and you want update and merge to perform radically different operations? When someone else is reading your code, and they’ve been trained that these two methods are synonymous in Ruby, they might forget that the methods aren’t synonymous in this particular code.

Object model

Ruby doesn’t require self to be explicitly passed in to methods. Python has explicit self, and for good reason.

Rather than using self to get at class and instance attributes, Ruby uses @ and @@. You can get at self, to pass it to a method in another object, by calling self. And you can get at the superclass’s method of the same name by calling super. Arguably, if you need to get at a different method on the superclass, rather than that different method on self, then your object’s inheritance is broken. This is different from Python, but still fine.

You can delegate to another method on that class by simply calling that method. And here’s the problem with Ruby’s object model: because you don’t need to use @ to access methods, it’s too easy to accidentally shadow a method with a local variable.

There’s at least one case that requires self as an explicit receiver: when calling an attribute writer. Otherwise you’re just shadowing the attribute writer method locally. It’s not clear that there might not be other rare cases that require self as an explicit receiver too. This seems dangerous; in Python, self is always required. In Ruby, you almost always don’t need self, except in the rare case where you do. This feels like an accident, or an overly clever solution. Clever solutions make me suspicious, and inconsistency breeds bugs. Simple solutions, like Python’s strict reliance on explicit self, make me confident I’m writing reliable code.
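
A sketch of why explicit self keeps the two cases visibly different in Python. The Box class and its method names are hypothetical.

```python
class Box:
    def __init__(self):
        self.width = 1

    def resize_wrong(self, new_width):
        width = new_width  # plain local variable; the attribute is untouched
        return width

    def resize(self, new_width):
        self.width = new_width  # explicit self: unambiguously the attribute


box = Box()
box.resize_wrong(10)
after_wrong = box.width  # still the original value
box.resize(10)
after_right = box.width  # now updated
```

The mistake in resize_wrong is still possible, but it cannot be confused with an attribute write when reading the code.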

Faking keyword arguments

Ruby doesn’t support keyword arguments. The common idiom to “fake” keyword arguments lacks the expressiveness and versatility of Python.

In Python, you can have a function definition like this:

def HTMLTag(tagname, parent=None, *children, **attributes):

And you can call this function in many different ways:

HTMLTag("br")
HTMLTag("div", parent=bodytag)
HTMLTag("div", bodytag, p1, p2, p3, width="100%")
HTMLTag("a", p1, href="http://google.com", *["google"])
HTMLTag("a", p1, "google", href="http://google.com",)
HTMLTag("hr", width=77, parent=div, height=4, color="#000")
HTMLTag("hr", **{'parent':1, 'width':77, 'height':4, 'class':'ruler'})

Keyword arguments with default values may not seem like a very critical feature to be missing. But it’s one of the most powerful idioms in Python, because there are a lot of cases where arguments act like configuration, modifying a function’s behavior. If you can leave these modifiers off in the common case, code is faster to write and easier to read; you don’t have to remember the common modifier values; and you’re less likely to use the wrong modifier.
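
To make the calls above concrete, here is a runnable sketch. The function body is a placeholder of my own that simply reports how the arguments were bound, and the string "p1" stands in for a hypothetical parent tag object.

```python
def HTMLTag(tagname, parent=None, *children, **attributes):
    # Placeholder body (an assumption): report how the arguments were bound.
    return {
        "tagname": tagname,
        "parent": parent,
        "children": children,
        "attributes": attributes,
    }


plain = HTMLTag("br")
link = HTMLTag("a", "p1", href="http://google.com", *["google"])
ruler = HTMLTag("hr", **{"parent": 1, "width": 77, "height": 4, "class": "ruler"})
```

In the last call, parent is picked out of the dictionary by name, and everything else lands in attributes.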

Ruby does support the * expansion and collection of Arrays, similar to Python. And it does support default values for optional arguments:

def tallandskinny(height=100, width=1)
  print height, " tall ", width, " wide"
end

tallandskinny()
# prints "100 tall 1 wide"
tallandskinny(1)
# prints "1 tall 1 wide"
tallandskinny(1, 100)
# prints "1 tall 100 wide"

But the optional arguments can’t be passed in as keywords in a different order. Ruby collects any key-value pairs in an argument list into a Hash, but that hash takes the place of a single argument position; it has nothing to do with the parameter names in the method definition. Here, the hash gets assigned to height and then stringified:

tallandskinny(:width=>100, :height=>1)
# prints "width100height1 tall 1 wide"

To duplicate Python’s keyword argument behavior, you have to write something significantly more complicated:

def tallandskinny(kwargs={})
  defaults = {:width=>1, :height=>100}

  kwargs = defaults.update(kwargs)
  print kwargs[:height], " tall ", kwargs[:width], " wide"
end

People on the #ruby-lang IRC channel were quick to point me to snippets of code like this and say “Ruby can fake Python-style keyword arguments easily.” And they’re right, you can fake it. But users (programmers, in this case) shouldn’t have to resort to tricks to get a piece of software (a programming language, in this case) to work the way they want it to work. If users are doing this, it means the software has failed to provide the features its users need.

It looks like keyword arguments are at least under consideration for a future version of Ruby.
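
For comparison, a minimal Python sketch of the same tallandskinny function. It returns the string instead of printing it, which is my own change to make the result easy to check.

```python
def tallandskinny(height=100, width=1):
    return "%s tall %s wide" % (height, width)


defaults = tallandskinny()                   # both defaults used
positional = tallandskinny(1, 100)           # by position
reordered = tallandskinny(width=100, height=1)  # by name, in any order
only_width = tallandskinny(width=5)          # skip height, keep its default
```

No defaults hash or merging step is needed; the parameter names in the definition are the keywords.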

Libraries

SAP support

SAP support is critical to the application that I’ll be working on. Ruby’s SAP support is alpha, version 0.06, and hasn’t been updated in over a year. Python’s SAP support is at version 1.0 and has been around for four years, and there is documentation written by an SAP developer.

DateTime support

Ruby supports Date and DateTime objects natively, but there’s no duration or timedelta support built in. There’s only a third-party Duration library written by the Rails people, no doubt to support the SQL duration type. It’s unacceptable that a duration/timedelta isn’t built in. Why? Because if it were built in, subtracting two dates could return a timedelta, instead of a Rational, as it does in Ruby:

require 'date'
irb(main):002:0> puts Date.new(2008, 03, 29) - DateTime.new(2008, 3, 28, 22, 8)
7/90

It’s not helpful to know that the time delta between 2008-3-28 22:08 and 2008-3-29 is 7/90ths (of a day). What would be helpful is to know that it’s 1:52:00, like in Python:

>>> from datetime import datetime
>>> dur = datetime(2008, 3, 29) - datetime(2008, 3, 28, 22, 8)
>>> print dur, type(dur)
1:52:00 <type 'datetime.timedelta'>
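
The same subtraction as a script, assuming Python 3. The second calculation checks that 7/90 of a day is the same duration, which is the arithmetic behind the Ruby result above.

```python
from datetime import datetime, timedelta

dur = datetime(2008, 3, 29) - datetime(2008, 3, 28, 22, 8)

# 7/90 of a day: 86400 seconds * 7 / 90 = 6720 seconds = 1 hour 52 minutes
seven_ninetieths = timedelta(days=1) * 7 / 90

readable = str(dur)
```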

Debugging

Ruby tracebacks don’t print the line of code on which the error occurred. Compare these two tracebacks, each in programs that divide by zero three function calls deep:

hack.rb:10:in `/': divided by 0 (ZeroDivisionError)
        from hack.rb:10:in `baz'
        from hack.rb:6:in `bar'
        from hack.rb:2:in `foo'
        from hack.rb:13

Traceback (most recent call last):
  File "hack.py", line 10, in <module>
    foo(0)
  File "hack.py", line 2, in foo
    bar(arg)
  File "hack.py", line 5, in bar
    baz(arg)
  File "hack.py", line 8, in baz
    8/arg
ZeroDivisionError: integer division or modulo by zero

Often you can see exactly what’s going wrong just from a Python traceback, because you can see the line of code that was a problem. Debugging Ruby is slower and more difficult because it doesn’t provide this information.
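
A sketch of a program like the hack.py described above, reconstructed from the traceback rather than taken from the original file. It captures the traceback as a string with traceback.format_exc(). The source lines appear in the report when the code is run from a file; when run from an interactive prompt or a string, they may be missing.

```python
import traceback


def foo(arg):
    bar(arg)


def bar(arg):
    baz(arg)


def baz(arg):
    8 / arg  # fails when arg is 0


try:
    foo(0)
except ZeroDivisionError:
    # Same text the interpreter would print for an uncaught error
    report = traceback.format_exc()
```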

By the way, Perl’s even worse than Ruby at providing useful information when there’s an error:

Illegal division by zero at hack.pl line 10.

Rails & Pylons

The Ruby on Rails and Pylons web frameworks are more or less comparable. A good chunk of Rails’ core has been ported to the Python webhelpers package, which is used by Pylons (and other Python web frameworks). There don’t seem to be any major features in one web framework and not the other. Pylons has in-browser debugging (off by default in production code) and, since it relies on existing, and pluggable, templating, ORM, and other modules, may be slightly more mature and flexible. Rails’ DB migration is more mature than SQLAlchemy’s.

Cool things about Ruby

Ruby’s block arguments have interesting potential, especially if you were writing a heavily thread-based or event-based application. Of course, a traditional MVC web application doesn’t really need threads or events (unless you’re writing an HTTP server too). Most places where I’ve used, or seen examples of, block arguments in Ruby are places where I would have used a list comprehension in Python. In other words, block arguments are far more powerful than their common use case.
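
For readers unfamiliar with the Python side of that comparison, a list comprehension looks like this. It is a small sketch with made-up data, covering the map and filter jobs that blocks are commonly used for.

```python
words = ["bar", "baz", "foo"]

lengths = [len(w) for w in words]                          # transform every item
b_words = [w.upper() for w in words if w.startswith("b")]  # filter, then transform
```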

Metaprogramming with Ruby clearly takes less code than in Python. I don’t think there’s anything that Ruby does that Python cannot, or vice versa, with regard to metaprogramming. I’ve needed real metaprogramming in Python extremely rarely, and I don’t know if I’d use the metaprogramming in Ruby any more frequently. The examples of metaprogramming with Ruby that I’ve seen (The Poignant Guide’s chapter, or the way ActiveRecord works) would have been doable, in Python at least, by inheriting from a base class and using class attributes on the derived class as configuration variables. So, whatever win that Ruby gets from easier metaprogramming is minor.
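
The base-class approach mentioned above might look something like this. It is a sketch only: Record, User, table, fields and select_sql are all hypothetical names, and this is not how ActiveRecord or any real ORM is implemented.

```python
class Record:
    """Hypothetical base class; subclasses configure it with class attributes."""

    table = None
    fields = ()

    def __init__(self, **values):
        unknown = set(values) - set(self.fields)
        if unknown:
            raise TypeError("unknown fields: %s" % sorted(unknown))
        # Every configured field becomes an instance attribute (None if omitted)
        for name in self.fields:
            setattr(self, name, values.get(name))

    @classmethod
    def select_sql(cls):
        return "SELECT %s FROM %s" % (", ".join(cls.fields), cls.table)


class User(Record):
    # Configuration lives in plain class attributes; no metaprogramming needed
    table = "users"
    fields = ("id", "email")


user = User(id=1, email="someone@example.com")
query = User.select_sql()
```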

Conclusion

Ruby has standard libraries that are so poor the community has provided drop-in replacements. The documentation about the current and future versions of the language is extremely lacking. The core implementation of the language is not competitive with other interpreted languages. And the language itself is full of idiosyncrasies and inconsistencies that are neither useful nor lend themselves to cleaner, simpler code. The language is not without promise or potential, but in its current state there is no reason to choose it over a mature, robust language like Python.

I’d like to thank Jeremy Avnet, Steve Hazel, Greg Hazel, and Ross Cohen for their comments and corrections on drafts of this article. Nonetheless, all inflammatory opinions and any inaccuracies are my responsibility. Subscribe here to read any follow-ups to this article.

Read the follow-up: Ruby’s not ready: comments, corrections, and clarifications

  1. Even the Rails documentation at noobkit.org, the official documentation for the official Rails IRC channel, can’t seem to get Unicode support working (scroll down to “3. Go to localhost:3000/ and get ‘Welcome aboard: You’re riding the Rails!’”).
  2. Perhaps Ruby could also use ...., which would return an even shorter range, not including the beginning or end points.

The two ugly faces of HTML generation

There are two quite different reasons for implementing HTML generation on a website. The first reason is to insert dynamic content, content that comes from a database or is algorithmically generated, into pages. The second reason is templating; to ensure that standard, site-wide parts of the HTML, such as headers and footers, are pulled from a single source. The goal of the first is to have a dynamic, database-driven site. The goal of the second is to avoid having to edit tens, or hundreds, of HTML files when the site design changes, and to avoid copy-and-paste coding.


Internet, meet Spydentify

Spydentify is a new experiment/side project of mine. It fills a niche that I first identified over at the Typophile Type ID Board: people love looking at pictures and trying to figure out what’s in them. The site’s interface is designed to be as addictive as possible, with a neverending, rapid flow of interesting images, big, shiny buttons to click, and instant feedback on your actions. I’m going to add more ego-stroking, viral-spreading and moderation features soon.

The interface also follows the MVC pattern I laid out in this article. It uses one static HTML file, all dynamic data is loaded through XMLHTTPRequest (AJAX, for those of you who speak Web 2.0), and all HTML generation is done via JavaScript manipulation of the DOM. The backend uses Pylons, which gave me a chance to learn Pylons, Paste, Routes, SQLAlchemy, FormEncode, and Mako. And comments are rendered with my own PottyMouth.

I also designed the logo all by myself.

Check it out.

The next big thing, part 1: Resolving the conflict between Model-View-Controller and AJAX design patterns

or, how I learned to stop worrying and love the XMLHTTPRequest…

This is the first part of what will become an ongoing series.

If you’ve built a website in the last few years, most likely you’ve adopted an architecture similar to Model-View-Controller, or MVC. If not, well, either your website is terribly simple, you haven’t had to modify it yet, or your code is spaghetti and you should be fired. Just kidding. (Or maybe you’ve come up with an even better architecture, in which case you should share your insights with us mere mortals.)

In MVC architecture, the model reads and writes data to and from a back-end data-store, and organizes the relational data in a nice, hierarchical fashion to be used by the controller. The view accepts input from the controller and generates output HTML, XML, RSS, JavaScript, SVG, PDF, or whatever you want to send to the user’s browser. And the controller accepts browser input, figures out what to query the model for, and picks which view to use and what data to send it.

figure 1: The traditional MVC architecture.
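
The division of labor described above can be sketched in a few lines of Python. Everything here is hypothetical: the ARTICLES store, the function names, and the markup are for illustration, and a real application would also escape the data before putting it into HTML.

```python
# Model: reads data from a back-end store (here, an in-memory dict).
ARTICLES = {1: {"title": "Hello", "body": "First post"}}


def get_article(article_id):
    return ARTICLES.get(article_id)


# View: turns data handed over by the controller into output HTML.
def render_article(article):
    return "<h1>%s</h1><p>%s</p>" % (article["title"], article["body"])


def render_not_found():
    return "<h1>Not found</h1>"


# Controller: accepts browser input, queries the model, and picks a view.
def handle_request(params):
    article = get_article(int(params.get("id", 0)))
    if article is None:
        return render_not_found()
    return render_article(article)


page = handle_request({"id": "1"})
missing_page = handle_request({})
```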
