Tag Archives: Unicode

Rant about Ruby 1.9’s strings

What other language requires you to understand this level of complexity just to work with strings?!

Unicode in five minutes

Someone over at Linden Labs is a bigger Unicode geek than me. And that’s really saying something.

What isn’t new in Ruby 1.9.1 (or in Python)

Like Josh Haberman, I was excited to see the changelog for Ruby 1.9, but immediately disappointed by its vagueness and terseness.

This list is meaningless to anyone who isn’t already familiar with the changes that have been happening in Ruby 1.9.x.

For someone like me who tried an older version of Ruby, there’s nothing to read that will tell me whether it’s worth checking out again.

Take this example, from the changelog:

IO operations

Many methods used to act byte-wise but now some of those act character-wise. You can use alternate byte-wise methods.

That’s terrifying. If I’m switching to a new version, I need to know exactly which methods have changed and which ones haven’t. Saying that “some” have changed is almost less helpful than saying nothing at all.

Here’s hoping that “improved documentation” will make it into a future Ruby 1.9.x release.

In the same blog post, Haberman makes some inaccurate assertions about Python’s encoding support:

Python has taken an “everything is Unicode” approach to character encoding — it doesn’t support any other encodings. Ruby on the other hand supports arbitrary encodings, for both Ruby source files and for data that Ruby programs deal with.

Incorrect. For the last five and a half years, since Python 2.3, source code in any encoding has been supported, and Python 3.0 will expect UTF-8 by default. And of course, Python supports exactly the same wide range of encodings for data. Python’s approach can best be described as “Unicode (via UTF-8) is default.”

SuperCache not so much with the “Super”

In reaction to the more than 34,000 visitors that Why your Flash website sucks generated (and is still generating) Jeremy enabled WordPress’s SuperCache for me.

Turns out SuperCache does a less than super job with non-ASCII characters, failing to encode them to HTML entities, and instead writing them out as multi-byte UTF-8 sequences:

Another Unicode omission

Why isn’t the play/pause/stop/record family of icons in Unicode? It includes the four suits on modern playing cards, every mathematical symbol ever used, and a number of now obsolete languages. Shouldn’t it also include what is a ubiquitous set of about eight icons used on everything from 8-tracks to iPods? There’s even a nice spot for them, right next to ⏏ (eject).

This has been discussed at least twice on the unicode mailing list, but no decision was reached either time. It seems that some people think the icons should be included because they would be useful, and others think they should be excluded because it would open the door to including every pictogram and icon in use on every toolbar and consumer electronic in existence. The latter seems like a very flimsy argument to me; the argument for including them stems from their ubiquity, which would prevent every other symbol in use from piggybacking their way into the character set. In fact, the only symbol more ubiquitous is probably the “power” symbol; it should also be included. And most of the symbols already included in the “Miscellaneous Technical” chart are far more obscure.

Update 2007-05-10: The unicode people graciously responded to my email regarding this, and it sounds like it would require an evangelist to encourage the inclusion of these symbols before it happened. It’s so awesome when a standards organization actually responds to the needs of its users.

glyphobet • глыфобет • γλυφοβετ

musings over a tuna fish sandwich