Dashes vs. Underscores in URLs
The debate over whether to use dashes or underscores to represent spaces in URLs is a heated one in the web development community, though not quite as heated as the one over tabs versus spaces for indenting code. So, which is the better substitute for spaces in URLs: dashes or underscores?
The simple answer is that, never mind what Google prefers, underscores are the right way to go. Why?
1) Hyphens Already Mean Something
Hyphens and dashes are actually slightly different, but in practice everybody just uses the same character, ASCII number 45, the hyphen-minus. So let’s just pretend they’re the same. The strongest argument against dashes is that they already mean something in English! “Mother-in-law”, “X-ray”, and “twenty-one” are all single words. Inserting a hyphen in the middle of a sentence can completely change its meaning. You can’t just ignore those rules, any more than you would write without capital letters or proper punctuation. If you use dashes in your URLs when you don’t mean them, you a) lose information about what the content of the URL actually is, b) confuse people, and c) will have the English police at your door by the morning.
For example, I have a file called man-eating-shark.jpg. Now, can you tell me if it’s a picture of a man eating shark meat, or a picture of an actual man-eating shark? No, you can’t. This example is from Wikipedia, and there are some more great ones on that page. Sure, you can just open the file and see if the shark in question is dead and delicious or alive and ravenous. But when a search engine indexes the file, it has no idea. Not good.
One more that nicely illustrates the importance of preserving dashes: a document called scientists-discover-three-hundred-year-old-trees.html. Are they three-hundred-year-old trees, three hundred-year-old trees, or three hundred year-old trees? We know neither how many trees there are, nor how old each is. And Google doesn’t know either! If you’re looking for things that are one hundred years old and not three hundred, you’re out of luck because Google can’t tell the difference. How dumb is that?
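On the other hand, the following URLs are perfectly clear:
- man-eating_shark.jpg
- man_eating_shark.jpg
- scientists_discover_three-hundred-year-old_trees.html
- scientists_discover_three_hundred-year-old_trees.html
- scientists_discover_three_hundred_year-old_trees.html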
You can’t just ignore centuries of English writing. Don’t use dashes in URLs when you don’t mean them. Doing so is the worst kind of wrong: grammatically incorrect.
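To make the point concrete, here is a minimal Python sketch of the convention argued for above: spaces become underscores, and any hyphen the writer actually typed is left alone. The function names `title_to_slug` and `slug_to_title` are made up for this illustration, and the sketch assumes titles contain no underscores of their own.

```python
import re


def title_to_slug(title: str) -> str:
    """Replace each run of whitespace with one underscore; keep hyphens as written."""
    return re.sub(r"\s+", "_", title.strip())


def slug_to_title(slug: str) -> str:
    """Reverse the mapping: underscores become spaces again, hyphens stay hyphens."""
    return slug.replace("_", " ")


# The two readings of the shark picture get two different file names.
print(title_to_slug("man-eating shark.jpg"))  # man-eating_shark.jpg
print(title_to_slug("man eating shark.jpg"))  # man_eating_shark.jpg

# The original wording is recoverable, which is not true once every
# space has been turned into a dash.
print(slug_to_title("man-eating_shark.jpg"))  # man-eating shark.jpg
```

With dashes as the separator, both titles would collapse to the same man-eating-shark.jpg, and no reverse function could tell them apart.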
2) Aesthetics and Readability
I’ve appealed to those who care about language, but if you happen not to, that’s okay. Underscores are still better, because they look better. Dashes are all up in your space (haha), right next to the letters that you actually want to be looking at. I would honestly rather read something in Comic Sans than have distractions between every single word, firing all the wrong photons into my eyes and making them sore.
Underscores? Not as good as spaces themselves, but certainly a huge improvement over dashes. They’re a bit larger than spaces (if the typeface is not monospaced), but at least they rest comfortably at the bottom of the letters, and they should be no harder to read than something underlined (in fact, underscores are underlines; more on that later). In order of decreasing readability:
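- The five boxing wizards and the quick brown fox jumped over the lazy dog.
- The_five_boxing_wizards_and_the_quick_brown_fox_jumped_over_the_lazy_dog.
- The-five-boxing-wizards-and-the-quick-brown-fox-jumped-over-the-lazy-dog.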
The last one looks terrible. Depending on the person, and the typeface, this may not always be the case, but in general the underscore is far superior from a readability standpoint.
3) The Semantics of the Underscore
So you’re a web developer. Hopefully you care about language or readability, but perhaps not. But you’ve got to care about semantic correctness.
It’s not common knowledge, but the underscore isn’t really a character like the rest of the ones on our keyboards. It’s only there because of a little piece of tech history known as the typewriter. In order to underline text, you had to type it out normally, then move the typewriter carriage back and go over it again with underscores. That means that an underscore character all on its own is basically an underlined space, which is pretty much as close to an actual space as you can get on the web.
The Better Answer
It should be obvious, but in a sane computer ecosystem we wouldn’t have to use either! We should not have to compromise on any of these three points, where the underscore is only the next-best option. A space should be a space, no matter where it is. Our filesystems themselves work fine with spaces in filenames, so why replace the space at all? Unfortunately, in the beginning a whole bunch of geeks decided it was a good idea to make the space (of all characters) the delimiter for filenames in things like command-line arguments and URLs. There’s a prettygoodreasonweusespacesinwriting. Likely, they simply didn’t anticipate the masses ever using the software they were writing.
Of course, quotes or some other non-space delimiter should have been required from the beginning. The “use spaces, or else fall back on quotes” system is just silly. You have to use some delimiter, of course, but not spaces, because we use them more than any letter in the alphabet. We’ve all felt the pain of having a file called “my document.txt”, messing up the escapes, and seeing “No such file or directory” on the command line. It’s the same wicked convention that is the reason we can’t use spaces in URLs.
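If you want to see the space-as-delimiter problem without risking a real shell, Python’s standard `shlex` module splits a command line the way a POSIX shell would. This is only a sketch; the command strings are made-up examples, and nothing here actually runs `cat`.

```python
import shlex

# Unquoted: the space splits the file name into two separate arguments,
# which is what leads to "No such file or directory".
unquoted = shlex.split("cat my document.txt")
print(unquoted)  # ['cat', 'my', 'document.txt']

# Quoted or backslash-escaped: the file name survives as one argument.
quoted = shlex.split('cat "my document.txt"')
escaped = shlex.split(r"cat my\ document.txt")
print(quoted)   # ['cat', 'my document.txt']
print(escaped)  # ['cat', 'my document.txt']

# shlex.quote adds the quoting for you when building a command string.
safe_name = shlex.quote("my document.txt")
print(safe_name)  # 'my document.txt' (including the single quotes)
```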
As it is, the crazy idea that a space should represent a space and not some other character is pretty impractical. If you do try to use them, a browser encodes them as “%20”, which%20makes%20your%20URLs%20look%20like%20this. Technically, they’re encoded spaces, but it’s hideous. You can barely read it. Oh well, maybe one day this will change.
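You can reproduce that encoding with Python’s standard `urllib.parse` module. The snippet below is just an illustration using the sentence above as input; `quote` percent-encodes spaces, `quote_plus` uses the plus-sign form seen in query strings, and `unquote` turns the result back into real spaces.

```python
from urllib.parse import quote, quote_plus, unquote

text = "which makes your URLs look like this"

# Percent-encoding: every space becomes %20.
encoded = quote(text)
print(encoded)  # which%20makes%20your%20URLs%20look%20like%20this

# The form-style alternative: every space becomes a plus sign.
plus_encoded = quote_plus(text)
print(plus_encoded)  # which+makes+your+URLs+look+like+this

# Decoding recovers the original spaces exactly.
decoded = unquote(encoded)
print(decoded == text)  # True
```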
So, underscores are not quite as good as spaces. They’re a compromise of language, readability, and semantics, but they’re the best we’ve got. Better than dashes, CamelCase, plus+signs, or anything else. So use them.
(Dash-underscore face is relevant to this discussion.)