Side Effects of Hash-Bang URLs

Posted at 11:07 am on Feb 09, 2011 in Web

Mike Davies recently posted a scathing review of everything Lifehacker did wrong that led up to their recent site-wide outage. The root culprit, according to Davies: converting their site from using normal URLs to address content pages to using hash-bang URLs instead.

I have no opinion on whether hash-bang URLs are the new root of all evil, but I do have a few observations about hash-bangs that weren’t mentioned by Davies.

As Davies mentioned, a hash-bang URL is a URL whose fragment, the part set off from the rest of the URL by the # character, begins with an exclamation mark, such as a Twitter profile URL ending in #!/danny_thorpe.  Davies delves into the Google origins of the hash-bang pattern, intended as a means to help search engines index dynamically loaded AJAX web sites, and into how changing from a “normal” URL to a hash-bang URL impacts search engine web crawlers and page indexing.
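To make the anatomy concrete, here is a minimal TypeScript sketch that splits a URL at the first # and checks whether the fragment starts with an exclamation mark. The helper name splitHashBang and the example.com address are placeholders assumed for illustration; they do not come from Davies' post or from any real site's code.

```typescript
// Hypothetical helper: split a URL into its base and fragment parts.
interface SplitUrl {
  base: string;        // everything before the first '#'
  fragment: string;    // everything after the first '#', or "" if there is none
  isHashBang: boolean; // true when the fragment starts with '!'
}

function splitHashBang(url: string): SplitUrl {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return { base: url, fragment: "", isHashBang: false };
  }
  const fragment = url.slice(hashIndex + 1);
  return {
    base: url.slice(0, hashIndex),
    fragment: fragment,
    isHashBang: fragment.charAt(0) === "!",
  };
}

// Placeholder URL, assumed for illustration only.
const parts = splitHashBang("https://example.com/#!/some_user");
console.log(parts.base);       // https://example.com/
console.log(parts.fragment);   // !/some_user
console.log(parts.isHashBang); // true
```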

URL Cache Equivalence Ignores Fragments

The hash-bang URL pattern also has significant impact on the browser client side of the equation. The fragment is never sent to the server, so the browser does not consider differences in URL fragments significant when comparing two URLs for cache equivalence.  The browser will consider a URL ending in “#!/danny_thorpe” to be equivalent to the same base URL ending in “#!/somebody_else” when deciding whether to serve the page from the browser cache or start a new network request to fetch the HTML page from the server.
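The comparison described above can be sketched in a few lines of TypeScript. This is a simplification under stated assumptions: cacheKey and sameCacheEntry are hypothetical names, the example.com URLs are placeholders, and a real browser cache also looks at things like the request method and response headers.

```typescript
// Sketch of fragment-insensitive URL comparison (not a real browser API).
function cacheKey(url: string): string {
  const hashIndex = url.indexOf("#");
  // Drop the '#' and everything after it; the rest identifies the resource.
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

function sameCacheEntry(a: string, b: string): boolean {
  return cacheKey(a) === cacheKey(b);
}

// Placeholder URLs, assumed for illustration only.
const firstProfile = "https://example.com/#!/danny_thorpe";
const secondProfile = "https://example.com/#!/somebody_else";
console.log(sameCacheEntry(firstProfile, secondProfile)); // true

// "Normal" URLs differ before any '#', so they are distinct entries.
console.log(
  sameCacheEntry("https://example.com/danny_thorpe", "https://example.com/somebody_else")
); // false
```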

This can be used to improve page loading time.  Put all of your static content in the HTML page at the URL base address (the part of the URL before the #) and let the URL fragment specify the dynamic content that needs to be loaded by JavaScript code to fill in the content areas of the static HTML.  In this scheme, hopping around between different Twitter profile pages should download the static HTML content once, and after that only the dynamic content for each user.  After the first profile page loads the static HTML content into the browser’s local page cache, visits to other profile pages should all load that static content from the cache.

When your static HTML content makes up the bulk of your page and the dynamic content is an accessory or minor component, this scheme should make for pages that load faster in the browser because the static HTML content is usually already in your local cache.  If each Twitter profile were on a “normal” URL with no URL fragments, the browser would have to load the static HTML content for every profile page, and each of those pages would be stored separately in the browser cache.
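A toy model makes the difference easy to count. The TypeScript sketch below assumes a cache that stores one entry per fragment-less URL and never evicts; countPageDownloads and the example.com URLs are hypothetical, and the model counts only downloads of the static HTML page, not the per-profile dynamic requests that JavaScript would still make.

```typescript
// Toy model of a browser cache keyed by the URL without its fragment.
function stripFragment(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

// Count how many static-page downloads a sequence of visits needs when
// each distinct fragment-less URL is fetched once and then served from cache.
function countPageDownloads(visits: string[]): number {
  const cached: Record<string, boolean> = {};
  let downloads = 0;
  visits.forEach(function (url: string): void {
    const key = stripFragment(url);
    if (!cached[key]) {
      cached[key] = true;
      downloads++;
    }
  });
  return downloads;
}

// Placeholder URLs, assumed for illustration only.
const hashBangVisits = [
  "https://example.com/#!/alice",
  "https://example.com/#!/bob",
  "https://example.com/#!/carol",
];
const normalVisits = [
  "https://example.com/alice",
  "https://example.com/bob",
  "https://example.com/carol",
];
console.log(countPageDownloads(hashBangVisits)); // 1
console.log(countPageDownloads(normalVisits));   // 3
```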

Fragments Leave No Footprints

Another side effect of using URL fragments is that the fragments don’t appear in the browser history.  As a result of the URL equivalence rules ignoring URL fragments, multiple URLs that differ only in fragment will only appear as one entry in the browser history.  This makes sense if you are using URL fragments as they were originally intended – to reference specific subsections of the same base page.

We used this artifact to our advantage when implementing secure cross-domain communication via URL fragments – we could pass data in the URL fragment multiple times without polluting the browser history.
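As a rough illustration of carrying data in a fragment, the TypeScript sketch below packs a small message into the fragment of a URL and reads it back. It shows only the transport idea, not the secure scheme mentioned above: withMessage, readMessage, the JSON encoding, and the widget.example.com address are all assumptions made for this example.

```typescript
// Hypothetical helpers for carrying a small message in a URL fragment.
function withMessage(baseUrl: string, message: object): string {
  const hashIndex = baseUrl.indexOf("#");
  // Replace any existing fragment so the base URL stays the same.
  const base = hashIndex === -1 ? baseUrl : baseUrl.slice(0, hashIndex);
  return base + "#" + encodeURIComponent(JSON.stringify(message));
}

function readMessage(url: string): any {
  const hashIndex = url.indexOf("#");
  if (hashIndex === -1) {
    return null;
  }
  const fragment = url.slice(hashIndex + 1);
  if (fragment === "") {
    return null;
  }
  return JSON.parse(decodeURIComponent(fragment));
}

// Placeholder receiver URL, assumed for illustration only.
const receiver = "https://widget.example.com/receiver.html";
const firstUrl = withMessage(receiver, { seq: 1, body: "hello" });
const secondUrl = withMessage(firstUrl, { seq: 2, body: "again" });

console.log(readMessage(firstUrl).body);  // hello
console.log(readMessage(secondUrl).seq);  // 2
// Both URLs share one base, so they differ only in the fragment.
console.log(firstUrl.split("#")[0] === secondUrl.split("#")[0]); // true
```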

When URL fragments are used to specify different content, this browser history “singularity” becomes a problem: hopping between URLs with the same base but different fragments (to see different Twitter profile pages, for example) leaves only one URL in the browser history.  URL fragments leave no footprints.

If you try this in the Twitter web UI, you’ll find that visiting multiple profiles actually does create entries in your browser history.  I’m pretty sure that happens only because the Twitter pages are injecting entries into the browser history using JavaScript.
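One way a page could inject such entries is the History API's pushState method. The TypeScript sketch below is not Twitter's code, which I have not seen; recordProfileVisit and the HistoryLike interface are hypothetical, and a stand-in object replaces window.history so the example also runs outside a browser.

```typescript
// Minimal shape of the History API that this sketch relies on.
interface HistoryLike {
  pushState(data: unknown, unused: string, url?: string): void;
}

// Hypothetical helper: record a profile visit as its own history entry.
function recordProfileVisit(target: HistoryLike, user: string): string {
  const url = "#!/" + encodeURIComponent(user);
  target.pushState({ user: user }, "", url);
  return url;
}

// Stand-in for window.history that simply remembers the pushed URLs.
const visited: string[] = [];
const fakeHistory: HistoryLike = {
  pushState: function (_data: unknown, _unused: string, url?: string): void {
    if (url !== undefined) {
      visited.push(url);
    }
  },
};

recordProfileVisit(fakeHistory, "danny_thorpe");
recordProfileVisit(fakeHistory, "somebody_else");
console.log(visited.length); // 2
console.log(visited[0]);     // #!/danny_thorpe
```

In a browser, passing window.history as the first argument would add one history entry per call, so each profile view gets its own back-button stop.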

While I’m not sure I agree with all of Davies’ rants against Lifehacker, I will agree that building an entire content system solely on the hash-bang URL pattern, with no appreciable static HTML content independent of JavaScript, is probably not the best use of the hash-bang URL pattern.