HTML Take 5 and . . . Action!

In 1989, when Tim Berners-Lee proposed a new Internet-based hypertext system (the system that produced HTML), he helped introduce one of the primary building blocks of the World Wide Web. Markup languages had already been in wide use for some time; they allowed content producers to attach hidden “tags” to their content that denoted its intended use. By agreeing on the standards of use and a pre-defined catalog of acceptable tags, different computer applications and systems could share information much more easily. Berners-Lee’s great insight was to adapt an existing markup language, in this case SGML, to include new features available only through the Internet. Besides just including such sections as headers and paragraphs, content could now include hyperlinks – directions to other resources on the Internet. Suddenly, new applications called browsers sprang up that allowed a person to view any content that had been marked up with HTML and follow links from one page to another and from site to site – creating a web, the World Wide Web, of information.

HTML should go down as one of the most important innovations of the modern age, yet it’s not without its problems and detractors. Today there is a big push towards the latest version, HTML5, but what does that mean for you?

Monolithic Legacy

It may surprise you to know that version 4 of HTML is nearly 15 years old. In Internet time, that is almost prehistoric. Part of the reason for this is that the standards bodies that regulate it nearly abandoned HTML in favor of its cousin XHTML back in 2000. XHTML, while very similar to HTML, was based on XML rather than SGML. From a content producer’s point of view there wasn’t a huge difference between the two: XHTML was just stricter about how tags were formatted (every tag had to be closed, for example) and was case sensitive. Even XHTML, however, hasn’t seen a standards update since 2001.

What this means is that almost the entire web is built using a decade-old standard. Billions and billions of pages of content are marked up using HTML 4 or XHTML. Migrating all that content onto a completely new standard is not virtually impossible; it is just impossible. HTML5 gets around this by merging both HTML 4 and XHTML, supporting both in as completely backwards-compatible a way as possible.

Content v. Style

Surprisingly, the first battle fought over HTML can be represented by one of the simplest tags: <i> for italic. The cause of this dispute goes way back into the ancestry of HTML, beyond even the roots of its parent language SGML. The Standard Generalized Markup Language (SGML), standardized in 1986, was itself an expansion of IBM’s Generalized Markup Language (GML), which dates to the late 1960s. GML was basically a set of macros and tags for use with one of the most popular of the early text-formatting programs: SCRIPT/VS. Even SCRIPT/VS inherited a legacy of terminology and standards based on the rules of typesetting and the printing press. Typography in the age before computer word processors had a very limited number of options for “decorating” text. You still see these today as core options in any text formatting program: bold and italic (and to a lesser extent, underlined).

When HTML was introduced, it seemed natural to include tags that matched these text options, such as <i> for italic. So what’s the problem? You see, “italic” does not describe the content that it marks; rather, it describes how that content should be displayed. This is an important distinction. Every browser or computer system that tries to work with a markup language, such as HTML, must interpret the code. The tags and content are parsed and the system tries to determine the intended use of each element. But in reality, content producers were sometimes using <i> to mark content meant to carry specific emphasis and at other times as a choice in typesetting without specific emphasis. HTML does not strictly define how to present each type of content (emphasis could be displayed using a different color or font size, for example), so browser developers had to decide for themselves whether to treat <i> as content with emphasis or only as angled text.

The first step to resolve this was to add new tags that actually did mark up the content’s meaning: <em> for emphasis was offered in place of <i>, and <strong> for strong emphasis in place of <b>. This only went so far, however. You see, another problem with HTML was that there were not enough tags to allow content producers to organize and format their web pages in the manner they preferred. Not wanting to repeat the earlier mistakes, the standards body introduced the concept of style sheets. These descriptors, the most popular of which is Cascading Style Sheets (CSS), allow content producers to describe the visual (and in some cases audible) style of content. Style rules can even be defined for specific media types, so a page might look one way on a monitor screen, but another way completely when printed. HTML5 inherits these advancements and goes even further by dropping some style-based legacy tags altogether.
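
To make the split between content and style a little more concrete, here is a minimal sketch rather than a recommended page design. The markup says what the text means (<em>, <strong>) and the style sheet decides how it looks on screen versus on paper. The class name "ship-name", the colors, and the sample sentences are all made up for this example.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Content v. style (illustrative sketch)</title>
  <style>
    /* On a screen, show emphasis in color instead of slanted text. */
    @media screen {
      em { font-style: normal; color: #b00; }
    }
    /* On paper, fall back to traditional italics. */
    @media print {
      em { font-style: italic; color: black; }
    }
    /* Hypothetical class: italic as a typesetting choice, not emphasis. */
    .ship-name { font-style: italic; }
  </style>
</head>
<body>
  <!-- Meaning is in the markup; appearance is in the style sheet. -->
  <p>You <em>must</em> save your work. <strong>Unsaved changes are lost.</strong></p>
  <p>She sailed on the <span class="ship-name">Queen Mary</span>.</p>
</body>
</html>
```

Notice that the ship's name is slanted purely as a typesetting convention, while <em> carries emphasis no matter how a given device chooses to render it.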

Flash . . . Bang

In reality, most content producers are completely happy to continue to develop in either HTML 4 or XHTML. With scripting now implemented in more standardized ways, even very complex user interfaces can be designed. Scripting techniques collectively termed AJAX allow web pages to act more and more like desktop applications. The real driver for HTML5 implementation is summed up in one word: Flash. Adobe’s ubiquitous multimedia development and display tools, including the Flash browser plug-in, have been providing web content producers with the ability to embed video content, animation, and even games into web pages for years.

Personally, I don’t mind Flash, but its detractors have a few good points against it:

  • While it has a huge installed base, Flash still requires a separate program or plug-in to run.
  • Because it is a third-party implementation, it could introduce security or stability problems. Apple in particular claims that most of the complaints it receives from users are due to problems with Flash.
  • Content marked up in Flash is invisible to search engines. At one time it was the rage to create cool animated navigation buttons on your site using Flash, but developers soon realized that this meant pages of their site linked in this manner didn’t show up in search results.
  • Flash is built around a mouse pointer model that does not translate well to touch-screens.

Flash, however, can do things that were too expensive, too buggy, or just impossible even with extensive use of dynamic scripting. Until HTML5, that is. By standardizing video and multimedia elements into the markup language itself, content developers could avoid Flash altogether yet still keep most of the multimedia elements and keep the content visible to search engines.
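
As a rough sketch of what "multimedia in the markup itself" looks like, the page below embeds a clip with the HTML5 <video> element and no plug-in. The file name "intro.mp4", the dimensions, and the wording are placeholders I made up for illustration, and whether a given browser can actually play the file depends on the codec support it ships with.

```html
<!DOCTYPE html>
<html>
<head>
  <title>Video without a plug-in (illustrative sketch)</title>
</head>
<body>
  <!-- "intro.mp4" is a placeholder file name, not a real resource. -->
  <video src="intro.mp4" controls width="640" height="360">
    <!-- Shown only by browsers that do not understand the video element. -->
    <p>Your browser does not support HTML5 video.
       <a href="intro.mp4">Download the clip</a> instead.</p>
  </video>

  <!-- Plain markup around the video stays readable as ordinary page text. -->
  <p>A two-minute introduction to our product line.</p>
</body>
</html>
```

The fallback paragraph inside the element is what older browsers display, so the page degrades to a plain download link instead of an empty box.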

Mobile Implications

Outside of Android, virtually none of the stock browsers that come with modern smart-phones support Flash. Some mobile OS developers claim to be working on it, but others, such as Apple, state with absolute clarity that Flash content will never be available natively on their mobile devices. In addition, dynamic scripting controls on mobile devices are not nearly as robust as on the desktop. HTML5, on the other hand, while not fully implemented, is rapidly becoming the designated standard for multimedia content and dynamic mobile user interfaces. All of the major mobile OS manufacturers either support, partially support, or have promised to support HTML5.

Right now, the smart-phone mindset is focused primarily on “Apps”, but once HTML5 comes into its own, many content developers may find reasons to mark up their content with HTML5 instead:

  • HTML5 content would be web-based and could be hosted on virtually any web server. Content producers wouldn’t be limited by the restrictions and potentially draconian control of a regulated App Store.
  • Implementation would be more standardized, allowing for much quicker development of an application across many mobile devices.
  • Development would be done using knowledge of HTML5, a standard rooted in decades of web development, rather than in more complicated programming languages, allowing easier development, support, and updating.

The Catch . . .

Of course, there would be a catch. HTML5 has many:

  1. It is not fully implemented yet. The most common desktop browser, Internet Explorer, only recently included some HTML5 support in version 9, which has just now been released and could take years to reach wide audiences. Mobile browsers are even further behind.
  2. There is a problem with deciding on a video standard. There’s not enough room here to go into the specifics, but browser manufacturers are taking sides on which video compression algorithm (codec) to support. Right now, the option pulling ahead in the debate (based on browser support) is technically a proprietary format that could in the future impose use restrictions or require royalty payments.
  3. It took years and years for all the major browsers to actually begin to implement the HTML, CSS, and scripting standards to the point where web developers could be reasonably sure that content produced would look and work the same across all applications. Already, despite an HTML5 standard designed specifically to standardize the implementation, the browsers that do support HTML5 have either taken shortcuts and implemented it only partially, or decided on their own that their way would be better.
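
One common workaround for the codec disagreement in item 2 is to offer the same clip in several formats and let each browser take the first one it can play. The fragment below is only a sketch of that idea: the three file names are hypothetical, and which formats any particular browser accepts is something you would have to check for yourself.

```html
<video controls width="640" height="360">
  <!-- The browser tries each source in order and uses the first it can play. -->
  <!-- All three file names are placeholders for the same clip. -->
  <source src="intro.mp4" type="video/mp4">
  <source src="intro.webm" type="video/webm">
  <source src="intro.ogv" type="video/ogg">
  <!-- Fallback for browsers with no support for the video element at all. -->
  <p>Your browser does not support HTML5 video.</p>
</video>
```

The cost, of course, is that the content producer has to encode and host every clip more than once, which is exactly the kind of extra work a single agreed standard was supposed to remove.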

HTML5 has a long way to go, but with any luck, once it gets here we won’t have already moved on to something else. Remember WAP, anyone?
