In this appendix, you’ll get a brief overview of how the web was invented and its subsequent development. You’ll also learn how standards are made by the W3C, why the Web Hypertext Application Technology Working Group (WHATWG) was formed, and the aims behind HTML5. To conclude, we’ll take a brief look at the process behind the other major standard that’s covered in this book: CSS3. None of this information is necessary to use web standards but, like many other human endeavors, web standards are a product of their history as much as they are rational technical documents. An appreciation of the history will help you understand why the standards are the way they are.
In the following sections, you’ll learn about the history of the web, from its beginnings as an easy way to share physics papers to its current incarnation as the repository of all the world’s knowledge and possible replacement for traditional operating systems. You’ll also learn about the World Wide Web Consortium (W3C) and its role in providing the standards on which the entire web relies. You’ll see how web developers have pushed the boundaries of what’s possible with HTML4 and CSS2 to create the need for new standards, and you’ll learn about how many of the common issues that today’s web developers encounter can be solved easily in HTML5 and CSS3.
In 1989, Tim Berners-Lee was thinking about the difficulties scientists at CERN encountered when sharing their papers and research results. Each had tools for writing papers and other documentation on their own computers, but CERN was mostly populated by researchers visiting from the universities that employed them. They brought their own computers with them, so there was a wide variety of different computers, each with unique documents. If you wanted a document from a fellow researcher’s computer, then it was likely you’d either need to learn to use a different computer or program than you were used to, or you’d need to transform the output of your colleague’s software to make it compatible with your own. Berners-Lee had written several of these conversion utilities but realized that, instead of a succession of small utilities, he would be better off solving the general problem. He believed a hypertext system would be ideal, but systems at the time were too complex and difficult to author for. He set about designing a simple hypertext system based on Standard Generalized Markup Language (SGML) for a distributed client-server architecture.
This culminated in the release, on Christmas Day 1990, of the WorldWideWeb browser and server. It allowed each individual to publish their documents in a standard format that anyone else could then read across the network using the browser. The browser didn’t need to be a particular bit of software; anyone was free to implement a viewer. The HTML document format was plain text interspersed with special tags marked by angle brackets, such as <p> for paragraph or <li> for list item, to mark the purpose of the text. These documents could be easily created on any type of computer.
The idea quickly caught on in the academic world, and several more browsers appeared: libwww, Mosaic, Midas, Erwise, ViolaWWW, and Arena, among others. The authors of the various web browsers collaborated on the www-talk mailing list, discussing implementation strategies and arguing about new features. Implementation usually won out over theory—when Marc Andreessen proposed the <img> tag, it was felt by many to be the worst of several proposals put forward. But Andreessen was the first person to implement his proposal, so that was the tag everyone used in their pages, and it’s the tag we still use today.
The primacy of features over standardization threatened to destroy the ideals on which the web was founded before it even really got started—the situation was heading back toward the original state of affairs—documents compatible with only a single client application.
In an effort to stem the tide, Tim Berners-Lee and Dave Raggett produced a draft document in April 1993, “Hypertext Markup Language, Ver 1.0,” and submitted it to the Internet Engineering Task Force (IETF).
The IETF was the standards body that controlled most of the standards relevant to the internet: TCP/IP for network communication; DNS for name resolution, so you can type an easy-to-remember address like yahoo.com instead of a numeric IP address; and SMTP for email, among many others. The published standards were known as Requests for Comments (RFCs), reflecting the consensual attitude that marked the growth of the internet over the previous two decades.
The HTML 1.0 draft was overtaken by the rapid development of browsers. In the time it took to move through the standards process, the state of the art in web browsers moved on significantly. But the web was becoming increasingly popular, so the need for some sort of standard was even more acute: HTML 1.0 was soon to be replaced by HTML 2.0.
The first commercially successful web browser was Netscape Navigator. Version 1.0 was released on December 15, 1994, and quickly captured huge market share. It was based on the Mosaic code originally developed by Marc Andreessen.
Also in 1994, the World Wide Web Consortium (W3C) was founded by Tim Berners-Lee. The goal of the W3C was to encourage the adoption of standards across the internet industry, but initially the HTML standard efforts remained focused within the IETF.
In August 1995 Microsoft launched Internet Explorer, also based on the Mosaic code. It was not very competitive with Navigator in features and was quickly superseded by version 2.0 in November 1995.
The same year also saw the launch of Yahoo.com (March 1995), Amazon.com (July 1995), and eBay.com (September 1995), along with many other shorter-lived web brands—or, as they soon became known, dot-coms. The internet boom was ready to happen, and both Netscape and Microsoft wanted to be in position to take advantage of it.
The first official standard for HTML (HTML 2.0) was published in April 1994 with revisions in July 1994 and February 1995; it was finally accepted as a standard by the IETF in September 1995. The goal of the document was to describe common browser capabilities as of June 1994, so it reflected most of the functionality available in the browsers released that year.
By the time versions 3.0 of IE and Navigator were released in August 1996, IE was much closer in terms of features, and the browser wars were on. In an effort to grab market share, both vendors rushed to implement new features with little regard for compatibility. Initially this wasn’t a problem, because Netscape had as much as 80% of the market; but as IE gained ground, thanks to improved features and an aggressive marketing campaign, developers had to contend with two browsers with similar features but very different implementations.
The W3C attempted to stem the tide by publishing a draft standard, HTML 3.0. It wasn't compatible with either of the major browsers, so it struggled to gain traction. A short-term compromise was reached in HTML 3.2, which more closely reflected the functionality of contemporary browsers. Many of the features proposed for HTML 3.0 were carried forward to the spec for HTML4.
At the time, the W3C's standardization process had three stages:

- Working Draft (WD)— The proposed standard may go through several drafts. Once the standard has stabilized, the editor issues a Last Call for comments, and then the standard can move on to the next stage.
- Proposed Recommendation (PR)— The Proposed Recommendation stage lasts at least four weeks. A PR is voted on by W3C members. After the vote, the standard is either returned to the Working Draft stage or, perhaps with modifications, advances to be a full recommendation.
- W3C Recommendation (R)— A Recommendation indicates that consensus has been reached among W3C Members and the specification is appropriate for widespread use. After the standard has become a Recommendation, only minor revisions are allowed to correct minor errors or clarify issues.
In November 1999, an update to the “World Wide Web Consortium Process” document added an additional stage to the process: the Candidate Recommendation. This recognized the need for implementation feedback prior to the standard being published as a Recommendation:
- Working Draft (WD)— The initial publication of the standard, used to gather public feedback. A standard typically has several Working Drafts before advancing to the next stage.
- Candidate Recommendation (CR)— After the specification has stabilized, it becomes a CR. At this point, browser vendors are expected to begin implementing the standard in order to provide feedback about its practicality. It isn’t unusual for a standard to revert to a WD several times after becoming a CR.
- Proposed Recommendation (PR)— After some practical implementation experience has been gained, preferably at least two independent and interoperable implementations, the standard can advance to the PR status. This is an opportunity for final review within the W3C. The standard is either approved by the Advisory Committee and advances to a full Recommendation, or it returns to WD status for further work.
- W3C Recommendation (R)— As before, when published as a Recommendation, the standard is ready for widespread deployment.
After the frantic pace of releases in the second half of the 1990s, things slowed down for HTML. The DOM Level 2 spec was published in late 2000, followed by DOM Level 3 in 2004. CSS saw a major revision to 2.1 in February 2004, but it didn’t see full support in IE until the version 8 release in March 2009.
Still, many people thought the future of the web was not with HTML and CSS. This quote from a Dr. Dobbs article in 2002 is typical: “Even today, HTML offers scant control over design essentials like typography and screen layout, and does little to accommodate complex interactions between browsers and servers. Making a trip to the server after each mouse click is a fairly inefficient way to deliver information. As Web development increasingly focuses on applications, markup’s limitations are becoming more and more apparent.”
Two events heralded a new approach to web applications. First, the Firefox browser, the open source descendant of Netscape Navigator, added its equivalent of IE's XMLHTTP control: the XMLHttpRequest (XHR) object. Second, Google launched a web-based email application that took full advantage of this feature: Gmail.
You may have wondered what the W3C was doing in the decade after HTML 4.01 was released. It was, of course, working on plenty of standards other than HTML, but it was also working on a replacement for HTML4. The W3C decided that the future of HTML lay in XML. XML is superficially similar to HTML—documents, tags, and elements all exist in XML—but it has two major differences:
- XML parsing is much stricter than HTML. A few mistakes in an HTML document will, in many cases, not even be noticed; the browser will correct the errors as best it can and carry on. A single error in an XML document causes the parsing to fail and an error message to be displayed. The stricter approach allows browsers to be more efficient, which is particularly useful on mobile and low-power devices.
- XML is extensible. If you want to add new elements to your XML page, you can do so. You describe those elements in a separate file and link to it from your document. Your new elements are then just as valid as any specified by the W3C.
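The difference in strictness is easy to demonstrate. The following is a minimal sketch using Python's standard-library parsers (the example document and all names in it are ours, purely for illustration): a conformant XML parser halts on the first well-formedness error, while an HTML parser recovers silently and keeps going.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# One error: the <p> element is never closed.
broken = "<html><body><p>An unclosed paragraph</body></html>"

# A conformant XML parser rejects the whole document.
try:
    ET.fromstring(broken)
    xml_ok = True
except ET.ParseError as err:
    xml_ok = False
    print("XML parse failed:", err)

# An HTML parser carries on and processes every tag it finds.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(broken)
print("HTML parser saw tags:", collector.tags)
```

XHTML followed the strict XML model shown here; HTML5, as you'll see later in this appendix, instead chose to standardize the recovery behavior itself.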
The first step was to redefine HTML 4.01 as an XML standard. XHTML 1.0 became a Candidate Recommendation in October 2000. It contained no new elements or features; all the valid elements were identical to those in HTML 4.01. The only changes came from it now being a dialect of XML. The plan was to extend XHTML in a modular fashion by plugging in new XML dialects. Some of the better-known XML dialects the W3C expected to be plugged in to XHTML were Scalable Vector Graphics (SVG), which became a CR in August 2000; and MathML, an XML language for describing equations, which became a CR in April 1998. The modular approach allowed different technologies to be worked on at different paces.
The drive toward XML meant that HTML was largely sidelined. The focus was on building compound documents out of various XML dialects. These included the HTML-like XHTML and the previously mentioned SVG and MathML, but also XForms, RDF (Resource Description Framework), and any number of other proposals. It was envisaged that you might write web applications without using any XHTML at all.
In 2004, at the W3C Workshop on Web Applications and Compound Documents, Opera and Mozilla, concerned that the standards process might become increasingly irrelevant to the web as it existed in the real world, put forward a position paper outlining an alternative approach. This paper outlined seven “Design Principles for Web Application Technologies” and, in the context of these, proposed answers to the questions the workshop had set out to answer.
The document was voted down by the rest of the attendees, who wanted to stick with the current XML-based, rather than HTML-based, approach. Two days later, the Web Hypertext Application Technology Working Group (WHATWG) was formed.
The WHATWG set out to define the next HTML standard according to the seven principles set out in Opera’s and Mozilla’s document. They underpin the entire approach taken by the WHATWG during the development of HTML5, so let’s look at them now:
- “Backwards compatibility, clear migration path”—Any new web application technology should build on what authors already knew: HTML, CSS, the DOM, and JavaScript.
- “Well-defined error handling”—A major point of incompatibility in contemporary browsers was not what happened when the page author got everything correct, but what happened when they made a mistake. The next standard should specify error handling and error recovery.
- “Users should not be exposed to authoring errors”—This addressed a major difference of opinion with the XML-based approach at the W3C. WHATWG wanted browsers to recover from errors gracefully and, where recovery was possible, not display an error message to the user—just like HTML.
- “Practical use”—New features should be added based on use cases. Ideally, these should be based on real issues developers experience in working around the limits of existing standards.
- “Scripting is here to stay”—Scripting should be accepted as an integral part of web applications, although it should be avoided where more convenient declarative markup can be used.
- “Device-specific profiling should be avoided”—The W3C produced a cut-down version of the XHTML spec for mobile devices. The WHATWG felt that authors shouldn’t have to produce different versions of their markup for different devices.
- “Open process”—Although the W3C has open mailing lists, it also has private ones. WHATWG activity is conducted entirely under public scrutiny.
This isn’t to say the principles of the WHATWG were entirely at odds with those being followed by the W3C’s XML-focused working groups, but there was a significant difference in approach. The W3C continued to work on XHTML2 while the WHATWG worked on HTML5. XHTML2 had the backing of the recognized standards body, but it primarily appealed to people who wanted to use other XML-based technologies. HTML5 garnered far more popular support with its “evolution rather than revolution” approach and its exhaustive documenting of browser behavior.
In addition to following the seven principles, the HTML5 spec took the step of combining the W3C’s separate HTML and DOM specs. Experience had shown that maintaining them as two specifications led to inconsistencies and incompatibilities. In the HTML5 spec, the DOM became the basis of correct parsing: two implementations are interoperable if they produce the same DOM tree from a given HTML document.
Eventually the W3C realized that it risked being made irrelevant by real-world events. In March 2007, it relaunched the HTML Working Group. Mozilla, Apple, and Opera proposed that the WHATWG HTML5 specs be taken as the starting point of this new group’s work, and the rest of the working group agreed. At this point, XHTML2 was put on hold and everyone was able to agree that the future of the web would be HTML5.
While all this was going on in the world of markup, work was continuing on CSS at the W3C in the form of CSS Level 3, or CSS3 for short. CSS3 also tried to correct a number of past mistakes in drafting specifications, starting with fixing CSS2.
The CSS2 specification had been through the standards process before the Candidate Recommendation stage was introduced, so it had no implementation feedback before being published as a Recommendation in 1998. As vendors tried to implement it, a number of issues were found that made it impossible, or impractical, to achieve compliance with the standard.
CSS 2.1 set out to rectify those mistakes and provide a solid, implementable base on which to build CSS3. The work to set CSS 2.1 right took more than eight years; it was finally completed in June 2011. The timing of this was unfortunate. IE6 was released in August 2001, a few years after the CSS2 publication but a year before the first draft of CSS 2.1. This is significant because IE6 is the browser that won the first round of the browser wars, achieving 83% market share by 2004 as Netscape collapsed. With no competition, Microsoft wound down IE development, and the web would be stuck on IE6 for many years. In comparison to the two-year-or-less gap between most previous IE releases, it would be nearly five years before IE7 appeared. Even though IE6 had good support for CSS2 compared to other browsers available in 2001, it soon fell behind the evolving standards.
CSS3 is modular; it’s split into modules such as Backgrounds and Borders, Values and Units, and Text Layout. This means that instead of waiting years for a huge, monolithic standard to be finalized, as happened with CSS 2.1, the less controversial and more useful modules can be prioritized and pushed through the standards process more quickly. In the meantime, until a particular module is ready, the corresponding section of the CSS 2.1 spec is regarded as the current standard.