XHTML :: How to succeed

Patrice Levesque
Edimaster

Foreword

Since its infancy in the early nineties, the World Wide Web (WWW) has attracted writers of every kind. Anywhere, anytime, anybody with a bit of know-how can publish on the WWW. This ease of access has allowed billions of web pages to appear on the network, in every which way.

HyperText Markup Language (HTML) - the basic language of the web - has a very simple syntax and vocabulary; nonetheless, invalid markup accounts for more than 90% of the web pages found on the internet. For many reasons, the Internet Explorer web browser favors this poor HTML quality and even encourages the status quo: it displays badly formed HTML documents without any error message, correcting the deficiencies as it sees fit, and so leads web authors into mediocrity ("the problem comes from the other web browsers, everything renders fine with IE..."). This page targets authors who use the web as a medium and who want to know how to enhance the quality of their work while allowing 100% accessibility for contents - and at least good presentation - in the major web browsers.

As of fall 2006, the most advanced browser in terms of conformance and respect for standards is Mozilla Firefox (though other Mozilla projects share the title). As all browsers must one day or another become as standards-compliant as these - and beyond - this document takes for granted what Mozilla and Firefox offer and gives means to keep compatibility with browsers that are not there yet.

Why choose XHTML

Contents and presentation

First rule to change the world: separate content from its presentation. Mid-nineties HTML forced authors to mix colors, text, background images, spacing, figures and so on in a single document. Though CSS can be linked to modern HTML, other stylesheet types - like XSL - are designed for XML languages like XHTML, so let's be prepared.

Experience has shown that mixing together contents and presentation brings a number of disadvantages and frustrations. These seem to stand out:

Maintenance difficulties
Unlike conventional media, a web page can evolve as the author wishes. When contents and presentation share the same space, a simple modification like changing all link colors becomes a nightmare: hunting them down one at a time is too much of a burden.
Difficult collaboration
Sometimes an author would like to hand all visual aspects over to a designer. If a file holds both contents and presentation, parallel work becomes hazardous.
Impossible reuse
A collection of pages often needs a shared presentation. Copying the same presentation information from page to page is frustrating, error-prone work. And when the background color must change for all pages, the nightmare comes back.
Doubtful techniques
To control some aspects of presentation in the web's infancy, neat tricks were invented, but they brought bad habits with them: using empty transparent images to move text, using proprietary tags that don't work in other browsers, turning browser bugs to one's advantage. These methods, though bright and useful in their time, are now obsolete and put the brakes on both structural elegance and coherence. One must keep in mind that smart tricks become useless and dumb when The Right Way(TM) produces the same results with half the pain.
Limited public
For some people, presentation can become an obstacle. A person with weak vision might want to read a page but be unable to do so because the font size is too small. A blind person will be bored to death listening to a web page with dozens of sentences about things he or she will never see. Though most people can use the web without any problem - and though most of the time contents come second to flashy animations - offering everybody a chance at a quality experience separates the great authors from the chaff.

So, with exclusive use of Cascading Style Sheets (CSS) for presentation, all these annoyances disappear, with greater flexibility and control than simple presentational HTML offers.

XHTML gives good habits

eXtensible HyperText Markup Language (XHTML) became an official recommendation in 2000; it is essentially a reformulation of HTML under the rules of the eXtensible Markup Language (XML). XHTML asks for more rigor, but remains pretty close to HTML. One of its greatest advantages: because it is built on XML, every XML tool becomes available, allowing a broad range of manipulations of the document's data. These transformations make it easy to share data with databases, configuration files, Web Services (SOAP, XML-RPC) and lots of other XML-based languages.

It then becomes a lot easier to adapt a web document to other media like cell phones, WebTV, traditional printing and PDAs. Information becomes free of its medium.

Syntax of XHTML

Differences between HTML and XHTML

To build an XHTML document, all the rules below must be followed precisely:

Bones of an XHTML document

Here is a sample XHTML document, a simple wrap-up of the few constraints given above. Line numbers refer to the most likely explanation for that line.

[1] <?xml version="1.0"?>
[2] <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
[3] <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
[4]   <head>
[5]     <title>Page 1</title>
[6]     <style type="text/css" media="screen">
[7]       <![CDATA[
[8]         p { color: black; background: white; }
[9]       ]]>
[10]     </style>
[11]   </head>
[12]   <body>
[13]     <p>Hello!</p>
[14]   </body>
[15] </html>

This document must be sent with the proper HTTP headers, namely a Content-Type of application/xhtml+xml!
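One way to send the right Content-Type header is to decide per request on the server. The following is a minimal sketch, assuming a server-side script written in JavaScript: the hypothetical function chooseContentType inspects the Accept request header and falls back to text/html for browsers that do not announce XHTML support. The function name and the simple substring test (which ignores q-values) are assumptions made for illustration only.

```javascript
// Hypothetical helper: pick the media type to send for an XHTML page.
// Simplification: a substring test that ignores q-values in the Accept header.
function chooseContentType(acceptHeader) {
  var accept = (acceptHeader || '').toLowerCase();
  if (accept.indexOf('application/xhtml+xml') !== -1) {
    return 'application/xhtml+xml';
  }
  // Browsers that do not list the XHTML media type get plain HTML.
  return 'text/html';
}
```

The returned value would then be written into the Content-Type response header by whatever server software is in use.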

Get around the past

Since the fall of Netscape around 1998, Internet Explorer has dominated the browser market without a shadow of a doubt. Microsoft, owner of Internet Explorer, has not enhanced its product since: there is no more market share to gain. In its shadow, lots of other browsers carried on incorporating new techniques, becoming one after the other more compliant with the W3C recommendations than Internet Explorer. Some of these new methods are not supported by IE; this section explains tricks an author can use to work around the obsolete IE while remaining at the bleeding edge of what the new web medium makes possible. It also gives tips for the other browsers, as none is 100% compliant (not yet!).

Images

Nineties authors had two image file formats in their pocket: Joint Photographic Experts Group (JPEG) and Graphics Interchange Format (GIF). JPEG offers lossy compression and allows up to 16777216 (2^24) colors; perfect for photographs. GIF, though, only supports 256 (2^8) colors, but the image suffers no loss and 1-level transparency is built in, so that images are not forced to take the rectangular shape typical of the JPEG format.

How then can a lossless image with more than 256 colors be shown? Can more than one level of alpha be made available to allow transparency effects? To answer these questions, a new file format was born in the mid-nineties.

The PNG format

A W3C recommendation since 1996, the PNG format holds a number of advantages over the GIF format. The number of colors is not limited to 256 and the encoding adapts to the real number of colors used, up to 16777216. Just like GIF, PNG is lossless. An alpha channel allowing up to 256 transparency levels is also provided. The format also supports gamma values, allowing faithful reproduction of colors on every computer and monitor. Another important point: the file size is generally smaller than with GIF. No good reason remains to use GIF, since PNG covers what GIF offers for still images.

PNG is only partially supported by Internet Explorer: the alpha channel is not applied natively. There exists a method for Internet Explorer versions 5.5 and higher to display images with an alpha channel, though this trick only works on <img/> tags and on background and background-image CSS directives. The workaround uses the non-standard CSS directive called filter, with a proprietary command: AlphaImageLoader. So, to display a PNG image, instead of using:

<img src="foo.png" alt="" />

Internet Explorer needs:

<img src="blank.gif" alt="" style="width: 100px; height: 64px; filter: progid:DXImageTransform.Microsoft.AlphaImageLoader(src='foo.png', sizingMethod='scale');" />

The "blank.gif" file is a 1 pixel by 1 pixel transparent GIF image. The size of the PNG image must also be specified with CSS directives so that the image is shown at the right size. For background images, instead of using:

<div style="background: url(foo.png);"></div>

One must extend the directive:

<div style="background: transparent; filter: progid:DXImageTransform.Microsoft.AlphaImageLoader(src='foo.png', sizingMethod='scale');"></div>

A number of scripts can be found that dynamically modify the <img> tags, rewriting PNG images into a form Internet Explorer can digest; unfortunately their reliability falls short. Detecting the browser on the server side (and shipping a different document) seems a lot safer.
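Generating the right markup on the server could look like the sketch below. It assumes a server-side script written in JavaScript and a flag, forIE, that comes from some browser detection done elsewhere; the helper name pngImgTag is hypothetical, and the values are assumed to be trusted (no escaping is done).

```javascript
// Hypothetical helper: build the <img/> markup for a PNG with an alpha channel.
// forIE is assumed to come from server-side detection of IE 5.5 and higher.
function pngImgTag(src, width, height, alt, forIE) {
  var altAttr = ' alt="' + alt + '"';
  if (!forIE) {
    // Standards-compliant browsers get the plain image.
    return '<img src="' + src + '"' + altAttr + ' />';
  }
  // IE gets the transparent placeholder plus the proprietary filter.
  var style = 'width: ' + width + 'px; height: ' + height + 'px; ' +
    'filter: progid:DXImageTransform.Microsoft.AlphaImageLoader(' +
    "src='" + src + "', sizingMethod='scale');";
  return '<img src="blank.gif"' + altAttr + ' style="' + style + '" />';
}
```

Called as pngImgTag('foo.png', 100, 64, '', true), it produces the Internet Explorer variant shown earlier; with false as the last argument it produces the plain tag.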

When using this trick, hypertext <a> links sometimes break; some combinations of tags bring Internet Explorer into a state of purposelessness.

Also note that the combination of background-color and background-image will not work as expected with Internet Explorer; the recommendation insists that the background color should be seen through a semi-transparent image - which IE will not do, and the workaround syntax above would prevent it anyway.

The XHTML tags

Though the XHTML tag set closely resembles the HTML set, some differences exist and rendering can change depending on the document type. This section explains the differences between the two sets.

<a>

The tag that literally created the web could once carry a target attribute, allowing a new window to be opened on a hyperlink click (using target="_blank"). The target attribute was removed in XHTML 1.1. Opening new windows breaks navigation and frustrates the visitor (who can open new windows himself if he wants), but there exists a way, through ECMAScript, to get the same effect.

To open a new window on a hyperlink click, the onclick attribute must be used - so that people using a mouse can get a new window and those who surf with a non-graphical browser (and robots) can still follow the link.

So, instead of <a href="http://example.com/" target="_blank">, <a href="http://example.com/" onclick="window.open('http://example.com/', '_blank', ''); return false;"> will work in XHTML 1.1. The return false; code prevents clicks from replacing the current page.
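The same effect can be obtained without writing an onclick attribute on every link. The sketch below assumes a convention that is not part of this document: links meant to open a new window carry rel="external". The function name and the openWindow callback are hypothetical; the callback exists only so the logic can be exercised outside a browser.

```javascript
// Attach a new-window handler to every link marked rel="external".
// links: an array-like list of <a> elements; openWindow: function (url, target).
// Returns the number of links that received a handler.
function openExternalLinksInNewWindow(links, openWindow) {
  var count = 0;
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    if (/(^|\s)external(\s|$)/.test(link.rel || '')) {
      link.onclick = function () {
        // "this" is the clicked link; returning false keeps the current page.
        openWindow(this.href, '_blank');
        return false;
      };
      count++;
    }
  }
  return count;
}
```

In a browser this might be wired up once the page has loaded with openExternalLinksInNewWindow(document.getElementsByTagName('a'), function (url, target) { window.open(url, target); }). Visitors without scripting still follow the plain href.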

<body>

A mandatory tag. In HTML it fills the whole page space, but in XHTML the <html> element takes over that role. So - by default - a browser could add margins to the <body> element or padding to the <html> element.

To get the same page look when switching from HTML to XHTML, one must add this to the CSS:

html { margin: 0px; border: 0px; padding: 0px; }
body { margin: 0px; }

Internet Explorer attaches the page scroll bar to the <body> element, so a page where <body> does not entirely fill the browser space may not yield the expected result.

<fieldset>

This tag allows grouping of <form> elements in a structurally coherent way.

The Opera browser will not let you remove the border around the <fieldset> element. To hide the border, one must give it the same color as the parent's background.

<map>

To build a clickable image in which different parts lead to different pages (a road map for instance), the <map> and <area> tags have existed since HTML 3.2. The code looks like this:

<img src="image.png" alt="Canada" usemap="#mymap" width="50" height="100" />
<map id="mymap">
  <area href="section1.html" alt="Route 20" shape="rect" coords="0,0,49,49" />
  <area href="section2.html" alt="Route 35" shape="rect" coords="0,49,49,99" />
</map>

A browser which doesn't show images will display a list containing the alt attribute contents, so that accessibility remains possible.

With XHTML 1.1, the usemap attribute of the <img/> tag no longer accepts URIs. One must write only the name of the destination anchor, without the "#" prefix. As expected, this new method breaks compatibility and is supported by only a few browsers. For the moment, downgrading to XHTML 1.0 for pages containing usemap attributes seems the one true way.

<object>

Suppose a new graphical format is released (called xyz), with oh-so-great features, and that only one browser supports it. To use it on a set of web pages, a script detects the browser and sends a more common image format to browsers that don't understand the xyz format. A few weeks later, a second browser can read this xyz format. The logical next step is changing all the web pages to detect this second browser. Absurd. The <object> tag works around this by proposing alternatives - even text; a list of "objects" can be chained and the browser, following the proposed order, will pick the first "object" that it supports and render it. A neat idea.

Internet Explorer does not behave like this with an <object> tag. For this browser, each <object> tag requires the user to agree to ActiveX scripts, even if the content of the aforementioned <object> will not use them. Worse, Internet Explorer blindly displays all the possible alternatives and wraps the <object> in scroll bars and a border. How to work around it? Provided the user has not disallowed ActiveX, a script can be added to the web page to remove the scroll bars and borders:
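The selection rule described above can be modeled in a few lines. This is only an illustration of the logic, not browser code: the function name, the shape of the alternatives list and the "image/xyz" media type (standing in for the imaginary xyz format) are all assumptions.

```javascript
// Model of the <object> fallback rule: return the first alternative whose
// media type is supported, or the text fallback when none matches.
function pickAlternative(alternatives, supportedTypes, textFallback) {
  for (var i = 0; i < alternatives.length; i++) {
    if (supportedTypes.indexOf(alternatives[i].type) !== -1) {
      return alternatives[i];
    }
  }
  return { type: 'text/plain', data: textFallback };
}

// Alternatives in the author's preferred order.
var mapAlternatives = [
  { type: 'image/xyz', data: 'map.xyz' },
  { type: 'image/png', data: 'map.png' }
];
```

A browser that only knows PNG and GIF ends up with map.png, one that learns the xyz format later gets map.xyz, and neither case requires touching the page.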

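// Remove the border and scroll bars IE draws around image <object>s.
function objectfix() {
  var objects = document.getElementsByTagName('object');
  for (var i = 0; i < objects.length; i++) {
    var o = objects[i];
    var isImage = (o.type == 'image/jpeg') ||
                  (o.type == 'image/gif') ||
                  (o.type == 'image/png');
    // o.body is the IE-only property this workaround relies on;
    // skip the object where it is missing.
    if (isImage && o.body) {
      o.body.style.border = 'none';
      o.body.style.margin = '0';
      o.body.style.padding = '0';
      o.body.style.overflow = 'hidden';
    }
  }
}

// Assign the function itself; calling it here would run it before the page has loaded.
window.onload = objectfix;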

To prevent IE from displaying all alternatives, one must (again) detect the browser on the server side. Nothing hints at a bugfix for IE.

<script>

This tag allows a web page to be spiced up with ECMAScript and JavaScript/JScript, but scripts must not use obsolete methods like document.write to alter the page. Because an XML document is seen as an information tree, one must - using the DOM - add nodes to the document.

Most browsers don't accept the short-hand notation for the <script> tag. So <script type="text/javascript" src="foo.js" /> will not work, but <script type="text/javascript" src="foo.js"></script> will. From an XML standpoint both forms are equivalent, but it seems that lots of browser engine programmers still don't get the idea.

Before version 7.5, the Opera browser does not run this tag when in XHTML mode; one must serve another media type (typically text/html) to Opera if scripting is needed.
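Adding nodes through the DOM might look like the following sketch. The helper name appendParagraph is hypothetical; it takes the document as a parameter so that it can be tried outside a browser, and it falls back to createElement where createElementNS is not available.

```javascript
var XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Append <p>text</p> to parent using DOM calls instead of document.write.
function appendParagraph(doc, parent, text) {
  var p = doc.createElementNS
    ? doc.createElementNS(XHTML_NS, 'p')  // namespace-aware creation for XML documents
    : doc.createElement('p');             // fallback for browsers lacking createElementNS
  p.appendChild(doc.createTextNode(text));
  parent.appendChild(p);
  return p;
}
```

In a page, a call such as appendParagraph(document, document.getElementsByTagName('body')[0], 'Hello!') would do what document.write('<p>Hello!</p>') used to do.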

Style sheets (CSS)

Cascading Style Sheets (CSS) appeared at the end of the nineties and allowed a first separation between contents and presentation. With simple, purely declarative directives, the presentation of an HTML document can be changed without ever touching that document.

The block model

Since the first version, CSS1, every element can be given a margin, a border, a padding, a width and a height.

Internet Explorer (in its "compatible" mode; the "strict" mode of IE6 finally resolves the issue) differs from the other browsers in its understanding of the width and height properties: for IE, the width and height include both border and padding, though the recommendation specifies that these three entities don't overlap.

To make the block model compatible with all browsers, one must nest a second block inside the faulty block and put the padding and borders on that one instead. Ugly, because one must change the XHTML code (which should never be used for presentation). Another workaround consists of writing faulty CSS code to exploit parser bugs in IE. Even uglier.
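The arithmetic behind the two interpretations can be written down as a small function. This is a sketch for illustration only; the function name and the 'w3c' / 'ie' labels are assumptions, and the same padding and border are taken on both sides.

```javascript
// Horizontal space used by a box (margins excluded), in pixels.
// model 'w3c': width covers the content only.
// model 'ie':  width already includes padding and border.
function boxWidths(width, padding, border, model) {
  var extras = 2 * padding + 2 * border;
  if (model === 'ie') {
    return { content: width - extras, total: width };
  }
  return { content: width, total: width + extras };
}
```

For width: 200px, padding: 10px and border: 5px, the recommendation gives a 200 pixel content area inside a 230 pixel box, whereas IE in "compatible" mode gives a 170 pixel content area inside a 200 pixel box.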

Fixed positioning

One of the greatest achievements of CSS2: allowing the position of elements to be specified precisely. Pixel by pixel, relatively positioned or the good old way, every type of layout has its favorite method. There even exists a method for fixing elements in place, so that scrolling the page doesn't move them.

The position: fixed CSS2 directive is correctly handled by most browsers, but not by Internet Explorer. To get the same effect, one must use JavaScript code that checks whether the page has scrolled and moves the element back to its intended position. As an example, this code makes sure the element with the "menu" id stays at the top of the screen. Just calling the init() function on page load suffices:

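var menu;
var theTop = 0;    // minimum offset of the menu from the top of the page
var old = theTop;  // scroll position seen at the previous check

function init() {
  // The "menu" element is assumed to be absolutely positioned in the CSS.
  menu = document.getElementById('menu');
  movemenu();
}

function movemenu() {
  var pos = 0;
  // Read the vertical scroll offset using whatever the browser provides.
  if (window.innerHeight) {
    pos = window.pageYOffset;
  } else if (document.documentElement && document.documentElement.scrollTop) {
    pos = document.documentElement.scrollTop;
  } else if (document.body) {
    pos = document.body.scrollTop;
  }
  if (pos < theTop) pos = theTop;
  // Move the menu only once the position has been stable between two checks.
  if (pos == old) {
    menu.style.top = pos + 'px';
  }
  old = pos;
  setTimeout(movemenu, 100);  // check again in 100 ms
}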

Another method, that could work in some cases, is described by Simon Jessey.

Conclusion

Microsoft has gained the largest browser market share by bundling its browser with the Windows operating system. Though this bundling allows anyone who installs Windows to get a browser without any hassle, it hampers competition from better suited and clearly more advanced browsers. Great browsers like Mozilla (and its derivatives Netscape, Firebird/Firefox, Galeon, Epiphany, K-Meleon, Camino, ...), Opera and Konqueror (and its cousin Safari from Apple) are all waiting for their turn, integrating the new W3C recommendations one by one. The web evolves and Internet Explorer refuses to follow suit.

The computer world progresses by giant steps and applications that refuse change don't usually last long; Internet Explorer has accumulated a considerable lag - don't let the WWW stiffen with such a great future ahead of it.

A noble experiment by Dean Edwards to work around Internet Explorer deficiencies is called IE7. His compatibility module offers solutions to many of the problems shown above. Let's hope his great work continues, allowing more and more authors to embrace the standards and clean up this messed-up web.

All comments, corrections or suggestions should be sent using this form. It will be a pleasure to improve this document for the community's benefit.


Creation : August 30th, 2003
Villeray
N 45° 33′ W 73° 36′

Last update : November 21st, 2006
Villeray
N 45° 33′ W 73° 36′