XHTML :: How to succeed

Foreword
Why choose XHTML
- Contents and presentation
- XHTML gives good habits
Syntax of XHTML
- Differences between HTML and XHTML
- Bones of an XHTML document
Get around the past
- Images
  - The PNG format
- Tags
  - <a>
  - <body>
  - <fieldset>
  - <map>
  - <object>
  - <script>
- Style sheets (CSS)
  - The block model
  - Fixed positioning
Conclusion

Foreword

Since its infancy at the beginning of the nineties, the World Wide Web (WWW) has attracted writers of all kinds. Anywhere, anytime, anybody with a bit of know-how can publish on the WWW. This ease of access has allowed billions of web pages to appear on the network, written any which way.

HyperText Markup Language (HTML) - the basic web language - has a very simple syntax and vocabulary; nonetheless, invalid markup accounts for more than 90% of the web pages found on the internet. For many reasons, the Internet Explorer web browser favors this poor HTML quality and even encourages the status quo: it displays badly formed HTML documents without any error message, correcting their deficiencies as it sees fit, and so leads web authors into mediocrity ("the problem comes from the other web browsers, everything renders fine with IE..."). This page targets authors who use the web as a medium and who want to enhance the quality of their work while keeping contents 100% accessible - with at least a good presentation - in the major web browsers.

As of fall 2006, the most advanced browser in terms of conformance and respect for standards is Mozilla Firefox (though other Mozilla projects share the title). As all browsers must one day or another become as standards-compliant as these - and beyond - this document takes for granted what Mozilla and Firefox offer and gives means to keep compatibility with not-yet-there browsers.

Why choose XHTML

Contents and presentation

First rule to change the world: separate contents from presentation. Mid-nineties HTML forced authors to mix colors, text, background images, spacing, figures and so on in a single document. Though CSS can be linked to modern HTML, other stylesheet types - like XSL - are designed for XML languages like XHTML, so let's be prepared.

Experience has shown that mixing together contents and presentation brings a number of disadvantages and frustrations. These seem to stand out:

Maintenance difficulties: In contrast with conventional media, a web page can evolve as the author wishes. When contents and presentation share the same space, a simple modification like changing all link colors becomes a nightmare: hunting them down one at a time is too much of a burden.
Difficult collaboration: Sometimes an author would like to hand all visual aspects over to a designer. If a file holds both contents and presentation, parallel work becomes hazardous.
Impossible reuse: A collection of pages often needs a similar presentation. Copying the same presentation information from page to page is frustrating, error-prone work. And when the background color must change for all pages, the nightmare comes back.
Doubtful techniques: To control some aspects of presentation in the web's infancy, neat tricks were invented, and they brought bad habits with them: using empty transparent images to move text, using proprietary tags that don't work in other browsers, turning browser bugs to one's advantage. These methods, though bright and useful in their time, are now obsolete and put the brakes on both structural elegance and coherence. One must keep in mind that smart tricks become useless and dumb when The Right Way^(TM) produces the same results with half the pain.
Limited public: For some people, presentation can become an obstacle. A person with weak vision might want to read a page but be unable to do so because the font size is too small. A blind person must yawn to death, listening to a web page with dozens of sentences related to what he'll never see. Though most people can use the web without any problem - and though most of the time contents come second to flashy animations - offering everybody a chance to live a quality experience separates great authors from the chaff.

So, when Cascading Style Sheets (CSS) are used exclusively for presentation, all these annoyances disappear, with greater flexibility and control than simple presentational HTML offers.

XHTML gives good habits

eXtensible HyperText Markup Language (XHTML) became an official recommendation in 2000; it is essentially a reformulation of HTML under the rules of the eXtensible Markup Language (XML). XHTML asks for more rigor, but still remains pretty close to HTML. One of the greatest advantages of using XHTML: as it is built on XML, every XML tool becomes available, allowing a broad range of manipulations of the document's data. These transformations make it easy to share data with databases, configuration files, Web Services (SOAP, XML-RPC) and lots of other XML-based languages.

It then becomes a lot easier to transform a web document for other media like cell phones, WebTV, traditional printing and PDAs. Information becomes free of its medium.

Syntax of XHTML

Differences between HTML and XHTML

To build an XHTML document, all the rules below must be followed precisely:

XHTML header: not text/html anymore

As XHTML differs from HTML, the Content-type header it is served with must differ too. Three equivalent choices are recommended.

Illegal	Legal
Content-type: text/html	Content-type: application/xml
Content-type: text/plain	Content-type: application/xhtml+xml
	Content-type: text/xml

Most of the time, headers are sent by the web server, picked according to the file extension. This is usually not flexible enough; dynamic web languages allow the header to be overridden as needed.

From XHTML 1.1 on, the text/html header must not be used anymore; it is still permitted for XHTML 1.0, though.

Internet Explorer shows the XML tree when it receives one of the three headers above. For this browser, text/html is needed at all times. The Google robot, too, experiences difficulties if the document is not served with text/html; the document is not parsed correctly and on result pages Google will show "File format: Unrecognized". Netscape 4, which doesn't know anything about XML, will show a cryptic error message or offer to save the file instead of displaying it. For these three cases, one must detect the browser and send the "correct" header.
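One way to make that decision on the server is sketched below. The function name chooseContentType is hypothetical, and relying on the Accept request header is an assumption of this example: browsers that understand XHTML usually list application/xhtml+xml there, while Internet Explorer, Netscape 4 and robots typically do not. The Opera check reflects the version 7.5 limit mentioned in this section; the user-agent pattern is a guess at common Opera strings, not an exhaustive one.

```javascript
// Hypothetical helper: pick the Content-type for an XHTML page
// from the request's Accept and User-Agent headers.
function chooseContentType(acceptHeader, userAgent) {
  var accept = acceptHeader || '';
  var agent = userAgent || '';

  // Opera before 7.5 ignores <script> in XHTML mode: fall back to text/html.
  var opera = /Opera[\/ ](\d+\.\d+)/.exec(agent);
  if (opera && parseFloat(opera[1]) < 7.5) {
    return 'text/html';
  }

  // Assumption: a browser that can handle XHTML says so in Accept.
  if (accept.indexOf('application/xhtml+xml') !== -1) {
    return 'application/xhtml+xml';
  }

  // Everything else (IE, Netscape 4, robots) gets plain text/html.
  return 'text/html';
}
```

Remember that text/html is only permitted for XHTML 1.0, so a page served this way should declare one of the XHTML 1.0 document types.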

Some versions of Konqueror can handle neither text/xml nor application/xml; as the three XML headers share the same purpose, sending application/xhtml+xml seems the most efficient way to make sure all browsers can do the right thing.

Watch out for Opera before version 7.5: the <script> tag doesn't work when the document is sent with one of the three XHTML headers.

For all these reasons, it sometimes seems simpler to resolve these puzzles by using an "inferior" XHTML 1.0 with a text/html header.

The XML declaration

Though optional, it must be added if the character set differs from UTF-8 or UTF-16 and is not already defined by a higher level protocol. Some XML parsers need it too. As an example: <?xml version="1.0" encoding="koi8-r"?>.

Watch out for Internet Explorer: when this declaration is found in XHTML documents, Internet Explorer switches to "compatible" mode and does not offer the XHTML enhancements. Unfortunately, the only way out requires browser detection.

Declare the document type (DTD)

Because an XHTML document represents a particular case of XML, one must declare at the beginning of the document which tags will be used - and how. The <!DOCTYPE> declaration serves this purpose.

Presently four XHTML DTDs exist:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This DTD describes the XHTML version most similar to HTML. Some tags are still used only for presentation, because this version is a syntactic rewrite of HTML version 4.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The "strict" version of XHTML 1.0; farewell <applet>, <basefont>, <center>, <dir>, <font>, <iframe>, <isindex>, <menu>, <noframes>, <s>, <strike>, <u> tags. Some attributes - most of them related to presentation - have also been removed.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

This declaration must be used for framed pages.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

Very close to the strict XHTML 1.0 DTD, this one completes the conversion of the name attribute into id; these two attributes shared the same goal from the start. The <ruby> tag was also added, and the lang attribute becomes xml:lang.
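When pages are generated by a program, the four declarations can be kept in one lookup table so that the flavor is chosen in a single place. This is only a sketch: the table name XHTML_DOCTYPES, the function doctypeFor and the flavor keys are hypothetical, while the identifiers and URLs are copied from the declarations listed above.

```javascript
// Public identifier and DTD location for each XHTML flavor listed above.
var XHTML_DOCTYPES = {
  'transitional': ['-//W3C//DTD XHTML 1.0 Transitional//EN',
                   'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'],
  'strict':       ['-//W3C//DTD XHTML 1.0 Strict//EN',
                   'http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd'],
  'frameset':     ['-//W3C//DTD XHTML 1.0 Frameset//EN',
                   'http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd'],
  '1.1':          ['-//W3C//DTD XHTML 1.1//EN',
                   'http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd']
};

// Return the <!DOCTYPE> line for a flavor key, or throw on an unknown key.
function doctypeFor(flavor) {
  if (!Object.prototype.hasOwnProperty.call(XHTML_DOCTYPES, flavor)) {
    throw new Error('Unknown XHTML flavor: ' + flavor);
  }
  var entry = XHTML_DOCTYPES[flavor];
  return '<!DOCTYPE html PUBLIC "' + entry[0] + '" "' + entry[1] + '">';
}
```

For instance, doctypeFor('strict') yields the XHTML 1.0 Strict declaration shown earlier in this section.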

More XHTML DTDs exist for cases where other XML dialects, like MathML and SVG, are added to regular XHTML. Without going into the hairy details, using XML as the base structure for XHTML makes the language modular - both extensible and ready for tomorrow.

Declare the namespace of the <html> root element

All XML documents contain a root element, wrapping all other elements (except the declarations above). For XHTML, the root element is called <html>.

This element must tell what namespace it comes from. This relation is established with the xmlns attribute:

<html xmlns="http://www.w3.org/1999/xhtml">

A good habit: include at the same time the language used in the document, with the xml:lang attribute:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

All tags must be written in lowercase

In XHTML, <BODY> does not exist. This element is called <body>.

All tags must be closed

For each opened tag, a closing tag. If a tag surrounds nothing (like <img>, <hr> and <br> do), a forward slash must be added. Instead of <br>, use <br/>.

To ease compatibility, a white space can be added before the slash, like this: <br />.

No tag superposition is allowed

Overlapping tags like <p><em>foo</p></em> must be written as <p><em>foo</em></p>.

Attribute values are written inside quotes

Each attribute must hold a value, and this value must be wrapped in quotes. So <input type=checkbox name=foo checked> becomes <input type="checkbox" name="foo" checked="checked" />.
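A generator that writes tags can apply this rule mechanically. The sketch below assumes attributes are handed over as a plain object in which true stands for an HTML attribute that took no value; the function name xhtmlAttributes is hypothetical.

```javascript
// Hypothetical helper: serialize an attribute map the XHTML way.
// Names are lowercased, every value is quoted, and a value of true
// is written with the attribute name as its value.
function xhtmlAttributes(attrs) {
  var out = '';
  for (var name in attrs) {
    if (!Object.prototype.hasOwnProperty.call(attrs, name)) { continue; }
    var value = attrs[name];
    // false, null and undefined mean "attribute absent".
    if (value === false || value === null || value === undefined) { continue; }
    var lower = name.toLowerCase();
    if (value === true) { value = lower; }
    // Escape the characters that would break a quoted XML attribute value.
    var text = String(value)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/"/g, '&quot;');
    out += ' ' + lower + '="' + text + '"';
  }
  return out;
}
```

With this helper, '<input' + xhtmlAttributes({ type: 'checkbox', name: 'foo', checked: true }) + ' />' produces the converted tag shown above.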

For attributes that didn't take a value in HTML (checked, disabled, compact, etc.), the attribute name becomes the value, like disabled="disabled".

All & characters are now escaping characters

Because in XML the & character is always used as a prefix for entities (aliases for one or many characters, like &lt; for "<"), one must use &amp; instead of & to show the "&" character. This applies everywhere in the XML document (even in attribute values) except inside CDATA sections - these sections are mainly used in <script> and <style> tags.

Some browsers, when in XHTML mode, can't swallow common HTML entities (like &nbsp;, &eacute; and so on). To prevent messes, one must systematically use numeric notation (for the preceding example, &#160; and &#233;).
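Both rules can be applied by a small escaping routine whenever text is inserted into a generated page. This is a minimal sketch; the name escapeXhtml is hypothetical, and turning every non-ASCII character into a numeric reference is a cautious choice of this example rather than a requirement (a correctly declared character set would also do).

```javascript
// Hypothetical helper: make a text string safe for XHTML content
// and quoted attribute values.
function escapeXhtml(text) {
  return String(text)
    // The ampersand goes first, so later replacements are not escaped twice.
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    // Numeric notation for everything outside ASCII, instead of named
    // HTML entities that an XML parser may not know.
    .replace(/[^\x00-\x7f]/gu, function (ch) {
      return '&#' + ch.codePointAt(0) + ';';
    });
}
```

A non-breaking space thus becomes &#160; and "é" becomes &#233;, matching the numeric forms given above.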

Watch out for <script> and <style> elements

Not so long ago, comment tags (<!-- -->) were used to set apart the contents of style and script tags. This method does not always give the expected results when using XML; systematically using CDATA sections always yields the wanted behavior.

Risky	Sure
<style type="text/css"> <!-- body { color: #facade; } --> </style>	<style type="text/css"> <![CDATA[ body { color: #facade; } ]]> </style>

This new way generally breaks compatibility with older browsers. Three solutions are available:

1. Link to external files for all <script> and <style> sections;
2. Use server-side browser detection techniques;
3. Use this delimiter pair: "<!--//--><![CDATA[//><!--" and "//--><!]]>".

The second method lacks elegance and is likely to break with new browser versions. The third exploits parsing errors in browser engines. When possible, the first solution should give the best results over time.

Bones of an XHTML document

Here is a sample XHTML document, a simple wrap-up of the constraints given above; each line puts one of the rules of the previous section into practice.

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>Page 1</title>
    <style type="text/css" media="screen">
      <![CDATA[
        p { color: black; background: white; }
      ]]>
    </style>
  </head>
  <body>
    <p>Hello!</p>
  </body>
</html>

This document must be sent with proper headers!

Get around the past

Since the fall of Netscape around 1998, Internet Explorer has dominated the browser market beyond any doubt. Microsoft, owner of Internet Explorer, has not enhanced its product since: no market share is left to gain. In the shade, lots of other browsers carried on incorporating new techniques, becoming one after the other more compliant with the W3C recommendations than Internet Explorer. Some of these new methods are not supported by IE; this section explains tricks an author can use to work around the obsolete IE while remaining at the bleeding edge of what the web medium makes possible. It also gives tips for the other browsers, as none is 100% compliant (not yet!).

Images

Nineties authors had two image file formats in their pocket: Joint Photographic Experts Group (JPEG) and Graphics Interchange Format (GIF). JPEG offers lossy compression and allows up to 16777216 (2²⁴) colors; perfect for photographs. GIF, though, only supports 256 (2⁸) colors, but the image suffers no loss and even a 1-level transparency is built in, so that images are not forced to take the rectangular shape typical of the JPEG format.

How, then, can a lossless image with more than 256 colors be shown? Can more than one level of alpha be made available to allow transparency effects? To answer these questions, a new file format was born in the mid-nineties.

The PNG format

A W3C recommendation since 1996, the PNG format has a large number of advantages over the GIF format. The number of colors is not limited to 256 and the encoding adapts to the real number of colors used; up to 16777216. Just like the GIF format, the PNG format is lossless. A 256-level alpha channel, allowing up to 256 transparency levels, is also provided. The format also supports gamma values, allowing faithful rendition of colors on every computer and monitor. Another important point: the file size is generally smaller than with GIF; so no good reason remains to use GIF, because the PNG format extends all the GIF specifications.

PNG is partially supported by Internet Explorer; the alpha channel is not applied by default. There exists a method for Internet Explorer versions 5.5 and higher to display images with an alpha channel, though this trick will work only on <img/> tags and on background[-image] CSS directives. The workaround uses the non-official CSS directive called filter, with a proprietary command: AlphaImageLoader. So, to display a PNG image, instead of using:

<img src="foo.png" alt="" />

Internet Explorer needs:

<img src="blank.gif" alt="" style="width:100px; height: 64px; filter: progid:DXImageTransform.Microsoft.AlphaImageLoader(src='foo.png', sizingMethod='scale');" />

The "blank.gif" file points to a 1 pixel by 1 pixel transparent GIF image. The size of the PNG image must also be specified with a CSS directive, so that the image is shown at the right size. For background images, instead of using:

<div style="background: url(foo.png);"></div>

One must extend the directive:

<div style="background: transparent; filter: progid:DXImageTransform.Microsoft.AlphaImageLoader(src='foo.png', sizingMethod='scale');"></div>

A number of scripts that dynamically modify the <img> tags can be found, wrapping PNG images into dogfood for Internet Explorer; unfortunately their reliability seems to fall short. Detecting the browser on the server side (and shipping a different document) seems much safer.
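Such server-side generation could look like the sketch below. The function name pngImageTag and its useFilter flag are hypothetical; the flag is assumed to come from whatever browser detection the server already performs, and src is assumed to be a plain file name that needs no escaping.

```javascript
// Hypothetical helper: build the <img /> markup for a PNG image.
// useFilter is true for the Internet Explorer versions that need
// the AlphaImageLoader workaround, false for every other browser.
function pngImageTag(src, width, height, useFilter) {
  if (!useFilter) {
    // Regular markup for browsers that apply the alpha channel themselves.
    return '<img src="' + src + '" alt="" width="' + width +
      '" height="' + height + '" />';
  }
  // IE variant: a transparent 1x1 GIF sized through CSS, with the PNG
  // loaded by the proprietary filter.
  var filter = "progid:DXImageTransform.Microsoft.AlphaImageLoader(src='" +
    src + "', sizingMethod='scale')";
  return '<img src="blank.gif" alt="" style="width: ' + width +
    'px; height: ' + height + 'px; filter: ' + filter + ';" />';
}
```

Keeping both variants behind one function means the workaround can be dropped in a single place once it is no longer needed.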

When using this trick, hypertext <a> links sometimes break; some combinations of tags leave Internet Explorer thoroughly confused.

Also note that the background-color and background-image combination will not work as expected with Internet Explorer; the recommendation insists that the background color should be seen through a semi-transparent image - which IE will not do, and the filter syntax would prevent it anyway.

The XHTML tags

Though the XHTML tag set closely resembles the HTML set, some differences are known, and rendering can change depending on the document type. This section explains the differences between the two groups.

<a>

The tag that literally created the web could once carry a target attribute, allowing a new window to be opened on a hyperlink click (using target="_blank"). The target attribute is gone from XHTML 1.0 Strict and XHTML 1.1. Opening new windows breaks navigation and frustrates the visitor (who can open new windows himself if he wants), but there exists a way, through ECMAScript, to get the same effect.

To open a new window on a hyperlink click, the onclick attribute must be used while the href attribute stays in place - so that people with a scripting-capable graphical browser get a new window and those who surf with a non-graphical browser (and robots) can still follow the link.

So, instead of <a href="http://example.com/" target="_blank">, <a href="http://example.com/" onclick="window.open('http://example.com/', '_blank', ''); return false;"> will work in XHTML 1.1. The return false; code prevents the click from also replacing the current page.
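To avoid repeating the URL inside every onclick attribute, the handlers can also be attached from a script. The sketch below assumes that links meant to open a new window are marked with rel="external"; that marker is a convention chosen for this example, and the function name openExternalLinks is hypothetical. The document and window are passed in as parameters so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: give every <a rel="external"> an onclick handler
// that opens its href in a new window. Returns the number of links changed.
function openExternalLinks(doc, win) {
  var links = doc.getElementsByTagName('a');
  var count = 0;
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    if (link.getAttribute('rel') === 'external') {
      link.onclick = function () {
        // "this" is the clicked link; its href stays usable without scripts.
        win.open(this.href, '_blank', '');
        return false; // keep the current page in place
      };
      count++;
    }
  }
  return count;
}
```

In a page, it would be called once the document has loaded, for example with window.onload = function () { openExternalLinks(document, window); };.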

<body>

A mandatory tag, <body> fills the whole page space in HTML, but in XHTML the <html> element takes over that role. So - by default - a browser could add margins to the <body> element or padding to the <html> element.

To get the same page look when switching from HTML to XHTML, one must add this to the CSS:

html { margin: 0px; border: 0px; padding: 0px; }
body { margin: 0px; }

Internet Explorer attaches the page scroll bar to the <body> element, so a page where <body> does not entirely fill the browser window may not yield the expected result.

<fieldset>

This tag allows grouping of <form> elements in a structurally coherent way.

The Opera browser will not let you remove the border around the <fieldset> element. To hide the border, one must give it the same color as the parent's background.

<map>

To allow a clickable image, in which different parts go to different pages (a road map for instance), the <map> and <area> tags existed as early as in HTML 3.2. The code looks like this:

<img src="image.png" alt="Canada" usemap="#mymap" width="50" height="100" />
<map id="mymap">
<area href="section1.html" alt="Route 20" shape="rect" coords="0,0,49,49" />
<area href="section2.html" alt="Route 35" shape="rect" coords="0,49,49,99" />
</map>

A browser which doesn't show images will display a list containing the alt attribute contents, so that accessibility remains possible.

With XHTML 1.1, the usemap attribute of the <img/> tag doesn't accept URIs anymore. One must write only the name of the destination anchor, without the "#" prefix. As expected, this new method breaks compatibility and is supported by only a few browsers. For the moment, downgrading to XHTML 1.0 for pages containing usemap attributes seems to be the one true way.

<object>

Suppose a new graphical format (call it xyz) is released, with oh-so-great features, and only one browser supports it. To use it on a set of web pages, a script detects the browser and sends a more common image format to browsers that don't understand the xyz format. A few weeks later, a second browser can read the xyz format. The logical next step is changing all the web pages to detect this second browser. Absurd. The <object> tag works around this by proposing alternatives - even text; a list of "objects" can be chained and the browser, following the proposed order, will pick the first "object" that it supports and render it. A neat idea.

Internet Explorer does not behave like this with an <object> tag. For this browser, each <object> tag requires that the user allow ActiveX scripts, even if the content of the aforementioned <object> does not use them. Worse, Internet Explorer blindly displays all the possible alternatives and wraps the <object> in scroll bars and a border. How to work around it? Provided the user has not disallowed ActiveX, a script can be added to the web page to remove scroll bars and borders:


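function objectfix() {
  // Collect every <object> element of the page.
  var objects = document.getElementsByTagName('object');
  for (var i = 0; i < objects.length; i++) {
    var o = objects[i];
    var isImage = (o.type == 'image/jpeg') ||
                  (o.type == 'image/gif') ||
                  (o.type == 'image/png');
    // o.body is the Internet Explorer-specific property this workaround
    // relies on; browsers that lack it are skipped.
    if (isImage && o.body) {
      o.body.style.border = 'none';
      o.body.style.margin = '0';
      o.body.style.padding = '0';
      o.body.style.overflow = 'hidden';
    }
  }
}

// Assign the function itself: writing objectfix() here would run it
// before the page has loaded and leave window.onload undefined.
window.onload = objectfix;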

To prevent IE from displaying all alternatives, one must (again) server-side detect the browser. Nothing hints at a bugfix for IE.

<script>

Used to spice up a web page with ECMAScript and JavaScript/JScript, this tag must not rely on obsolete methods like document.write to alter a web page. Because an XML document is seen as an information tree, one must add nodes to the document using the DOM.
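As an illustration, the sketch below adds a paragraph through the DOM instead of calling document.write. The function name appendParagraph is hypothetical, and the document is passed in as a parameter so that the function can also be exercised outside a browser; in a page it would typically be called as appendParagraph(document, document.body, 'Hello!').

```javascript
// The XHTML namespace declared on the <html> root element.
var XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Hypothetical helper: append a <p> holding the given text to a parent node.
function appendParagraph(doc, parent, text) {
  // createElementNS keeps the new element in the XHTML namespace when
  // the page is parsed as XML; createElement is the fallback for
  // browsers that do not offer it.
  var p = doc.createElementNS
    ? doc.createElementNS(XHTML_NS, 'p')
    : doc.createElement('p');
  // The text becomes a text node, so no markup escaping is needed.
  p.appendChild(doc.createTextNode(text));
  parent.appendChild(p);
  return p;
}
```

Because the text is added as a node rather than as markup, characters such as & and < need no entity notation here.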

Most browsers don't accept the short-hand notation for the <script> tag. So, <script type="text/javascript" src="foo.js" /> will not work, but <script type="text/javascript" src="foo.js"></script> will. From an XML standpoint, both forms are equivalent, but it seems that lots of browser engine programmers still don't get the idea.

The Opera browser, when in XHTML mode, does not parse this tag before version 7.5; one must present another mode (typically text/html) to Opera if scripting is needed.

Style sheets (CSS)

Cascading Style Sheets (CSS) appeared at the end of the nineties and allowed a first separation between contents and presentation. With simple, purely declarative directives, the presentation of an HTML document can be changed without ever touching that document.

The block model

Since the first version, CSS1, every element can be given a margin, a border, a padding, a width and a height.

Internet Explorer differs from the other browsers in its understanding of the width and height properties (in its "compatible" mode; the "strict" mode of IE6 finally resolves the issue): for IE, the width and height include both border and padding, though the recommendation specifies that these three entities don't overlap.
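The difference comes down to simple arithmetic, sketched below. The function names are hypothetical, and the example assumes the same padding and the same border on the left and right sides.

```javascript
// Recommendation: width covers the content only; padding and border
// are added around it to get the space the block takes on screen.
function renderedWidthStandard(width, padding, border) {
  return width + 2 * padding + 2 * border;
}

// IE "compatible" mode: width is the space the block takes on screen,
// so the content gets what is left once padding and border are removed.
function contentWidthCompatible(width, padding, border) {
  return Math.max(0, width - 2 * padding - 2 * border);
}

// A block declared with width: 200px; padding: 10px; border: 5px
var standard = renderedWidthStandard(200, 10, 5);    // 230 pixels on screen
var compatible = contentWidthCompatible(200, 10, 5); // 170 pixels of content
```

The same declaration therefore yields a block 230 pixels wide in conforming browsers and 200 pixels wide in Internet Explorer's "compatible" mode.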

To make the block model compatible with all browsers, one must nest a second block inside the faulty block and put the paddings and borders on that one instead. Ugly, because one must change the XHTML code (which should never be used for presentation). Another workaround consists in writing faulty CSS code to exploit parser bugs in IE. Even uglier.

Fixed positioning

One of the greatest achievements of CSS2: allowing the position of elements to be specified precisely. Pixel for pixel, relatively positioned or the good old way, every type of layout has its favorite method. There even exists a method for positioning fixed elements, so that scrolling the page doesn't move them.

The position: fixed CSS2 directive is correctly computed by most browsers, but not by Internet Explorer. To get the same effect, one must use JavaScript code that checks whether the page has scrolled and puts the element back at its intended position. As an example, this code makes sure the element with the "menu" id stays at the top of the screen. Just calling the init() function on page load suffices:



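var menu;
var theTop = 0;   // smallest offset allowed from the top of the page
var old = theTop; // scroll position seen at the previous check

function init() {
  // Assumes the "menu" element is absolutely positioned through CSS.
  menu = document.getElementById('menu');
  movemenu();
}

function movemenu() {
  var pos = 0;
  // Read the vertical scroll offset with whatever the browser provides.
  if (typeof window.pageYOffset == 'number') {
    pos = window.pageYOffset;
  } else if (document.documentElement && document.documentElement.scrollTop) {
    pos = document.documentElement.scrollTop;
  } else if (document.body) {
    pos = document.body.scrollTop;
  }
  if (pos < theTop) pos = theTop;
  // Move the menu only when the position has not changed since the last
  // check, so that it does not jump around while the page is scrolling.
  if (pos == old) {
    menu.style.top = pos + 'px'; // a CSS length needs its unit
  }
  old = pos;
  // Check again in 100 ms; the function itself is passed, not a string.
  setTimeout(movemenu, 100);
}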

Another method, that could work in some cases, is described by Simon Jessey.

Conclusion

Microsoft has gained the largest browser market share by bundling its browser with the Windows operating system. Though this inclusion allows anyone who installs Windows to get a browser without any hassle, it somehow prevents competition from better suited and clearly more advanced browsers. Great browsers like Mozilla (and its derivatives Netscape, Firefox (formerly Firebird), Galeon, Epiphany, K-Meleon, Camino, ...), Opera, Konqueror (and its cousin Safari from Apple) are all waiting for their chance, integrating the new W3C recommendations one by one. The web evolves and Internet Explorer refuses to follow suit.

The computer world progresses by giant steps and applications that refuse change don't usually last long; Internet Explorer has accumulated a considerable lag - don't let the WWW stiffen when such a great future lies ahead.

A noble experiment by Dean Edwards to work around Internet Explorer deficiencies is called IE7. His compatibility module offers lots of solutions to the problems shown above. Let's hope his great work continues, allowing more and more authors to embrace the standards and clean up this messy web.

All comments, corrections or suggestions should be sent using this form. It'll be a pleasure to improve this document for the community's well-being.

Creation: August 30th, 2003
Last update: November 21st, 2006
Villeray, N 45° 33′ W 73° 36′

Copyright © 2003 Patrice Levesque. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
