Project documentation

  • PHP4: may already work with PHP5; if it does not yet, it eventually will

  • mailparse: PECL extension to decode mail

  • mbstring: PHP extension to handle multibyte characters

  • iconv: PHP extension to convert charsets (optional if the list to archive uses English characters only)

  • Mail_Mime: PEAR package (Mail/mimeDecode.php is needed from this package, since PHP4 doesn't ship iconv_mime_decode yet)

The mailing list archives must consist of separate files, each containing exactly one mail message with full headers.
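To check an archive against this requirement before running the archiver, something along these lines might help. This is a rough Python sketch, not part of xhtmail: the function name missing_headers and the choice of which headers count as "full headers" are assumptions made for illustration.

```python
from email import message_from_string

# Which headers count as "full headers" is an assumption of this sketch;
# xhtmail itself may need more or fewer.
REQUIRED_HEADERS = ("From", "Date", "Subject", "Message-ID")


def missing_headers(raw_message):
    """Return the names of required headers absent from one raw mail message."""
    msg = message_from_string(raw_message)
    return [name for name in REQUIRED_HEADERS if msg[name] is None]


# A made-up message that lacks a Message-ID header.
sample = (
    "From: Fred Flintstone <fred@example.com>\n"
    "Date: Wed, 22 Sep 2004 10:00:00 +0000\n"
    "Subject: Hello\n"
    "\n"
    "Body text.\n"
)

print(missing_headers(sample))  # ['Message-ID']
```

Running this over every file in the archive directory would flag messages that were stored without their headers.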

What, where, why, when, how?

The author uses Sympa 5 as a mailing list manager. Though its pure mail engine works fine, its web archives (generated with mhonarc 2.6.5) seem deficient. Other engines were studied (LISTSERV 14.4, Hypermail 2.1.8, dbmarc 20030519), but it felt like the world needed a new mailing list archiver. Why? They all exhibit several of the following annoyances:

  1. Cumbersome navigation. "One message, one page". Ever searched Google and landed in the middle of a thread in a mailing list archive? Feels like the middle of nowhere!

    xhtmail puts all messages of a thread in a single page.

  2. Weak table-based markup. Tables are meant for tabular data; every web coder should know. Though in 1995 no other method existed to obtain a complex layout, this era is long gone.

    xhtmail generates semantic markup, with no tables; signatures, titles, dates and quotes in mail messages are recognized most of the time and wrapped in appropriate tags.

  3. No markup language metadata. Most browsers and search engines crave the metadata stored in a web page.

    xhtmail retrieves most metadata from the mail message headers; for indexes and threads, navigational metadata is generated.

  4. Plain look. Everybody knows what the web looked like in 1994. Monochrome monitors have lost popularity since. Most of the engines don't allow any control over the presentation.

    xhtmail exclusively uses CSS for presentation so the web archive can be easily styled; the engine adds icons and even X-Faces to the message, bringing life to otherwise plain messages.

  5. Whack-A-Mole URIs. Archives can change at any time, and messages can be deleted for various reasons. As most engines use an internal pointer to generate mail message URIs, deleting a page from the original archives changes the URIs of other pages.

    xhtmail always generates the same URI for the same thread and the same anchor for the same message. Cool URIs Don't Change™.

  6. Undescriptive URIs. Each web resource should be described when possible; people (and search engines) like it, even though no RFC or web standard asks for descriptive URIs.

    xhtmail uses a combination of timestamp and message thread subject to build the thread URIs, giving them expressiveness; as a bonus, webstats suddenly become more readable...

  7. Invalid markup. Though W3C recommendations for HTML have existed since 1995, only a fraction of web markup is valid, and mailing list archivers create tag soup too.

    xhtmail complies with XHTML Strict recommendations and can be safely sent as application/xhtml+xml.

  8. Useless file extensions for messages. As said before, Cool URIs Don't Change™; if the mail archiver generates .html files and the pages suddenly need to be wrapped in a scripting language, trouble arises.

    xhtmail puts all threads in separate directories so no visitor-visible extension is necessary.

  9. No/flaky support for feeds. Many people like to stay informed of what's happening on a list but don't want to subscribe for different reasons.

    xhtmail exports RSS 2.0 and Atom 1.0 feeds with message summaries.

  10. Difficult integration with existing site templates. It should be easy to get a uniform look through all pages of a website, even mail archives.

    xhtmail is template-driven so existing bits of markup can be reused. Though the xhtmail code is written in PHP, any CGI language (python, perl, ruby, php, ...) can be used for templating.
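Items 5 and 6 above amount to deriving each thread URI from data carried by the thread itself (a timestamp and the subject) rather than from its position in the archive. The following Python sketch illustrates that idea only; it is not xhtmail's actual algorithm (xhtmail is written in PHP), and the function name thread_slug, the timestamp format and the list of stripped reply prefixes are all assumptions.

```python
import re
import unicodedata
from datetime import datetime, timezone


def thread_slug(first_message_time, subject):
    """Build a stable, descriptive path segment for a thread (hypothetical scheme)."""
    # Drop leading reply/forward prefixes so replies map to the same subject.
    subject = re.sub(r"^\s*((re|fwd?|tr)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    # Fold accented characters to plain ASCII (useful for a French-speaking list).
    folded = unicodedata.normalize("NFKD", subject).encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    words = re.sub(r"[^a-z0-9]+", "-", folded.lower()).strip("-")
    # The timestamp comes from the message, not from an archive counter,
    # so deleting other messages never changes it.
    stamp = first_message_time.astimezone(timezone.utc).strftime("%Y%m%d-%H%M%S")
    return f"{stamp}-{words}" if words else stamp


when = datetime(2004, 9, 22, 14, 30, 0, tzinfo=timezone.utc)
print(thread_slug(when, "Re: Où est le café ?"))  # 20040922-143000-ou-est-le-cafe
```

Because the result depends only on the first message's date and subject, regenerating the archive after removing unrelated messages yields the same value.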

First of all, this script is closely tied to the author's needs. Some limitations may seem stupid; contact the author, as they probably stem from simple oversights.

Further to the above remark: xhtmail was first built for a French-speaking mailing list; it will work better with French than with Thai or Arabic.

Here are notes on advanced usage; this chapter may be skipped by those with no interest in free software development.

Full phpdoc for this code can be generated with phpDocumentor, using the command:

phpdoc -dn xhtmail -f xhtmail.php -o HTML:frames:earthli -t api -ti "xhtmail documentation" -s on

This documentation is also available at http://ptaff.ca/xhtmail/api/.

Patches are welcome. Simple patches can help the project get better in no time.

Any patch to xhtmail should be built with diff -ru distributed_xhtmail_directory patched_xhtmail_directory and mailed to for revision. Only patches against a recent CVS version or the latest official release will be accepted.


php xhtmail.php options file [file...]

Each file is a separate mail message, with full headers.



What content (typically XHTML) should be put after the xhtmail output?


What content (typically XHTML) should be put before the xhtmail output?

Special tags in the file will be replaced according to the rules in Appendix B.

No <hN> tags should be found in this file because they are already used by xhtmail for message titles.

-c FILENAME, --contents-after=FILENAME

What content (typically XHTML) should be put after the thread contents in thread pages?


What extension should the output files be given? Default is html. Web visitors won't see this extension; it is intended for server-side processing (if your templates are written in Ruby, use rb; for Python, use py; and so on).

-f NUMBER, --feed=NUMBER

Number of Atom/RSS entries in the main archive output. Defaults to 0 (no feed).

See also the -u option.

-h, --help

Displays this help text and exits

-i FILENAME, --index_after=FILENAME

What content (typically XHTML) should be put after the index in index pages?


What interface language should the generated pages use? Default is en.

Available: en, fr

-n STRING, --name_of_list=STRING

Name of the mailing list for titling purposes

-o DIRECTORY, --output_directory=DIRECTORY

Where should the output files and directories be located?

-p URI[,URI], --picture_uri=URI[,URI]

Image URI and icon URI, respectively, for the Atom/RSS feeds. Silently discarded if not used with -f. The icon, if provided, should be the size of a favicon (16x16).

-t STRING, --title_of_list=STRING

Long name of the archive (like "Archives of the foo mailing list")

-u URI,URI, --uri_base=URI,URI

Base URIs for web pages and feeds respectively, separated by a comma, like "-u http://ex.com/path1,http://ex.com/path2"; the URIs must not end with a slash. This parameter is silently discarded unless the -f flag (feeds) is used. These base URIs are used only for cross-linking between web pages and feeds, not for internal links. The feeds base should be the part before the feeds subdirectory (if your feeds are at http://example.com/feeds/example.xml, use http://example.com). If unspecified, the feeds will be located in the feeds/ directory of the output directory (-o).

-v, --version

Output version information and exit

Danger Will Robinson

Option parsing is primitive at best. If the results look odd, inspect the command line carefully.
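Since option parsing is primitive, checking the -u value before launching a run can save some head-scratching. The Python sketch below applies the rules stated for -u (two comma-separated URIs, no trailing slash, feeds base placed before the feeds subdirectory). The helper names split_uri_base and feed_uri are hypothetical, and this is not how xhtmail itself parses the option.

```python
def split_uri_base(value):
    """Split a '-u WEB_BASE,FEED_BASE' value and apply the documented rules."""
    parts = value.split(",")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected two comma-separated URIs: WEB_BASE,FEED_BASE")
    for uri in parts:
        # The documentation asks for no slash at the end of either URI.
        if uri.endswith("/"):
            raise ValueError(f"base URI must not end with a slash: {uri}")
    web_base, feed_base = parts
    return web_base, feed_base


def feed_uri(feed_base, filename):
    """The feeds base sits before the feeds/ subdirectory."""
    return f"{feed_base}/feeds/{filename}"


web_base, feed_base = split_uri_base("http://ex.com/path1,http://ex.com/path2")
print(web_base)   # http://ex.com/path1
print(feed_base)  # http://ex.com/path2
print(feed_uri("http://example.com", "example.xml"))  # http://example.com/feeds/example.xml
```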

Tag | Meaning | Example substitution
--- | --- | ---
<^ATOM_PATH^> | URI of the Atom feed | http://example.com/atom.xml
<^CONTRIBUTOR_NAMES^> | Comma-delimited list of people names who posted to the thread | Barney Rubble, Bamm Bamm, Betty McBricker
<^CREATOR_NAME^> | Name of the creator of a thread (who started the thread) | Fred Flintstone
<^DATE_OF_CREATION^> | Date of creation of the page (according to the mail messages timestamps) | 2004-09-22
<^DATE_OF_UPDATE^> | Date of update of the page (according to the mail messages timestamps) | 2004-09-23
<^FIRST_TITLE^> | Title of the first page in a set (title of the first month page on month pages and title of the first thread on thread pages) | mylist - 2008/01
<^FIRST_URI^> | URI of the first page in a set (first month URI in month index view and first thread URI in thread pages) | http://example.com/mylist/2005-09/
<^INDEX_TITLE^> | Title of the index page | Index
<^INDEX_URI^> | URI of the mailing list archive main index | http://example.com/mylist/
<^LAST_TITLE^> | Title of the last page in a set (title of the last month page on month pages and title of the last thread on thread pages) | mylist - 2008/12
<^LAST_URI^> | URI of the last page in a set (last month URI in month index view and last thread URI in thread pages) | http://example.com/mylist/2005-09/
<^NEXT_TITLE^> | Title of the next page in a set (title of the next month page on month pages and title of the next thread on thread pages) | mylist - 2008/07
<^NEXT_URI^> | URI of the next page in a set (next month URI in month index view and next thread URI in thread pages) | http://example.com/mylist/2005-09/
<^PARENT_TITLE^> | Title of the parent page | Parent
<^PARENT_URI^> | URI of the parent page | http://example.com/mylist/2008-09/
<^PREV_TITLE^> | Title of the previous page in a set (title of the previous month page on month pages and title of the previous thread on thread pages) | mylist - 2008/05
<^PREV_URI^> | URI of the previous page in a set (previous month URI in month index view and previous thread URI in thread pages) | http://example.com/mylist/2005-09/
<^RSS_PATH^> | URI of the RSS feed | http://example.com/index.rss
<^TITLE^> | Title of the mailing list, given by the -n option | My list

Table 2.1. Template tags
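To preview how a template fragment would look once the tags above are substituted, a tiny stand-in for the replacement step can be handy. The Python sketch below is an assumption-laden illustration, not xhtmail's code: the function name render_template is hypothetical, tag names are assumed to be uppercase letters and underscores, unknown tags are left untouched, and no escaping is applied to the substituted values.

```python
import re

# Matches tags of the form <^NAME^>, as listed in the template tags table.
TAG_PATTERN = re.compile(r"<\^([A-Z_]+)\^>")


def render_template(template, values):
    """Replace each <^NAME^> tag with values[NAME]; leave unknown tags as-is."""
    def substitute(match):
        # Leaving unknown tags in place is a choice made for this sketch.
        return values.get(match.group(1), match.group(0))
    return TAG_PATTERN.sub(substitute, template)


# Example values taken from the "Example substitution" column.
values = {
    "INDEX_URI": "http://example.com/mylist/",
    "INDEX_TITLE": "Index",
    "TITLE": "My list",
}

fragment = '<p><a href="<^INDEX_URI^>"><^INDEX_TITLE^></a> of <^TITLE^></p>'
print(render_template(fragment, values))
# <p><a href="http://example.com/mylist/">Index</a> of My list</p>
```

Leaving unknown tags visible in the output makes typos in a template easy to spot when previewing it.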