XMDP: Introduction and Format Description

Author(s)

Introduction

HTML4 has several mechanisms for describing meta data, the best known being the <meta> element. The <meta> element is often used to denote the author of a document:


 <meta name="author" lang="tr" content="Tantek &Ccedil;elik" />

Briefly explained:

Each META element specifies a property/value pair. The name attribute identifies the property and the content attribute specifies the property's value.
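
As a rough illustration of the property/value model described above, the following sketch collects the name/content pairs from the <meta> elements of a document using Python's standard html.parser module. The class name MetaCollector and the function collect_meta are illustrative names, not part of HTML4 or XMDP, and the sketch assumes that a later <meta> with the same name simply overwrites an earlier one.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/content pairs from <meta> elements (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <meta ... /> here.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name is not None and content is not None:
            # Assumption: a repeated property name overwrites the earlier value.
            self.properties[name] = content


def collect_meta(html_text):
    """Return a dict mapping each meta property name to its content value."""
    collector = MetaCollector()
    collector.feed(html_text)
    return collector.properties
```

Calling collect_meta on the <meta> example above would yield a dictionary with the single property "author".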

HTML4 mentions a half dozen or so example properties, which are fairly self-explanatory: author, keywords, copyright, date, identifier. But in the definition of the 'name' attribute of the <meta> element, the HTML4 specification takes great care to state that "This specification does not list legal values for this attribute."

Instead, HTML4 provides a mechanism (the 'profile' attribute of the <head> element) to point to a meta data profile that defines properties and values. And again, the specification states explicitly that it does not define formats for profiles.

This proposal seeks to define a meta data profile format using principles of simplicity, reuse, and minimalism. The constraints, direction and hints for the format are derived from the HTML4 specification, and the building blocks of the format are taken from XHTML 1.0. Since the format is a subset of XHTML and therefore a profile itself, the format is called the XHTML Meta Data Profile, abbreviated as XMDP.

Principles

Hints from HTML4

Strong emphasis added for clarity.

Deductions

Format Description

The XHTML profile format consists of a definition list in which each property is a definition term. The definition of each property contains an optional brief description and then, if applicable, one or more definition lists of values.

First the profile definition list, recognizable by its class:


<dl class="profile">

Note that the HTML4 'class' attribute is a space separated set of values. All that is required is for the value "profile" to be in that set.
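
A tool might test for the "profile" class roughly as follows. This is a minimal sketch: has_class is a hypothetical helper name, and the comparison is assumed to be case-sensitive.

```python
def has_class(class_attribute, wanted="profile"):
    """Return True if `wanted` is one of the space separated class values."""
    # str.split() with no argument splits on any run of whitespace and
    # ignores leading and trailing whitespace.
    return wanted in class_attribute.split()
```

Under this check a value such as "draft profile" qualifies, while "profiles" does not, because whole class values are compared rather than substrings.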

Next a definition term and definition for a property:


 <dt id='property1'>property1</dt> 
 <dd>

The property name is given an 'id' attribute so pages can reference the property in particular with a URL with the appropriate fragment identifier. The 'id' attribute need not be the same as the name of the property, but probably should be for the sake of simplicity.

Any amount of valid optional markup (except for definition lists of course) may be used to provide a prose description and/or references for the property.


  <p>Authors may use property1 to describe 
   some particular details.
  </p>

The description is followed by one or more nested definition lists for the values and their definitions. If the values do not form a discrete set, or if that set is too large to enumerate in practice, a simple prose description of the set of legal values and any type constraints will suffice.


  <dl>
   <dt id='value1'>value1</dt>
    <dd>definition of value1</dd>
   <dt id='value2'>value2</dt>
    <dd>definition of value2</dd>
   ...
  </dl>
  ...
 </dd>

Again 'id' attributes are used so pages can reference a specific value using a URL to the profile with the fragment identifier for the value. And again the 'id' attribute need not be the same as the name of the value.
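
The sketch below shows one way a page or tool could build such a URL from a profile URL and an 'id', using urldefrag from Python's standard urllib.parse module. The function name term_url is hypothetical, and the sketch assumes the 'id' needs no further escaping.

```python
from urllib.parse import urldefrag


def term_url(profile_url, term_id):
    """Return the URL that points at one property or value in a profile."""
    # Drop any fragment already present on the profile URL, then append
    # the id of the property or value as the new fragment identifier.
    base, _old_fragment = urldefrag(profile_url)
    return base + "#" + term_id
```

For example, term_url("http://example.org/p1", "value1") gives a URL ending in "#value1". The example.org address is a placeholder.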

Perhaps another property:


 <dt id='property2'>property2</dt>

And its values description instead:


 <dd>
  Property2 contains a space separated set of values, 
  each of which is a date in the ISO8601 date format.
 </dd>

Etc., and finally closure of the outer definition list:


 ...
</dl>
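
The structure described above can be read back into a dictionary. The following is a minimal sketch built on Python's standard html.parser module, not a reference implementation. The names XMDPParser and parse_xmdp are illustrative. The sketch assumes that properties and values are keyed by the text of their <dt> elements rather than by 'id', and that definition lists nested deeper than the values level are ignored.

```python
from html.parser import HTMLParser


class XMDPParser(HTMLParser):
    """Builds {property: {"description": str, "values": {value: definition}}}."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self.depth = 0                     # 0 outside profile, 1 properties, 2 values
        self.open = {1: None, 2: None}     # "dt" or "dd" currently open per depth
        self.buffers = {1: [], 2: []}      # collected text per depth
        self.prop = None                   # current property name
        self.value = None                  # current value name

    def handle_starttag(self, tag, attrs):
        if tag == "dl":
            classes = (dict(attrs).get("class") or "").split()
            if self.depth > 0:
                self.depth += 1            # a nested list inside the profile
            elif "profile" in classes:
                self.depth = 1             # the profile definition list itself
        elif tag in ("dt", "dd") and self.depth in (1, 2):
            self.open[self.depth] = tag
            self.buffers[self.depth] = []

    def handle_data(self, data):
        if self.depth in (1, 2) and self.open[self.depth]:
            self.buffers[self.depth].append(data)

    def handle_endtag(self, tag):
        depth = self.depth
        if tag == "dl" and depth > 0:
            if depth in self.open:
                self.open[depth] = None
            self.depth -= 1
        elif tag in ("dt", "dd") and depth in (1, 2) and self.open[depth] == tag:
            # Collapse runs of whitespace so wrapped lines read as one sentence.
            text = " ".join("".join(self.buffers[depth]).split())
            self.open[depth] = None
            if tag == "dt" and depth == 1:
                self.prop = text
                self.properties[text] = {"description": "", "values": {}}
            elif tag == "dt" and self.prop is not None:
                self.value = text
                self.properties[self.prop]["values"][text] = ""
            elif tag == "dd" and depth == 1 and self.prop is not None:
                self.properties[self.prop]["description"] = text
            elif tag == "dd" and self.prop is not None and self.value is not None:
                self.properties[self.prop]["values"][self.value] = text


def parse_xmdp(html_text):
    """Parse XMDP markup and return the dictionary of properties and values."""
    parser = XMDPParser()
    parser.feed(html_text)
    return parser.properties
```

Fed the markup assembled in this section, parse_xmdp would return entries for property1 (with its prose description and the values value1 and value2) and for property2 (with a prose description and no enumerated values).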

Format Embedding and Profile Documents

The format may be embedded anywhere an HTML4 definition list may be embedded. Being well formed XML, the profile format may also be embedded in any XML document that permits embedding of XHTML.

A self-standing profile document can be simply constructed by wrapping the profile format with the minimal XHTML necessary for a valid XHTML document.

XMDP profile documents are typically HTML or XHTML documents (or both), and should be sent with the respective MIME type, i.e. 'text/html' for HTML or Compatible XHTML 1.0, or 'application/xhtml+xml' for XHTML.

Sample Profile Document

The various meta properties used informatively in HTML4 could be defined by the following profile document (also available online: samplehtmlprofile.html):


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>sample HTML profile</title></head>
<body>
 <dl class="profile">
  <dt id='author'>author</dt>
   <dd>A person who wrote (at least part of) the document.</dd>
  <dt id='keywords'>keywords</dt>
   <dd>A comma and/or space separated list of the 
    keywords or keyphrases of the document.</dd>
  <dt id='copyright'>copyright</dt>
   <dd>The name (or names) of the copyright holder(s) 
    for this document, and/or a complete statement of copyright.</dd>
  <dt id='date'>date</dt>
   <dd>The last updated date of the document, in ISO8601 date format.</dd>
  <dt id='identifier'>identifier</dt>
   <dd>The normative URI for the document.</dd>
  <dt id='rel'>rel</dt>
   <dd>
    <dl>
     <dt id='script'>script</dt>
     <dd>A reference to a client-side script. When used with the 
      LINK element, the script is evaluated as the document loads and 
      may modify the contents of the document dynamically.</dd> 
    </dl>
   </dd>
 </dl>
</body>
</html>

Using a Profile

For document authors, HTML4.01 describes the 'profile' attribute for referring to profiles. In short, to refer to a profile from any (X)HTML document, simply add a 'profile' attribute to the document's <head> element, e.g. to reference the above samplehtmlprofile.html profile:

<head profile='http://gmpg.org/xmdp/samplehtmlprofile.html'>

Similarly, tools can load and cache one or more XMDP profiles. A tool reads the 'profile' attribute, treats it as a space separated set of URIs (the above example shows the simple case of one profile, but the attribute may reference several), and retrieves the profiles addressed by those URIs. It then constructs a dictionary of properties and values by parsing the definition lists <dl>, terms <dt>, and definitions <dd> as specified in the XMDP Format Description above.
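
The first steps of that process might look like the sketch below, which reads the 'profile' attribute from a document's <head> and retrieves each profile at most once. All names here are illustrative. Retrieval is left to a caller-supplied fetch function (an assumption, so that the sketch stays independent of any particular HTTP library), and the cache is assumed to be a plain dictionary keyed by URI.

```python
from html.parser import HTMLParser


class HeadProfileReader(HTMLParser):
    """Records the URIs listed in the 'profile' attribute of <head>."""

    def __init__(self):
        super().__init__()
        self.profile_uris = []

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            # Simplification: split on any whitespace Python recognizes.
            self.profile_uris = value.split()


def read_profile_uris(html_text):
    """Return the list of profile URIs referenced by a document, in order."""
    reader = HeadProfileReader()
    reader.feed(html_text)
    return reader.profile_uris


def load_profiles(uris, fetch, cache):
    """Return the text of each profile, fetching only URIs not yet cached."""
    loaded = []
    for uri in uris:
        if uri not in cache:
            cache[uri] = fetch(uri)   # fetch is supplied by the caller
        loaded.append(cache[uri])
    return loaded
```

Each retrieved profile would then be parsed into its properties and values according to the format description.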

Using Multiple Profiles

Space separated

HTML4.01 states that "one or more meta data profiles, [are] separated by white space". The term "white space" is used interchangeably with "white space characters" in HTML4.01, which defines white space in terms of a set of characters. Note that HTML4.01 contains a few other space separated attributes, such as the 'class' attribute and the 'rel' attribute, and thus the treatment of space separated values is fairly well understood. To keep it simple, authors should use a single space character to delimit more than one profile URI in the 'profile' attribute, e.g.:

<head profile='http://example.org/p1 http://example.org/p2'>

Tools, however, should expect any amount or sequence of white space between URIs, where such white space consists of one or more occurrences of the white space characters as defined in HTML4.01.
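
A tolerant splitter along these lines is sketched below. The character set reflects one reading of the white space characters in HTML4.01 (space, tab, form feed, zero-width space, plus carriage return and line feed for line breaks); the function name split_profile_attribute is hypothetical.

```python
import re

# White space characters per one reading of HTML4.01 section 9.1:
# space, tab, form feed, zero-width space, and the line break characters.
HTML4_WHITE_SPACE = re.compile("[ \t\f\u200b\r\n]+")


def split_profile_attribute(value):
    """Split a 'profile' attribute value into its URIs, in document order."""
    # Filtering drops the empty strings produced by leading or trailing
    # white space.
    return [uri for uri in HTML4_WHITE_SPACE.split(value) if uri]
```

An explicit pattern is used here because Python's plain str.split() does not treat the zero-width space as whitespace.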

Relative significance of profiles

HTML4.01 says that "this specification only considers the first URI to be significant". To reference and use multiple profiles, this portion of the spec must be extended slightly so that all URIs in the 'profile' attribute have some meaning.

However, clearly HTML4.01 shows a bias towards the first URI rather than later URIs. Thus, consistent with that bias, the URIs in the 'profile' attribute are to be treated most significant (first) to least significant (last). Such relative significance only makes a difference when profiles attempt to define the same property(s) and/or value(s). Thus if two or more profiles define the same term, the earliest (first) of those profiles wins, and its definition for that term is used.
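
The first-wins rule can be sketched as a simple merge. The sketch assumes each profile has already been parsed into a dictionary mapping a term to its definition, and that the profiles are passed in the order their URIs appear in the 'profile' attribute. The function name merge_profiles is illustrative.

```python
def merge_profiles(profiles):
    """Merge parsed profiles so the earliest definition of each term wins."""
    merged = {}
    for profile in profiles:            # most significant profile first
        for term, definition in profile.items():
            # setdefault keeps an existing entry, so later profiles can
            # add new terms but never replace an earlier definition.
            merged.setdefault(term, definition)
    return merged
```

With two profiles that both define "date", the merged dictionary keeps the definition from the first profile and still picks up any terms that only the second profile defines.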

References

HTML4
Raggett, D.; Le Hors, A.; Jacobs, I. HTML 4.01 Specification. Dec 1999. W3C Recommendation. URL: http://www.w3.org/TR/1999/REC-html401-19991224
ISO8601
"Data elements and interchange formats -- Information interchange -- Representation of dates and times", ISO 8601:1988.
URI
"Uniform Resource Identifiers (URI): Generic Syntax", T. Berners-Lee, R. Fielding, L. Masinter, August 1998.