This section is informative.
Because XHTML is an XML application, certain practices that were perfectly legal in SGML-based HTML 4 [HTML4] must be changed.
Well-formedness is a new concept introduced by [XML]. Essentially this means that all elements must either have closing tags or be written in a special form (as described below), and that all the elements must nest properly.
Although overlapping is illegal in SGML, it is widely tolerated in existing browsers.
CORRECT: nested elements
<p>here is an emphasized <em>paragraph</em>.</p>
INCORRECT: overlapping elements
<p>here is an emphasized <em>paragraph.</p></em>
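The difference between the two examples above can be checked mechanically. The following is a minimal sketch, assuming Python's standard-library XML parser as a stand-in for a conforming XML processor; the helper name is_well_formed is hypothetical and not part of any specification.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the XML parser accepts the markup (hypothetical helper)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


nested = "<p>here is an emphasized <em>paragraph</em>.</p>"
overlapping = "<p>here is an emphasized <em>paragraph.</p></em>"

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```

An HTML 4 browser would typically render both strings; an XML processor rejects the second outright.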
XHTML documents must use lower case for all HTML element and attribute names. This difference is necessary because XML is case-sensitive: for example, <li> and <LI> are different tags.
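As a rough illustration of case sensitivity, the sketch below parses two lists with Python's standard-library XML parser (an assumption made for illustration; any XML processor should behave the same way). The function name mismatched_case_is_rejected is hypothetical.

```python
import xml.etree.ElementTree as ET

lower = ET.fromstring("<ul><li>item</li></ul>")
upper = ET.fromstring("<ul><LI>item</LI></ul>")

# The parser preserves case, so the two children have different names.
print(lower[0].tag, upper[0].tag)  # li LI

# A lookup for "li" finds nothing in the upper-case document.
print(upper.find("li") is None)    # True


def mismatched_case_is_rejected() -> bool:
    """A start tag and end tag that differ only in case do not match."""
    try:
        ET.fromstring("<ul><li>item</LI></ul>")
    except ET.ParseError:
        return True
    return False
```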
In SGML-based HTML 4, certain elements were permitted to omit the end tag, with the elements that followed implying closure. XML does not allow end tags to be omitted. All elements other than those declared in the DTD as EMPTY must have an end tag. Elements that are declared in the DTD as EMPTY can have an end tag or can use empty element shorthand (see Empty Elements).
CORRECT: terminated elements
<p>here is a paragraph.</p><p>here is another paragraph.</p>
INCORRECT: unterminated elements
<p>here is a paragraph.<p>here is another paragraph.
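The following sketch shows how an XML parser reacts to the two fragments above. It assumes Python's standard-library parser, and it wraps each fragment in a div element because an XML document needs a single root; both the wrapper and the helper name count_paragraphs are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def count_paragraphs(fragment: str) -> Optional[int]:
    """Count p children of the fragment, or return None if it is not well-formed."""
    wrapped = "<div>" + fragment + "</div>"  # single root added for parsing only
    try:
        root = ET.fromstring(wrapped)
    except ET.ParseError:
        return None
    return len(root.findall("p"))


terminated = "<p>here is a paragraph.</p><p>here is another paragraph.</p>"
unterminated = "<p>here is a paragraph.<p>here is another paragraph."

print(count_paragraphs(terminated))    # 2
print(count_paragraphs(unterminated))  # None
```

The parser does not infer the missing end tags the way an SGML-based HTML 4 parser would; it simply reports an error.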
All attribute values must be quoted, even those which appear to be numeric.
CORRECT: quoted attribute values
<td rowspan="3">
INCORRECT: unquoted attribute values
<td rowspan=3>
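A small sketch of the same rule, assuming Python's standard-library XML parser. An end tag is added to each example so that the only difference is the quoting; the helper name rowspan_of is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def rowspan_of(markup: str) -> Optional[str]:
    """Return the rowspan attribute value, or None if the markup is rejected."""
    try:
        return ET.fromstring(markup).get("rowspan")
    except ET.ParseError:
        return None


print(rowspan_of('<td rowspan="3"></td>'))  # 3
print(rowspan_of("<td rowspan=3></td>"))    # None
```

Note that the quoted value is delivered to the application as the string "3", not as a number.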
XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as compact and checked cannot occur in elements without their value being specified.
CORRECT: unminimized attributes
<dl compact="compact">
INCORRECT: minimized attributes
<dl compact>
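The sketch below, which assumes Python's standard-library XML parser, shows that the full attribute-value pair is accepted while the minimized form is a well-formedness error. End tags are added so the examples are complete elements; attributes_of is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def attributes_of(markup: str) -> Optional[Dict[str, str]]:
    """Return the element's attributes, or None if the markup is rejected."""
    try:
        return dict(ET.fromstring(markup).attrib)
    except ET.ParseError:
        return None


print(attributes_of('<dl compact="compact"></dl>'))  # {'compact': 'compact'}
print(attributes_of("<dl compact></dl>"))            # None
```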
Empty elements must either have an end tag or the start tag must end with />. For instance, <br/> or <hr></hr>. See HTML Compatibility Guidelines for information on ways to ensure this is backward compatible with HTML 4 user agents.
CORRECT: terminated empty elements
<br/><hr/>
INCORRECT: unterminated empty elements
<br><hr>
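As a sketch of why the two permitted forms are interchangeable, the code below parses both with Python's standard-library XML parser (an assumption for illustration) and confirms that each yields an element with no content. The div wrapper and the helper name empty_children are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def empty_children(fragment: str) -> Optional[List[str]]:
    """Return names of child elements with no content, or None if not well-formed."""
    try:
        root = ET.fromstring("<div>" + fragment + "</div>")
    except ET.ParseError:
        return None
    # An element is empty here if it has no child elements and no text.
    return [el.tag for el in root if len(el) == 0 and el.text is None]


print(empty_children("<br/><hr></hr>"))  # ['br', 'hr']
print(empty_children("<br><hr>"))        # None
```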
When user agents process attributes, they do so according to Section 3.3.3 of [XML].
In XHTML, the script and style elements are declared as having #PCDATA content. As a result, < and & will be treated as the start of markup, and entities such as &lt; and &amp; will be recognized as entity references by the XML processor to < and & respectively. Wrapping the content of the script or style element within a CDATA marked section avoids the expansion of these entities.
<script type="text/javascript"> <![CDATA[ ... unescaped script content ... ]]> </script>
CDATA sections are recognized by the XML processor and appear as nodes in the Document Object Model; see Section 1.3 of the DOM Level 1 Recommendation [DOM].
An alternative is to use external script and style documents.
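The three ways of writing inline script content can be compared with a short sketch. It assumes Python's standard-library XML parser, which delivers the content of a CDATA section as ordinary text; the script body and the helper name script_text are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def script_text(markup: str) -> Optional[str]:
    """Return the script element's text, or None if the markup is not well-formed."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        return None


# Unescaped < and & are read as the start of markup and rejected.
raw = '<script type="text/javascript">if (a < b && c) run();</script>'
# Entity references are accepted and expanded by the parser.
escaped = '<script type="text/javascript">if (a &lt; b &amp;&amp; c) run();</script>'
# A CDATA section lets the script be written without escaping.
cdata = '<script type="text/javascript"><![CDATA[if (a < b && c) run();]]></script>'

print(script_text(raw))      # None
print(script_text(escaped))  # if (a < b && c) run();
print(script_text(cdata))    # if (a < b && c) run();
```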
SGML gives the writer of a DTD the ability to exclude specific elements from being contained within an element. Such prohibitions (called "exclusions") are not possible in XML.
For example, the HTML 4 Strict DTD forbids the nesting of an 'a' element within another 'a' element to any descendant depth. It is not possible to spell out such prohibitions in XML. Even though these prohibitions cannot be defined in the DTD, certain elements should not be nested. A summary of such elements and the elements that should not be nested in them is found in the normative Element Prohibitions.
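Because the DTD cannot express such a prohibition, a check of this kind has to be done outside DTD validation. The following is a minimal sketch for the 'a' element only, assuming Python's standard-library XML parser and element names without a namespace prefix; has_nested_anchor is a hypothetical helper, not a complete implementation of the Element Prohibitions.

```python
import xml.etree.ElementTree as ET


def has_nested_anchor(markup: str) -> bool:
    """Return True if any 'a' element contains another 'a' at any depth."""
    root = ET.fromstring(markup)
    for anchor in root.iter("a"):
        # iter() yields the starting element itself first, so skip it.
        for descendant in anchor.iter("a"):
            if descendant is not anchor:
                return True
    return False


siblings = '<p><a href="x">one</a> <a href="y">two</a></p>'
nested = '<p><a href="x">one <em><a href="y">two</a></em></a></p>'

print(has_nested_anchor(siblings))  # False
print(has_nested_anchor(nested))    # True
```

Both strings are well-formed, which is why a well-formedness check alone cannot catch the second one.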
HTML 4 defined the name attribute for the elements a, applet, form, frame, iframe, img, and map. HTML 4 also introduced the id attribute. Both of these attributes are designed to be used as fragment identifiers.
In XML, fragment identifiers are of type ID, and there can only be a single attribute of type ID per element. Therefore, in XHTML 1.0 the id attribute is defined to be of type ID. In order to ensure that XHTML 1.0 documents are well-structured XML documents, XHTML 1.0 documents MUST use the id attribute when defining fragment identifiers on the elements listed above. See the HTML Compatibility Guidelines for information on ensuring such anchors are backward compatible when serving XHTML documents as media type text/html.
Note that in XHTML 1.0, the name attribute of these elements is formally deprecated, and will be removed in a subsequent version of XHTML.
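One way to find markup that still relies on name alone is sketched below. It assumes Python's standard-library XML parser and element names without a namespace prefix; the helper name elements_missing_id and the sample markup are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Elements for which HTML 4 defined the name attribute as a fragment identifier.
NAMED_ELEMENTS = {"a", "applet", "form", "frame", "iframe", "img", "map"}


def elements_missing_id(markup: str) -> List[str]:
    """List, in document order, the listed elements that have name but no id."""
    root = ET.fromstring(markup)
    return [
        el.tag
        for el in root.iter()
        if el.tag in NAMED_ELEMENTS and "name" in el.attrib and "id" not in el.attrib
    ]


sample = (
    "<div>"
    '<a name="top"></a>'
    '<a id="s1" name="s1"></a>'
    '<img name="logo" src="logo.png" alt=""/>'
    '<input name="q"/>'
    "</div>"
)

print(elements_missing_id(sample))  # ['a', 'img']
```

The input element is not reported because its name attribute is a form control name, not a fragment identifier.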
HTML 4 and XHTML both have some attributes that have pre-defined and limited sets of values (e.g. the type attribute of the input element). In SGML and XML, these are called enumerated attributes. Under HTML 4, the interpretation of these values was case-insensitive, so a value of TEXT was equivalent to a value of text.
Under XML, the interpretation of these values is case-sensitive, and in XHTML 1 all of these values are defined in lower-case.
SGML and XML both permit references to characters by using hexadecimal values. In SGML these references could be made using either &#Xnn; or &#xnn;. In XML documents, you must use the lower-case version (i.e. &#xnn;).
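A short sketch of this rule, assuming Python's standard-library XML parser; the p wrapper and the helper name decode_reference are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def decode_reference(reference: str) -> Optional[str]:
    """Return the character a reference expands to, or None if it is rejected."""
    try:
        return ET.fromstring("<p>" + reference + "</p>").text
    except ET.ParseError:
        return None


print(decode_reference("&#x41;"))  # A
print(decode_reference("&#X41;"))  # None
```

Only the marker x must be lower case; the hexadecimal digits themselves may be written in either case.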