razorborg essays

Ampersands

April 30, 2003 by Jan Martin Borgersen

Okay, because this keeps coming up:

The HTML specification states that URIs in attribute values should use the entity value for an ampersand (&amp;) instead of a simple ampersand.

This means that when XSL outputs URLs like:

http://filename.ext?n1=v1&amp;n2=v2

it is not a bug that there’s a &amp; in there! In fact, this URL is more spec-compliant than a URL that simply contains "&".
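To see why the escaped form is harmless, here is a minimal sketch in Python (my choice of language for illustration; the names HrefCollector and extract_hrefs are made up for this example). It reads the markup back roughly the way an HTML consumer would, using the standard library's html.parser, which decodes entity references in attribute values.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the decoded href value of every <a> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser has already
        # decoded entity references such as &amp; inside each value.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


def extract_hrefs(markup):
    """Return the href values found in markup, as a consumer would see them."""
    parser = HrefCollector()
    parser.feed(markup)
    parser.close()
    return parser.hrefs


escaped_markup = '<a href="http://filename.ext?n1=v1&amp;n2=v2">link</a>'
print(extract_hrefs(escaped_markup))
# prints ['http://filename.ext?n1=v1&n2=v2']
```

The entity only exists in the markup. By the time anything follows the link, the URL contains a plain ampersand again.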

The breakdown goes something like this:

  1. HTTP and URIs use the ampersand to separate name/value pairs for GET-type form submissions.
  2. SGML and XML specify that “special characters” need to be escaped into their character entities when they occur inside attributes in elements. Unfortunately, the ampersand is the character that denotes the beginning of a character entity, and is thus considered a "special character."
  3. This means that the ampersand must be represented as &amp; or &#38; if it occurs inside an href attribute in the A tag for the HTML to be truly valid. This is a hard requirement for XHTML validation.
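The three steps above can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not anything an XSLT engine runs: build_href is a hypothetical helper, and the base URL and parameters are the ones from the example earlier in this essay.

```python
from html import escape, unescape
from urllib.parse import urlencode


def build_href(base, params):
    """Return (raw_url, attribute_safe_url) for the given query parameters."""
    # Step 1: name/value pairs are joined with a plain ampersand.
    raw_url = base + "?" + urlencode(params)
    # Steps 2 and 3: inside an attribute the ampersand is a special
    # character, so it has to be written as an entity reference.
    return raw_url, escape(raw_url, quote=True)


raw, safe = build_href("http://filename.ext", {"n1": "v1", "n2": "v2"})
print(raw)   # http://filename.ext?n1=v1&n2=v2
print(safe)  # http://filename.ext?n1=v1&amp;n2=v2
print('<a href="%s">...</a>' % safe)

# Both spellings of the entity decode back to the same character.
print(unescape("&amp;") == unescape("&#38;") == "&")  # True
```

The raw form is what goes over the wire; the escaped form is what belongs in the markup.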

An interesting corollary happens if you live in the XSL world, like I do. If you create your links in XSL like this:

<a href="{$my_link_var_with_ampersands}">...</a>

then ampersands will always be escaped into their entity form. However, if you create your links like this:

<a>
  <xsl:attribute name="href">
    <xsl:value-of select="$my_link_var_with_ampersands"
                  disable-output-escaping="yes" />
  </xsl:attribute>
</a>

an interesting thing occurs. If your output method is set to "html" (media type text/html), then the XSLT engine will output escaped ampersands. If your output method is set to "text" (media type text/plain), then it will not, since the engine is not worried about producing code to SGML/XML specs.
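The same split between markup output and text output shows up in other serializers. As a loose analogy only (this is Python's xml.etree.ElementTree, not an XSLT engine, and it says nothing about how any particular XSLT processor treats disable-output-escaping), here is a sketch that serializes one element with two different output methods:

```python
import xml.etree.ElementTree as ET

url = "http://filename.ext?n1=v1&n2=v2"

# One <a> element carrying the URL both as an attribute and as its text.
link = ET.Element("a", {"href": url})
link.text = url

# The "html" method produces markup, so ampersands are escaped.
as_html = ET.tostring(link, encoding="unicode", method="html")

# The "text" method emits only the text content, with no escaping at all.
as_text = ET.tostring(link, encoding="unicode", method="text")

print(as_html)
print(as_text)
```

The serializer that has to produce markup escapes the ampersand, and the one that produces plain text leaves it alone.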