Chapter 5. XML Structure > Entity References

Entity References

It is frequently convenient and often necessary to perform string substitution during the parsing of an XML file. The XML entity concept allows such substitution. We will encounter other entity types later, but the only references found in an XML file proper are general entity references. These are references that replace short symbols with legitimate XML character strings. They serve many purposes.

The simplest (and perhaps most common) case is the need to insert into a text stream characters that are meant literally as character data but that the parser would consider as markup. The two characters susceptible to such misinterpretation are the < character, which introduces a tag, and the & character, which introduces, in fact, an entity reference. These two characters can appear in an XML document only when they are serving their specific functions. XML defines entities that substitute for these two. In addition, it provides for a substitute for the > character to satisfy an obscure SGML requirement and for both single and double quote marks, for reasons of convenience that should be obvious.
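As a rough illustration of this substitution outside Flash, the following sketch uses Python's standard library rather than ActionScript. The element name code and the sample string are assumptions made for the example. The escape function replaces the markup characters with their entity references, and the parser reverses the substitution when it reads the document.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text that is meant literally but contains markup characters.
raw = "if (x > min && x < max)"

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
encoded = escape(raw)

# Wrap the encoded text in a hypothetical <code> element and parse it.
document = "<code>" + encoded + "</code>"

# The parser resolves the references back to the literal characters.
decoded = ET.fromstring(document).text

print(encoded)
print(decoded)
```

Without the escaping step, the bare & and < in the sample string would be read as markup and the document would be rejected.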

Additionally, a similar escape sequence exists for any character: a numeric reference built from the character's code. The code (the character's Unicode value, which coincides with ASCII for the first 128 characters) can be expressed in decimal or hexadecimal form. This lets us embed characters that are difficult to produce on the keyboard or unreliable to display.
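The equivalence of the decimal and hexadecimal forms can be checked with a short sketch, again in Python rather than ActionScript. The price element is an assumption made for the example. The yen sign has the character code 165, which is A5 in hexadecimal.

```python
import xml.etree.ElementTree as ET

yen = "\u00a5"  # the yen sign

# Build both forms of the numeric reference from the character code.
decimal_ref = f"&#{ord(yen)};"    # decimal form
hex_ref = f"&#x{ord(yen):X};"     # hexadecimal form

# Both references resolve to the same single character when parsed.
decimal_form = ET.fromstring("<price>" + decimal_ref + "500</price>").text
hex_form = ET.fromstring("<price>" + hex_ref + "500</price>").text

print(decimal_ref, hex_ref)
print(decimal_form == hex_form)
```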

If the DTD (a preprocessor file we will encounter later) defines it, we can use our own notation, such as &yen;, to represent the ¥ symbol more clearly than &#165; does.

Of course, we are not limited to single characters. An entity can invoke a string of arbitrary length, which makes it a useful way to represent frequently used and lengthy text. It is even more valuable for representing volatile data. For example, an entity called &webmaster; might list the name and contact information for the person responsible for supporting an XML document. Data encoders enjoy the ability to represent this lengthy, frequently repeated data with a short, clear symbol. And if the webmaster position has high turnover, everyone will appreciate that the text needs to be changed in only one place: the entity's definition.
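A minimal sketch of the &webmaster; idea follows. It assumes the entity is declared in an internal DTD subset at the top of the document itself, and it uses Python's expat-based parser, which expands such declarations. The page, header, and footer elements and the contact details are hypothetical.

```python
import xml.etree.ElementTree as ET

# The entity is defined once, in the internal DTD subset (an assumption
# for this sketch), and referenced wherever the text is needed.
document = """<!DOCTYPE page [
  <!ENTITY webmaster "Pat Example, pat@example.com">
]>
<page>
  <header>Contact: &webmaster;</header>
  <footer>Questions? Write to &webmaster;</footer>
</page>"""

root = ET.fromstring(document)
header = root.find("header").text
footer = root.find("footer").text

print(header)
print(footer)
```

When the webmaster changes, only the ENTITY declaration needs editing; every reference picks up the new text the next time the document is parsed.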

Flash Context

Few of the abilities of the entity reference are actually realized in the version of Flash available at the time of this writing. The ActionScript parser is declared to be nonvalidating. A nonvalidating parser does not check the XML file against the declarations in the reference DTD file.

The complex capabilities of entity processing (the surface of which we have merely scratched) depend on the use of a DTD. So it is not surprising to find that user-defined entities such as &yen; or &webmaster; do not function as of the time of this writing.

The five predefined escape sequence entities work as expected, and so does the decimal encoding of single characters. The hexadecimal equivalent is not functional at the time of writing but might be fixed soon, as it seems to be the product of oversight, not architecture.


The entity reference generally presumes a matching entity declaration in a referenced document: the DTD.

Alternatively, there are five symbolic predefined entities, along with a mechanism for specifying any character by its decimal or hexadecimal code. Each of these invokes only a single character:


&#ddd;        decimal character code ddd
&#xhh;        hexadecimal character code hh
&amp;         &
&lt;          <
&gt;          >
&quot;        "
&apos;        '


Each entity reference must have

  • an opening ampersand (&)

  • a valid entity name or character code

  • a closing semicolon (;)

Between the ampersand and the semicolon, it may have

  • the # followed by a decimal number representing a character code

  • the pair #x followed by a hexadecimal character code

  • one of five standard character entities: lt, gt, apos, amp, quot

  • any token defined in the DTD

It may not have

  • any characters or sequence not valid in an XML name.
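The syntax rules above can be sketched as a small classifier. This is an illustration only, written in Python: the function name classify and its return labels are hypothetical, and the name pattern is simplified to ASCII, whereas the XML specification allows a much wider range of characters in a name.

```python
import re

# Simplified pattern: ampersand, then a decimal code, a hexadecimal code,
# or an ASCII-only name, then a semicolon.
ENTITY_REF = re.compile(
    r"&(?:#(?P<dec>[0-9]+)"
    r"|#x(?P<hex>[0-9A-Fa-f]+)"
    r"|(?P<name>[A-Za-z_:][A-Za-z0-9._:-]*));"
)

# The five standard character entities.
PREDEFINED = {"lt", "gt", "apos", "amp", "quot"}

def classify(ref):
    """Return a label describing what kind of reference ref is."""
    match = ENTITY_REF.fullmatch(ref)
    if match is None:
        return "malformed"
    if match.group("dec") is not None:
        return "decimal"
    if match.group("hex") is not None:
        return "hexadecimal"
    if match.group("name") in PREDEFINED:
        return "predefined"
    # Any other valid name must be defined in the DTD to be usable.
    return "needs DTD"

for sample in ["&#165;", "&#xA5;", "&apos;", "&webmaster;", "&amp", "&two words;"]:
    print(sample, classify(sample))
```

Note that a name such as apo is syntactically acceptable; it simply falls into the last category and depends on a DTD definition.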

Examples of General Entity References

if (x &gt; min &amp;&amp; x &lt; max)
    print(&quot;it&apos;s ok&quot;);

resolves to

if (x > min && x < max)
    print("it's ok");

&#xA9;2000 Jacobson &amp; Jacobson

resolves to

©2000 Jacobson & Jacobson

&copyrt;2000 Jacobson &amp; Jacobson

resolves to the same text as above if the DTD defines copyrt as &#xA9;

Bad Examples

&copyrt;2000 Jacobson &amp; Jacobson
    If no definition of copyrt exists, this resolves to &copyrt;2000 Jacobson & Jacobson

<ELEMENT attribute=&quot;value&quot;>
    Entity references cannot be terminators: the quotation marks that delimit an attribute value must be literal characters.
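For comparison, the two bad examples above can be run through a different parser. The sketch below uses Python's expat-based parser, not Flash, and the helper is_well_formed and the note element are hypothetical. Where the Flash parser is described as passing an undefined reference through unchanged, this parser rejects the document outright, so behavior here should be treated as parser-specific.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True

# An entity reference with no definition anywhere.
undefined = is_well_formed("<note>&copyrt;2000 Jacobson &amp; Jacobson</note>")

# Entity references used in place of the attribute's quotation marks.
as_delimiter = is_well_formed("<ELEMENT attribute=&quot;value&quot;/>")

# The same element with literal quotation marks.
proper = is_well_formed('<ELEMENT attribute="value"/>')

print(undefined, as_delimiter, proper)
```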
