
Chapter 37. Toward XML > Starting with XML

Starting with XML

In a sense you already know how to code in XML if you are used to writing clean, well-made HTML 4 code. You might need only to eliminate some bad habits to become a competent XML coder, so this section concentrates on the differences between XML and HTML. Focusing on the differences highlights the skills XML requires and, by contrast, makes clear how much the two languages have in common:

  • XML is case sensitive because capital letters are not a universal concept— If you were to accommodate capital letters as equivalents, you would have to do the same for thousands of other letter variations in other languages, an onerous task. Some languages don't even have cases. There's no such thing as lowercase Hebrew, for instance, and Arabic distinguishes between initial, medial, and final forms of letters. For those who like to put their tags in uppercase and attributes in lowercase to make them stand out, this is terrible news. But modern coding editors make this less of an issue than it might have been previously. It's common to define special colors to mark tags, for example, so using uppercase is somewhat of an historical anachronism, like line numbers in COBOL.

  • XML is very sensitive to the proper nesting of tags— A tag cannot end in a different context from the one in which it started. So if you open with <bold><italics>, you have to close your emphasized phrase with </italics></bold> to avoid a fatal error. Because XML can reference and include XML documents and document fragments from anywhere on the Web, including ones you have no control over, every XML document has to obey the same rules so that authors don't break one another's documents.

  • XML is not well protected against recursion— Although it's possible to set up explicit exclusions at a given level, with a complex document structure it's difficult to maintain those exclusions at lower levels, especially when using tags that might apply at any level. So the HTML prohibition against placing an anchor <a> tag within another anchor tag carries over to XHTML, but it is not enforced beyond direct inclusion.

  • XML requires you to close every tag, even empty tags— Because it's possible to create an XML document that doesn't use a DTD, an XML processor has no way of knowing whether a tag is empty. Because all XML documents have to be well-formed, you have to mark empty tags with a special syntax that tells an XML processor the tag is both empty and closed. You do that by placing a slash mark at the end of the tag. XML itself requires only the slash; the space in front of it is there so that HTML browsers cope with the tag when it appears in XHTML. The result looks like this:

    <break />

    There's an alternative syntax that works just as well for real XML processors but often breaks HTML Web browsers when used with XHTML, which is to close an empty tag such as <br> with </br> like this:

    <br></br>

    Unfortunately, this form is too risky to use in pages meant for Web browsers. Many current and most legacy browsers don't recognize the non-HTML closing tag and do odd things with it. Navigator 4.7, for example, might trash the display when it stumbles across a closing break tag. The exact behavior might vary with the position in the code and the particular empty tag being closed. In short, it's error prone and should be avoided.

  • XML requires the use of either single or double quotes around attribute values— HTML is lax about quoting, especially for numbers, and lets you leave the quotes off almost any value that contains no spaces. XML treats every attribute value as a character string, which must be quoted, and leaves it to the application to figure out what the value means.

  • XML supports multiple languages— Unlike HTML, XML doesn't predefine the named entities, such as &eacute;, that HTML authors use for the extended characters found in many European languages. But there's an easy mechanism for including these characters, as well as anything else in the Unicode (also known as ISO/IEC 10646) character set, which has room for more than a million characters, so support for Chinese, Arabic, and many of the more exotic languages of the world is a piece of cake.
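
The rules in the list above are easy to try out with any XML processor. The following is only a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree parser; the helper name is_well_formed and the sample element names are made up for illustration and are not part of XML or of any particular toolkit. The parser checks well-formedness only, so it shows nothing about DTD validation or about how Web browsers treat the same markup.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if the standard-library parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Case sensitivity: <Note> and </note> are different names,
# so the opening tag is never closed.
case_mismatch = is_well_formed("<Note>text</note>")

# Nesting: tags must close in the reverse of the order in which they opened.
nested_ok = is_well_formed("<p><bold><italics>hi</italics></bold></p>")
nested_bad = is_well_formed("<p><bold><italics>hi</bold></italics></p>")

# Empty tags: the slash form and the open/close pair are both well-formed
# XML; an empty tag left open, HTML style, is not.
empty_slash = is_well_formed("<p>one<break />two</p>")
empty_pair = is_well_formed("<p>one<break></break>two</p>")
empty_unclosed = is_well_formed("<p>one<break>two</p>")

# Attribute values: single or double quotes are fine, no quotes is an error.
quoted_double = is_well_formed('<img width="10" />')
quoted_single = is_well_formed("<img width='10' />")
unquoted = is_well_formed("<img width=10 />")

# Characters: a numeric character reference works without any DTD,
# but an HTML-style named entity is undefined in plain XML.
numeric_ref = ET.fromstring("<word>caf&#233;</word>").text
named_entity = is_well_formed("<word>caf&eacute;</word>")

for label, value in [
    ("mismatched case accepted", case_mismatch),
    ("proper nesting accepted", nested_ok),
    ("overlapping tags accepted", nested_bad),
    ("<break /> accepted", empty_slash),
    ("<break></break> accepted", empty_pair),
    ("unclosed <break> accepted", empty_unclosed),
    ("double-quoted attribute accepted", quoted_double),
    ("single-quoted attribute accepted", quoted_single),
    ("unquoted attribute accepted", unquoted),
    ("named entity accepted", named_entity),
]:
    print(f"{label}: {value}")
print("numeric reference decodes to:", numeric_ref)
```

In this sketch, only the properly nested, properly closed, and properly quoted samples are accepted. Every other sample stops the parser with an error rather than being silently repaired, which is the main difference from the way HTML browsers handle sloppy markup.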


