
Character References

Unlike SGML (and as a result, HTML too), which is very much ASCII based, XML was developed right from the start with a view to supporting languages other than English. In HTML, you can enter the code for certain non-English characters. For example, è would be &egrave;, í would be &iacute;, and û would be &ucirc;. As you will see at the beginning of the next chapter, these codes are, in fact, entity references. The abbreviations egrave, iacute, and ucirc are taken from the ISO 8859-1 character set (SGML's character set), which is derived from the ISO/IEC 646 version of the ASCII alphabet (the first 128 characters). ISO 8859-1 is also the basis for the Microsoft Windows fonts. Although these character entity references will enable you to deal with most Western European languages, including the Scandinavian ones, they are completely insufficient for displaying many Asian or Middle Eastern languages, such as Japanese, Hindi, or Arabic.
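To make the HTML codes above concrete, here is a minimal sketch in Python (not a language the text itself uses) that decodes the three named references with the standard library's `html.unescape`. The variable names are purely illustrative.

```python
import html

# The three named HTML entity references mentioned in the text.
references = ["&egrave;", "&iacute;", "&ucirc;"]

# html.unescape replaces each named reference with the character it stands for.
decoded = [html.unescape(ref) for ref in references]

for ref, char in zip(references, decoded):
    # ord() reports the character's code point; for these characters the
    # number is the same in ISO 8859-1 and in Unicode.
    print(f"{ref} -> {char} (code point {ord(char)})")
```

All three code points fall below 256, which is consistent with the names covering only the ISO 8859-1 range.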

XML solves this problem by being based on Unicode and the closely aligned ISO/IEC 10646 standard, whose character repertoire extends far beyond the Latin alphabet (it includes Chinese characters, for example). If you need them, XML enables you to use these characters even if your keyboard doesn't support them. You do this by entering a character reference.
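As a rough illustration, a character reference names a character by its code point, either in decimal (`&#232;`) or in hexadecimal (`&#xE8;`). The sketch below assumes Python's standard `xml.etree.ElementTree` parser and a made-up `<words>` document; neither comes from the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, typed using ASCII characters only.
# &#232; (decimal) and &#xE8; (hexadecimal) both refer to U+00E8, the letter è.
# &#x65E5; and &#x672C; refer to the two characters of 日本 ("Japan").
document = (
    "<words>"
    "<w>tr&#232;s</w>"
    "<w>tr&#xE8;s</w>"
    "<w>&#x65E5;&#x672C;</w>"
    "</words>"
)

# The parser resolves each character reference while reading the document.
root = ET.fromstring(document)
words = [w.text for w in root.findall("w")]
print(words)
```

After parsing, the first two entries are identical strings, since the decimal and hexadecimal forms denote the same character.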

