Character Sets

In the ASCII alphabet, only 128 different 7-bit patterns are possible, so 7-bit ASCII is able to represent only 128 characters. These 128 characters are known as the standard ASCII character set and have been the basis of computing for many years. As computers became more advanced, and as they became more of an international phenomenon, extra characters were needed to cover things such as the accented characters used so widely in European languages. The eighth bit was therefore repurposed to give 8-bit character sets, doubling the number of possible characters to 256; these sets are standardized as ISO 8859. In fact, many ISO 8859 variants exist, each tailored to a particular script or group of languages; the version we probably meet most often is ISO 8859-1 (Latin-1), which was long the default character set for HTML and is understood by Web browsers. This character set adds accented characters and various other symbols, such as currency signs, common fractions, and the micro sign. The first 128 characters of ISO 8859-1 are the same as ISO 646; therefore, it is backward compatible.
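The relationship between the 7-bit and 8-bit sets can be checked with a short Python sketch. It is only an illustration: the variable names are made up for this example, and it relies on Python's built-in `ascii` and `latin-1` codecs (the latter being Python's name for ISO 8859-1).

```python
# Number of distinct patterns available with 7 and with 8 bits.
seven_bit_patterns = 2 ** 7   # 128
eight_bit_patterns = 2 ** 8   # 256

# Backward compatibility: the first 128 byte values decode to the same
# characters under ASCII and under ISO 8859-1.
low_bytes = bytes(range(128))
backward_compatible = low_bytes.decode("ascii") == low_bytes.decode("latin-1")

# An accented character needs the eighth bit: it fits in ISO 8859-1
# as a single byte above 127, but cannot be encoded as ASCII at all.
e_acute_latin1 = "é".encode("latin-1")
try:
    "é".encode("ascii")
    e_acute_fits_ascii = True
except UnicodeEncodeError:
    e_acute_fits_ascii = False

print(seven_bit_patterns, eight_bit_patterns)
print(backward_compatible, e_acute_latin1, e_acute_fits_ascii)
```

The byte produced for "é" is 0xE9, which lies in the upper half of the 8-bit range that ASCII leaves unused.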

Eight bits are fine for most Western languages but fall far short for many others, particularly Asian languages whose scripts contain thousands of characters. To allow for languages such as Arabic, Chinese, Urdu, and so on, first Unicode (originally with 16-bit encoding) and then ISO 10646 took the next logical steps, supporting the use of up to 32-bit patterns to represent characters. These allow more than 2 billion characters to be represented. ISO 10646 provides a standard definition for the characters found in many European and Asian languages. Unicode is used in Microsoft Windows NT.
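As a rough illustration of why 8 bits are not enough, the following Python sketch looks up the code point of a few characters and encodes each one in fixed 16-bit and 32-bit units. The helper name `describe` and the choice of sample characters are assumptions made for this example; `utf-16-be` and `utf-32-be` are standard Python codecs, used here as stand-ins for the 16-bit and 32-bit encodings mentioned above.

```python
def describe(ch):
    """Return (code point, fits in 8 bits?, 16-bit bytes, 32-bit bytes)."""
    code_point = ord(ch)
    fits_in_8_bits = code_point < 2 ** 8
    as_16_bit = ch.encode("utf-16-be")   # big-endian 16-bit units
    as_32_bit = ch.encode("utf-32-be")   # one big-endian 32-bit unit
    return code_point, fits_in_8_bits, as_16_bit, as_32_bit

# Hypothetical sample: a Latin letter, an Arabic letter, a Chinese character.
samples = ["A", "\u0627", "\u4e2d"]

for ch in samples:
    code_point, fits, b16, b32 = describe(ch)
    print(f"U+{code_point:04X} fits in 8 bits: {fits}, "
          f"16-bit: {b16.hex()}, 32-bit: {b32.hex()}")
```

Only the Latin letter has a code point below 256; the Arabic and Chinese characters need the wider patterns.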

