Week 4 - Markup Languages

The Virtue of Text

As mentioned in the first week's notes, text became the medium of choice on the Internet because it is a "least common denominator" which any computer can understand. (Binary formats suffer from endianness and word-size incompatibilities between architectures.)
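
To make the endianness problem concrete, here is a minimal sketch in Python using the standard struct module. The value 0x12345678 is an arbitrary number picked for illustration, not something from these notes. The same 32-bit integer is laid out in two different byte orders, and a reader that assumes the wrong order gets a different number. The text form of the number has no such ambiguity.

```python
import struct

# Arbitrary 32-bit value chosen for illustration.
value = 0x12345678

# The same integer packed in the two common byte orders.
big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first

print(big.hex())     # 12345678
print(little.hex())  # 78563412

# A little-endian reader handed big-endian bytes sees a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))  # 0x78563412

# Sent as text, the digits arrive in the same order on every architecture.
text = str(value).encode("ascii")
print(text)  # b'305419896'
```

The price of the text form is size and parsing: nine bytes here instead of four, plus a conversion on each end.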

The Virtue of Markup Languages

In order to apply typographic features to otherwise boring text, markup languages were invented to describe the text they modify. Many have appeared over time; the following is by no means a complete list.

Shortcomings of HTML

HTML worked well as the markup language for the World Wide Web: it is small and fast, supports embedded media such as sounds and images, and supports hyperlinks to other sites. It is not without its disadvantages, however.

The Answer: XML

The W3C took these criticisms to heart and came up with a new standard for document markup which would address these problems. XML provides authors with the following features:

You can look at styles and how they are used in the styles section of the HTML 4.0 documentation.

Beyond ASCII

The Shortcomings of ASCII

One last little problem with Internet communications is ASCII. As many of you already know, the 'A' in ASCII stands for 'American'. Well, as it turns out, people other than Americans use the Internet too, and a lot of them use characters outside the Roman/Latin set. What's worse, ASCII proper is a 7-bit code with only 128 characters, and even its 8-bit extensions do not allow for more than 256 characters, and whoops! all of those spots are taken.
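
A small sketch of the limitation, using Python's built-in codecs. The helper name fits_in_ascii and the sample strings are made up for illustration. Python's "ascii" codec accepts only code points 0 through 127, so anything beyond plain unaccented Latin letters is rejected. Latin-1 is used here as one example of an 8-bit extension; it is an assumption for the demo, not a character set these notes name.

```python
def fits_in_ascii(s):
    """Return True if every character of s can be encoded as ASCII."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

plain = "Hello"
accented = "caf\u00e9"  # "cafe" with an acute accent on the final e

print(fits_in_ascii(plain))     # True
print(fits_in_ascii(accented))  # False

# An 8-bit extension such as Latin-1 has a slot for the accented letter...
print(accented.encode("latin-1"))  # b'caf\xe9'

# ...but still has no room for, say, a Hebrew letter.
hebrew_alef = "\u05d0"
try:
    hebrew_alef.encode("latin-1")
    hebrew_fits = True
except UnicodeEncodeError:
    hebrew_fits = False
print(hebrew_fits)  # False
```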

The Answer: Unicode

A new, universal (well, global anyway) character set based on a 16-bit size has been created. This allows for 65,536 (64K) different characters, which is enough to hold not only the existing Roman/Latin characters (still available at their original 256 slots), but also other character sets like Arabic and Hebrew, and very large character sets like Kanji.
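
As a rough illustration, the sketch below uses Python's standard unicodedata module to look up a handful of characters. The sample characters are arbitrary picks, and the big-endian UTF-16 encoding is an assumption made for the demo so that each code point can be seen as a single 16-bit unit; these notes do not prescribe a particular encoding.

```python
import unicodedata

# Arbitrary samples: Latin, accented Latin, Hebrew, Arabic, and a Kanji.
samples = ["A", "\u00e9", "\u05d0", "\u0627", "\u6f22"]

for ch in samples:
    code_point = ord(ch)
    encoded = ch.encode("utf-16-be")  # one 16-bit unit for each of these
    print(f"U+{code_point:04X}  {encoded.hex()}  {unicodedata.name(ch)}")

# A 16-bit code space has 2**16 slots.
print(2 ** 16)  # 65536

# The old Latin characters keep their original numbers.
print(ord("A"))  # 65
```

Note that 'A' is still number 65, exactly as in ASCII; only the width of the slot has changed.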


2/3/98 - Initial revision