Week 4 - Markup Languages 
 The Virtue of Text 
As mentioned in the first week's notes, text was chosen as the medium of
choice on the Internet because it is a "least common denominator" which any
computer can understand. (Binary formats suffer from endianness and word-size
incompatibilities between architectures.)
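
To see the endianness problem concretely, here is a minimal sketch in Python.
The value 1998 and the variable names are just illustrative picks, not
anything from a real protocol. The same integer packed as a 4-byte binary
word comes out in a different byte order depending on the convention, while
its text form is the same bytes everywhere.

```python
import struct

value = 1998  # illustrative value only

# Binary: the byte order depends on the architecture's convention.
little_endian = struct.pack("<I", value)  # 4-byte unsigned int, little-endian
big_endian = struct.pack(">I", value)     # 4-byte unsigned int, big-endian

# A big-endian reader handed little-endian bytes sees a different number.
misread = struct.unpack(">I", little_endian)[0]

# Text: the decimal digits are the same bytes on every machine.
as_text = str(value).encode("ascii")

print(little_endian.hex(), big_endian.hex())
print(misread == value)
print(int(as_text) == value)
```

A reader that assumes the wrong byte order gets garbage without any error,
which is exactly the kind of incompatibility a text format sidesteps.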
 The Virtue of Markup Languages 
In order to apply typographic features to otherwise boring text, markup
languages have been invented: the markup is mixed in with the text and
describes how that text should be presented. Many of these have appeared over
time. The following is by no means a complete list.
 - troff / nroff - This form of markup was originally written at AT&T Bell
   Labs by Joe Ossanna, a colleague of the guys who wrote the original UNIX
   (Ritchie & Thompson); Kernighan later maintained and extended it. This
   format is still used for man pages.
 - PostScript - This format was created by Adobe. It was originally a
   proprietary format, but its proliferation led to greater and greater
   openness. Many printers support PostScript to format text when printing.
 - TeX & LaTeX - A Mathematics and Computer Science professor by the name of
   Donald E. Knuth set himself the task of writing a textbook for his
   students. Along the way, he learned that he would need some kind of markup
   language to format his textbook. Seven years later, the TeX typesetting
   language emerged. Many believe that it was well worth the wait, as it is a
   markup language which gives authors a high level of control over the
   formatting of mathematical equations, and can handle very large documents.
 - SGML - SGML is an international standard (ISO 8879); the SGML Project
   built around it is funded by the Information Systems Committee of the UFC.
   It is a Mother-Of-All markup language which allows you to author a
   document rigorously and precisely using document type definitions (DTDs).
   The idea here is that you can create a document which is completely
   independent of the tools used to author it. In addition to authoring
   documents, you can also define new markup languages with SGML. This format
   was submitted to all major word processor makers as the format of choice
   for document saving, and practically none of them went for it.
 - HTML - As the World Wide Web was being developed, it was apparent that a
   markup language was needed. SGML, however, was just too fat and didn't have
   built-in support for things like hyperlinks. A stripped down version of
   SGML was made, and lo, HTML was born.
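
Because the hyperlink is the feature that set HTML apart, it may help to see
that a link is nothing more than text with markup around it. The sketch below
uses Python's standard html.parser module; the LinkCollector class name and
the sample line of HTML are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Hypothetical snippet of HTML, written just for this example.
sample = '<P>See the <A HREF="http://www.w3.org/">W3C</A> site.</P>'

collector = LinkCollector()
collector.feed(sample)
print(collector.links)
```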
 Shortcomings of HTML 
HTML worked well as the ML for the World Wide Web: It's small, fast, supports
embedded documents like sounds and images, and supports hyperlinks to other
sites. It is not without its disadvantages, however.
	- It is not precise - This is by design, actually, and from a certain
	  standpoint cannot be considered a shortcoming. For people who want
	  control over their document's layout, this is a real bugaboo.
	- It does not extend easily - This shortcoming was manifested most
	  prominently during the "browser wars" where the two prominent browser
	  vendors did their level best to break or harass each other's browsers
	  by adding non-standard extensions, all in the name of "product
	  distinction".
	- It does not allow for re-use of formatting (or styles) - Desktop
	  publishing software has supported style setting and reuse for years.
	  Long-time users of said software were aghast to find this feature
	  missing from the HTML spec.
	- It does not have built-in support for dynamic content - Lots of people
	  who do programming and want to make spiffy, neat-o Websites complained
	  about this.
 The Answer: XML 
The W3C took these criticisms to heart and came up with a new standard for
document markup which would address these problems. XML provides authors with
the following features:
	- The ability to define and re-use styles - This is a feature that is
	  present in SGML but was excluded from HTML in the interest of efficiency
	  (remember, bandwidth was (and still is) scarce).
	- A standard way to make non-standard extensions - This is also done
	  through the "style" mechanism mentioned previously.
	- Support for scripting - Most notably, event attributes such as
	  ONMOUSEOVER and ONMOUSEOUT, which allow you to handle mouse events.
	- A graceful transition from HTML to XML - XML supports default styles for
	  the existing HTML tags that you have already learned. This is
	  particularly bright of them as an abrupt transition would have been
	  rather painful.
You can look at styles and how they are used in the styles section
of the HTML 4.0 documentation.
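
As a rough illustration of the "standard way to make non-standard extensions"
idea, here is a small sketch using Python's standard xml.etree.ElementTree
module. The element names (course, week, topic) and the number attribute are
invented for this example; no standard defines them, which is rather the
point: a generic XML parser reads them anyway.

```python
import xml.etree.ElementTree as ET

# A document using made-up tags; these names are assumptions for illustration.
document = """
<course>
  <week number="4">
    <topic>Markup Languages</topic>
    <topic>Beyond ASCII</topic>
  </week>
</course>
""".strip()

root = ET.fromstring(document)

# The parser knows nothing about these tags in advance, yet can walk them.
topics = [topic.text for topic in root.iter("topic")]
week_number = root.find("week").get("number")

print(week_number, topics)
```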
 Beyond ASCII 
 The Shortcomings of ASCII 
One last little problem with Internet communications is ASCII. As many of you
already know, the 'A' in ASCII stands for 'American'. Well, as it turns out,
people other than Americans use the Internet too, and a lot of them use
different characters than the Roman/Latin set. What's worse, an 8-bit
character format does not allow for more than 256 characters, and whoops! all
of those spots are taken.
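
The arithmetic behind that limit, and what happens when a character has no
slot, can be sketched in a few lines of Python. The fits() helper and the
sample strings are hypothetical, chosen only for this example. (Strictly
speaking, ASCII itself is a 7-bit code; the 8-bit sets such as Latin-1 extend
it.)

```python
def fits(text, encoding):
    """Return True if every character of text can be stored in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ASCII_SLOTS = 2 ** 7      # 7-bit ASCII
EIGHT_BIT_SLOTS = 2 ** 8  # any 8-bit character set, e.g. Latin-1

results = {
    "plain/ascii": fits("plain text", "ascii"),
    # "caf\u00e9" is "cafe" with an acute accent on the e
    "accented/ascii": fits("caf\u00e9", "ascii"),
    "accented/latin-1": fits("caf\u00e9", "latin-1"),
    # four Hebrew letters; they have no slot in Latin-1
    "hebrew/latin-1": fits("\u05e9\u05dc\u05d5\u05dd", "latin-1"),
}

print(ASCII_SLOTS, EIGHT_BIT_SLOTS)
print(results)
```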
 The Answer: Unicode 
A new, universal (well, global anyway) character set based on a 16-bit size
has been created. This allows for 65,536 (64K) different characters, which is
enough to hold not only the existing Roman/Latin characters (still available
at their original 256 slots), but other character sets like Arabic and Hebrew,
and very large character sets like Kanji.
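
A quick sketch of the same points in Python: how many values 16 bits can
hold, and that Latin, Hebrew, and Kanji characters each fit in one 16-bit
unit. The four sample characters are my own picks for illustration, and
"utf-16-be" is used here simply as one 16-bit encoding of Unicode.

```python
BITS = 16
SLOTS = 2 ** BITS  # number of distinct 16-bit values

samples = {
    "latin capital A": "A",
    "latin e with acute": "\u00e9",
    "hebrew shin": "\u05e9",
    "kanji": "\u6f22",
}

code_points = {name: ord(ch) for name, ch in samples.items()}
widths = {name: len(ch.encode("utf-16-be")) for name, ch in samples.items()}

# The Latin-1 characters keep their old 8-bit values as Unicode code points.
same_slot = ord("\u00e9") == "\u00e9".encode("latin-1")[0]

print(SLOTS)
for name in samples:
    print(name, hex(code_points[name]), widths[name], "bytes")
print(same_slot)
```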
Changelog
2/3/98 - Initial revision