Special characters and entity references: available in HTML/XHTML 1.0

By Ramón Alexander Burgos

When you search the Internet for topics on HTML/XHTML special characters or entity references, you find only tables of codes and values. But are they important? Why? What do you gain by using them? Answering these questions is the goal of this article.

Some characters have a special meaning in HTML/XHTML. Four character entity references are frequently used to escape them: &lt; for < (the beginning of a tag), &gt; for >, &amp; for & (the beginning of a character reference) and &quot; for " (which may be used to delimit attribute values).
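As a rough illustration of escaping these four characters, here is a minimal Python sketch using the standard library's html.escape function. The sample markup string is invented for the example.

```python
import html

# A made-up fragment containing all four markup-significant characters.
raw = '<a href="page.html?x=1&y=2">Tom & Jerry</a>'

# html.escape replaces &, < and >; with quote=True (the default) it also
# replaces the double quote (and the single quote).
escaped = html.escape(raw, quote=True)
print(escaped)
# &lt;a href=&quot;page.html?x=1&amp;y=2&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# html.unescape reverses the process.
print(html.unescape(escaped) == raw)  # True
```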

The HTML 4 DTD defines 252 named character entity references. Character references are an encoding-independent mechanism for entering any character from the document character set. On the web, a common definition is that a special character is any character that is neither a digit (0-9) nor a letter (a-z); common examples include $ & * ( + - > < " ÷ and {.

Note that HTML 4 and XHTML 1.0 share the same character entity sets (XHTML 1.0 adds only &apos;), so the syntax and principles described in this article apply equally to both.

The formal name for such a special character is a character entity, and it can be written in two ways in HTML/XHTML. The first is the symbolic reference (also called a named entity reference), which is an easier-to-remember and more intuitive way of referring to characters in the document character set. All symbolic references start with an ampersand and end with a semicolon, for example &amp;. Using names means you don’t have to remember the magic numbers that give each character’s position in the document character set.

The disadvantage is that not all browsers support every entity name, whereas support for numeric references is very good in almost all browsers.

The second way is the numeric character reference, which may be decimal or hexadecimal. Numeric references also start with an ampersand and end with a semicolon, but between them is a number preceded by a hash, for example &#42;. The number is the code position of the character in the document character set, not a byte value, and the reference itself is written entirely in ASCII characters, so it survives any character encoding. Numeric references take two forms: &#DecimalNumber; and &#xHexadecimalNumber; (HTML 4 also accepts &#XHexadecimalNumber;, but XHTML requires the lowercase x). Both forms refer to code positions in ISO 10646.
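To make the two numeric forms concrete, the following Python sketch builds both references from a character's code position. The helper name numeric_refs is hypothetical and exists only for this example.

```python
import html

def numeric_refs(ch):
    """Return the decimal and hexadecimal references for one character."""
    code = ord(ch)  # code position in Unicode / ISO 10646
    return f"&#{code};", f"&#x{code:X};"

dec_ref, hex_ref = numeric_refs("*")
print(dec_ref, hex_ref)  # &#42; &#x2A;

# Both forms resolve to the same character.
print(html.unescape(dec_ref) == html.unescape(hex_ref) == "*")  # True
```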

The main differences between a literal character and a reference are usability and download speed. Both entity references and numeric references, however, provide a method for expressing characters that cannot easily be entered on a keyboard.

According to the W3C Recommendations, HTML/XHTML defines three sets of character entities: ISO 8859-1 (Latin-1) characters; symbols, such as mathematical symbols and Greek letters; and markup-significant and internationalization characters.

HTML/XHTML does not define an entity for every special character. For example, the special character & has both the decimal reference &#38; and the entity reference &amp;, whereas @ has the decimal reference &#64; but no associated entity definition.
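One way to check which characters have a named entity is to consult a table of the HTML 4 entities. The sketch below uses Python's html.entities.codepoint2name, which maps code points to HTML 4 entity names; the loop and the results dictionary are only illustrative.

```python
from html.entities import codepoint2name

results = {}
for ch in "&@":
    code = ord(ch)
    # codepoint2name holds the HTML 4 entity names, keyed by code point.
    name = codepoint2name.get(code)
    results[ch] = (f"&#{code};", f"&{name};" if name else None)

print(results["&"])  # ('&#38;', '&amp;')
print(results["@"])  # ('&#64;', None)
```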

Because a given character encoding may not be able to represent directly all the characters that an author or web developer wants to include in a document, HTML/XHTML offers character references as an alternative mechanism.
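The idea can be sketched in Python, whose built-in "xmlcharrefreplace" error handler substitutes a decimal character reference for anything the target encoding cannot represent. The sample text and the choice of Latin-1 as the target encoding are assumptions made for the example.

```python
import html

text = "Año 日本"

# Latin-1 can represent ñ as a single byte, but not the Japanese characters;
# the error handler replaces those with decimal character references.
encoded = text.encode("latin-1", errors="xmlcharrefreplace")
print(encoded)  # b'A\xf1o &#26085;&#26412;'

# Decoding the bytes and resolving the references recovers the original text.
restored = html.unescape(encoded.decode("latin-1"))
print(restored == text)  # True
```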

Note that some characters may render correctly in Internet Explorer but not in Netscape Navigator 4, others in IE6 but not in IE5, and differently again in IE5 on a Mac. Since the ASCII characters are not sufficient for all web content, HTML/XHTML uses a document character set called the Universal Character Set (UCS).

Web pages are files that a server sends as a sequence of bytes. It is the browser’s responsibility to decode those bytes using a specific character encoding and render the result. The problem is that it is hard to predict exactly how a particular browser will interpret characters outside the standard ASCII set, which increases the probability of failures.

The character encoding is specified by the charset parameter of the document’s HTTP Content-Type header. It can also be specified inside the file using a meta declaration; for example, you could write

<meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />

or

<meta http-equiv="content-type" content="text/html; charset=windows-1251">
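For readers who want to see how the charset parameter can be read from such a value, here is a small Python sketch. It borrows the standard library's email.message.Message class purely for its header parsing; the function name and the ISO-8859-1 fallback are assumptions for the example, not something the HTML specification mandates.

```python
from email.message import Message

def charset_from_content_type(value, default="iso-8859-1"):
    """Extract the charset parameter from a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset returns the charset in lowercase,
    # or the fallback when no charset parameter is present.
    return msg.get_content_charset(failobj=default)

print(charset_from_content_type("text/html; charset=EUC-JP"))        # euc-jp
print(charset_from_content_type("text/html; charset=windows-1251"))  # windows-1251
print(charset_from_content_type("text/html"))                        # iso-8859-1
```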

Charset is a term that refers to the complete system for encoding a sequence of characters as numbers, which allows text to be stored, transmitted and decoded. A character encoding is thus the method by which a browser converts byte values into displayable characters.
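A short Python sketch shows why the declared charset matters: the same byte decodes to different characters under different encodings. The byte value and the two encodings are picked only for illustration.

```python
import unicodedata

raw = b"\xf1"  # one byte, as it might arrive from a server

for charset in ("latin-1", "cp1251"):
    ch = raw.decode(charset)
    # Print the Unicode name so the result is unambiguous.
    print(charset, "->", unicodedata.name(ch))

# latin-1 -> LATIN SMALL LETTER N WITH TILDE
# cp1251 -> CYRILLIC SMALL LETTER ES
```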

Products such as Aggiorno offer several transformations that can help you standardize your web pages. You can apply Fix XHTML elements compliance, which changes special characters to numeric references or entity references according to the user’s selection. With a single click, non-ASCII characters such as Á or ñ can be translated into an XHTML-compatible representation using either named or numeric entities.
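The kind of transformation described here can be approximated in a few lines of Python. This is only a sketch of the general idea, not Aggiorno's actual implementation: the function name to_references and its style parameter are hypothetical, and ASCII characters (including markup) are deliberately left untouched.

```python
from html.entities import codepoint2name

def to_references(text, style="named"):
    """Replace non-ASCII characters with named or numeric references."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # leave ASCII, including markup, as it is
        elif style == "named" and code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            # Fall back to a decimal reference when no name exists
            # or when numeric output was requested.
            out.append(f"&#{code};")
    return "".join(out)

print(to_references("Á ñ"))             # &Aacute; &ntilde;
print(to_references("Á ñ", "numeric"))  # &#193; &#241;
```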

Remember, a web document does not prescribe a behavior; that depends on the browser’s implementation. Using a character’s hexadecimal reference is a step towards uniform behavior because these character references have been standardized by the W3C and ISO standards bodies.

Another important step has been the adoption of ISO 10646, which offers the widest possible support for human languages, improving internationalization, universal accessibility and consistent rendering.

© Copyright 2008

Sign in

Subscribe to Rss Feed