Final thoughts on XHTML and HTML

This is the seventh and last post in our XHTML/HTML series. In it, we want to draw some conclusions from all the information provided so far.

HTML and XHTML offer very similar functionality in terms of describing and marking up documents for the web.  XHTML has a number of advantages in terms of its interoperability with other markup documents, and its consistent syntax.

A few years ago, it seemed clear that XHTML would be the “future of the web”, but more recently HTML has grown in popularity, as browser support for XHTML has often not kept pace with developments. Which of the two will prove more popular is still an open question, even though we believe XHTML has some definite advantages.

If you’re ready to make the transition from HTML to XHTML, you’ll want to check out Aggiorno, a plug-in for Microsoft Visual Studio with embedded knowledge about the differences between HTML and XHTML, to make your transition easier. It automatically targets XHTML 1.0 Transitional and makes sure your pages are error-free and up to date before going on to offer additional improvements, such as better accessibility, automatic upgrading of table layouts to CSS, and extraction of master pages from sites with similar formatting.

Inconsistencies between HTML and XHTML

This is the sixth installment of our XHTML/HTML series. So now for the bad news. In addition to the syntactic differences, HTML and XHTML do not share exactly the same semantics, and there are things you should watch out for. In particular, HTML assumes that it is rendering to a browser and “takes over” the browser window, whereas the XHTML model assumes that it is rendering to a specific target area. One consequence is that some CSS styling behaves differently: a background set on the body of an XHTML document generally applies only to the area where there is content, while in HTML the same background fills the entire window.

There are also differences in how the XHTML file is parsed. Because XHTML is first and foremost an XML document, it is generally processed as XML before any of its content is interpreted. This has two significant consequences. First, anything placed inside a comment is likely to be ignored completely, because comments are thrown away during parsing. Since HTML authors often wrap stylesheets and JavaScript in comments to “hide” them, this can have serious consequences: the styles or script silently disappear. Second, if those same contents are not inside comments, the XML parser will parse them as markup, and characters such as < and & may then cause parsing errors. Since every XHTML document is required to be a well-formed XML document, and the browser will otherwise reject it, any such problem causes the whole page to be rejected. To solve this problem, place stylesheets and JavaScript within CDATA sections in your XHTML file.
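
A quick way to see both consequences outside a browser is to feed one script to an XML parser in three forms. This sketch uses Python's xml.etree.ElementTree as a stand-in for the browser's XML parser (an assumption: browsers differ in detail, but every conforming XML parser rejects a bare < in text, and no XML parser treats comment content as element text); the script body is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical script body: "<" is ordinary JavaScript but markup to XML.
SCRIPT = "if (a < b) { alert('a is smaller'); }"


def parse_script(body: str):
    """Parse <script>body</script> as XML.

    Returns (well_formed, text), where text is the character data the parser
    keeps as the element's content (None if nothing is left).
    """
    markup = '<script type="text/javascript">' + body + "</script>"
    try:
        return True, ET.fromstring(markup).text
    except ET.ParseError:
        return False, None


# 1. Bare script: the "<" looks like the start of a tag, so parsing fails.
print(parse_script(SCRIPT))
# 2. HTML-style comment: well-formed, but the comment is discarded,
#    leaving the script element with no code to run.
print(parse_script("<!--" + SCRIPT + "-->"))
# 3. CDATA section: well-formed, and the script text survives intact.
print(parse_script("<![CDATA[" + SCRIPT + "]]>"))
```

In real pages the usual idiom is to prefix the CDATA markers with JavaScript line comments (//<![CDATA[ ... //]]>), so the same block still runs when the page is parsed as HTML.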

There are a number of other inconsistencies relating to how JavaScript works within XHTML. For example, document.write() does not work in the same way, since the document has already been fully parsed by the time the call is made (in practice, browsers generally do not support document.write() at all in documents parsed as XML). Instead, you need to manipulate the DOM of the page directly.
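
The DOM calls that replace document.write() (createElement, createTextNode, appendChild) are standard W3C DOM methods, and Python's xml.dom.minidom implements the same names, so the pattern can be sketched outside a browser. The page fragment, the "cart" list and the add_cart_item helper are hypothetical; in a browser you would make the equivalent calls on document from JavaScript.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical page fragment with an empty shopping-cart list.
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<body><ul id="cart"></ul></body></html>'
)


def add_cart_item(doc, label):
    """Append <li>label</li> to the cart by building DOM nodes.

    This replaces string output such as document.write('<li>' + label + '</li>').
    """
    # Without a DTD, minidom does not treat "id" as an ID attribute
    # (getElementById would return None), so search by attribute value.
    cart = next(ul for ul in doc.getElementsByTagName("ul")
                if ul.getAttribute("id") == "cart")
    item = doc.createElementNS(XHTML_NS, "li")
    # A text node is never parsed as markup; "<" and "&" are escaped on output.
    item.appendChild(doc.createTextNode(label))
    cart.appendChild(item)
    return item


doc = minidom.parseString(PAGE)
add_cart_item(doc, "Arthur's Tree House")
add_cart_item(doc, "Cats & Dogs <Special Edition>")
print(doc.documentElement.toxml())
```

Building nodes rather than strings also means the result is well-formed by construction, which is exactly what an XML-parsed page requires.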

In summary, XHTML has a stricter format that can be checked by development tools, so errors can be fixed before the page is sent out to a browser. HTML is more forgiving: browsers accept ill-formed HTML. That is a good thing in that it tolerates programmers' mistakes, but a bad thing in that different browsers interpret the same ill-formed code in different ways.

XHTML, Domain Specific Languages (DSLs) and Stylesheets

This is the fifth installment of our XHTML/HTML series. Going one step beyond using other W3C-approved XML extension languages alongside XHTML is the idea of replacing the XHTML content completely with simpler XML content. Domain Specific Languages (http://en.wikipedia.org/wiki/Domain_Specific_Language) are computer languages designed specifically to solve problems in a particular domain. The notion is that frequently expressed concepts work their way into the statement of the problem, rather than having to be expressed in the solution to the problem.

The same approach can be used with XHTML.  Because it is possible to use a stylesheet to transform any tree within an XML document, and because so much of what makes a web page is boilerplate, it is possible to extract the boilerplate into a stylesheet and then use that to define a domain-specific language.

A Simple Example

A simple example should suffice to demonstrate this approach. Imagine that you are running a bookstore, and you have information about your books stored in XML somewhat as follows:

<book>
      <title>Arthur's Tree House</title>
      <author>Marc Brown</author>
      <coverImage>8037.jpg</coverImage>
      <price currency="USD">399</price>
</book>

By simply adding the following two lines to the top of this file (and providing all the referenced stylesheet information), you can automatically transform this into a complete XHTML page:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="BookClub.xslt"?>

This works because, when a browser downloads an XML document containing the xml-stylesheet processing instruction, it automatically applies the specified stylesheet before attempting to display the page. The stylesheet generates XHTML output, which the rendering engine knows how to display. The CSS styling (attached within the XSL stylesheet) provides the style, completely separated from the content.
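
The first thing the browser has to do is notice the processing instruction. The sketch below performs just that step with Python's xml.dom.minidom: it finds the xml-stylesheet instruction in the document prolog and reads its pseudo-attributes. It stops there on purpose: the Python standard library has no XSLT processor, so actually running BookClub.xslt would need a browser or a third-party library such as lxml. The find_stylesheet helper is hypothetical.

```python
import re
from xml.dom import minidom

# The book record with the two lines from the post added at the top.
BOOK_DOC = b"""<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="BookClub.xslt"?>
<book>
  <title>Arthur's Tree House</title>
  <author>Marc Brown</author>
  <coverImage>8037.jpg</coverImage>
  <price currency="USD">399</price>
</book>"""

# Pseudo-attributes look like name="value" or name='value'.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""")


def find_stylesheet(xml_bytes: bytes):
    """Return the pseudo-attributes of the first xml-stylesheet PI, or None."""
    doc = minidom.parseString(xml_bytes)
    # Processing instructions in the prolog are children of the Document
    # node, siblings of (and before) the root <book> element.
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            return {m.group(1): m.group(3)
                    for m in PSEUDO_ATTR.finditer(node.data)}
    return None


print(find_stylesheet(BOOK_DOC))  # {'type': 'text/xsl', 'href': 'BookClub.xslt'}
```

The XML declaration on the first line is not a processing instruction node, so only the stylesheet instruction is found.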

Did I lose you?

Apologies if that didn’t seem as simple as the title suggested.  The principle is that creating the content pages should be simple – because much of the complexity has been sucked up into the XSL stylesheet.  And the mechanism for triggering this – identifying the stylesheet within the file – is also simple.

Writing XSL transformations themselves, however, is not so simple.  But it does hold great advantages when you have truly domain-specific information, as in this case, which you might want to use in numerous different ways in a single site.  For example, you might want to display books in each of the following contexts:

  • On their own page, where all the information about the book is shown;
  • On a recommendations page, with just the cover and the title;
  • In a list (such as the shopping cart) without the cover image.

And in each context, exactly the same XML file could be used, with a different XSL stylesheet on the page transforming it into different XHTML.
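
The three contexts can be sketched as three transformations of one record. Because Python's standard library has no XSLT processor, each view below is a plain Python function standing in for a separate XSL stylesheet; the function names, class names and output markup are hypothetical, and only the <book> record comes from the post.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <title>Arthur's Tree House</title>
  <author>Marc Brown</author>
  <coverImage>8037.jpg</coverImage>
  <price currency="USD">399</price>
</book>"""


def _fields(book):
    """Pull the values the views need out of a <book> element."""
    price = book.find("price")
    return {
        "title": book.findtext("title"),
        "author": book.findtext("author"),
        "cover": book.findtext("coverImage"),
        "price": f"{price.text} {price.get('currency')}",
    }


def full_page_view(book):
    """The book's own page: all the information, including the cover."""
    fields = _fields(book)
    div = ET.Element("div", {"class": "book"})
    ET.SubElement(div, "h1").text = fields["title"]
    ET.SubElement(div, "img", src=fields["cover"], alt=fields["title"])
    ET.SubElement(div, "p").text = "by " + fields["author"]
    ET.SubElement(div, "p", {"class": "price"}).text = fields["price"]
    return div


def recommendation_view(book):
    """Recommendations page: just the cover and the title."""
    fields = _fields(book)
    div = ET.Element("div", {"class": "recommendation"})
    ET.SubElement(div, "img", src=fields["cover"], alt=fields["title"])
    ET.SubElement(div, "p").text = fields["title"]
    return div


def list_view(book):
    """A list such as the shopping cart: no cover image."""
    fields = _fields(book)
    li = ET.Element("li")
    li.text = f"{fields['title']} by {fields['author']}, {fields['price']}"
    return li


book = ET.fromstring(BOOK_XML)
for view in (full_page_view, recommendation_view, list_view):
    print(ET.tostring(view(book), encoding="unicode"))
```

The record is never edited: adding a fourth context means adding a fourth transformation, which is the reuse argument made above.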

In Summary

The power of XML, specifically with regard to XSL stylesheets, can be leveraged within XHTML to reduce the complexity of making individual documents and to promote reuse.  The value of this technique depends on the other elements that collaborate to form your complete system.

XHTML is one language family

This is the fourth installment of our XHTML/HTML series. One of the advantages of XML as a markup language is that it is extensible: that is, it is possible to define new markup within the context of the overall markup language. XHTML is one example of an XML-based markup language, but there are others, like the ones we describe below. HTML offers none of these facilities.

MathML

Mathematical notation is particularly challenging to typeset in a meaningful way. It is especially difficult to describe the semantics of a mathematical expression while simultaneously providing enough information about how the author wants it laid out on the page. It is the same challenge of content and style that exists in HTML.

MathML addresses this challenge by allowing an author to be very precise about the symbols to be used, the interpretation that should be applied to each symbol, and the structure that builds them up into a precise mathematical language.

MathML documents can be embedded directly into an XHTML document using an appropriate stylesheet and namespace.  See for example http://www.w3.org/Math/testsuite/mml2-testsuite/index.html from the MathML test suite.
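
As a small illustration of the namespace part, the sketch below embeds a MathML expression in an XHTML paragraph and uses Python's xml.etree.ElementTree to show that the parser keeps the two vocabularies apart by namespace. The formula and the mathml_identifiers helper are made-up examples; actually rendering the MathML is still up to the browser.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# An XHTML paragraph containing a MathML expression for A = pi r^2.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>The area of a circle is
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        <mi>A</mi><mo>=</mo><mi>&#x3C0;</mi>
        <msup><mi>r</mi><mn>2</mn></msup>
      </math>
    </p>
  </body>
</html>"""

NS = {"h": XHTML_NS, "m": MATHML_NS}


def mathml_identifiers(markup: str):
    """Return the text of every MathML <mi> inside XHTML <p> elements."""
    root = ET.fromstring(markup)
    found = []
    for para in root.iterfind(".//h:p", NS):
        for math in para.iterfind("m:math", NS):
            found.extend(mi.text for mi in math.iter(f"{{{MATHML_NS}}}mi"))
    return found


print(mathml_identifiers(PAGE))  # ['A', 'π', 'r']
```

An <mi> element that is not in the MathML namespace would not be matched, because the namespace, not the tag name alone, identifies the vocabulary.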

SVG

SVG is a language for describing scalable vector-based graphics from within an XML document.  This allows pictures, drawings and schematics to be directly nested within an XHTML document without requiring additional resources such as large image files to be downloaded from the server, although it is possible to include smaller images within an SVG file.

Uses of SVG include maps sent to mobile phones, schematics of web sites, and, ultimately, any web-based graphical experience.  Because the SVG forms part of the document structure, it can be modified interactively using JavaScript, thus updating the image locally without needing to interact with the server.
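
To show that inline SVG is just more markup in the document tree, the sketch below builds a tiny map with a marker using Python's xml.etree.ElementTree and then moves the marker by changing two attributes, the same kind of local update a script in the page would make through the DOM. The map, the "marker" id and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize SVG as the default namespace


def svg(tag: str) -> str:
    """Qualify a local name with the SVG namespace."""
    return f"{{{SVG_NS}}}{tag}"


def make_map(x: int, y: int) -> ET.Element:
    """A 200x100 'map' with a red circular marker at (x, y)."""
    root = ET.Element(svg("svg"), width="200", height="100")
    ET.SubElement(root, svg("rect"), width="200", height="100", fill="lightblue")
    ET.SubElement(root, svg("circle"), id="marker",
                  cx=str(x), cy=str(y), r="5", fill="red")
    return root


def move_marker(root: ET.Element, x: int, y: int) -> None:
    """Update the marker in place; nothing is fetched from a server."""
    marker = root.find(f".//{svg('circle')}[@id='marker']")
    marker.set("cx", str(x))
    marker.set("cy", str(y))


picture = make_map(20, 30)
move_marker(picture, 150, 60)
print(ET.tostring(picture, encoding="unicode"))
```

In a browser the equivalent update would be a setAttribute call on the circle element; the picture is redrawn without another download.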

In Summary

XHTML provides more than just the opportunity to write HTML documents.  Because it is based on an extensible infrastructure, it is possible to extend the supported languages and create multipurpose files.

Strong Interpretation of XHTML

This is the third installment of a series of blog posts to discuss HTML, XML and XHTML.  

A major difference between HTML and XHTML is the manner in which browsers interpret them. Because HTML and browsers grew up together, browsers tend to be very tolerant of poorly written or outdated HTML constructs. XHTML, on the other hand, is treated very strictly by browsers.

How serious are you about Web standards?

Although it’s just as possible to write high quality HTML as it is to write high quality XHTML, it can be harder to know that you’ve written high quality HTML. Because browsers “gloss over” many of the problems in HTML code, what you’ve written often seems to be good HTML. With XHTML this is not the case: browsers reject XHTML that is not well-formed without question (at least when the page is served with the XHTML media type, application/xhtml+xml).

Getting Help to Get There

Of course, in either HTML or XHTML there are tools that can help you.  The W3C validator (http://validator.w3.org/) will check any of the six possible schemas of HTML or XHTML.  Microsoft’s Visual Studio environment will include warning messages about invalid usages of your selected schema.  And Aggiorno (http://www.aggiorno.com) will not only check your documents, it will automatically fix structural and deprecation problems and bring your files up to full XHTML 1.0 compliance.
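
For a sense of what the most basic of these checks does, the sketch below parses markup as XML and reports where it first stops being well-formed, which is the condition that makes a browser reject a page parsed as XML. It uses Python's xml.etree.ElementTree; the first_xml_error helper and the sample pages are hypothetical, and it checks well-formedness only, not validity against one of the HTML or XHTML schemas as the W3C validator does.

```python
import xml.etree.ElementTree as ET


def first_xml_error(markup: str):
    """Return None if markup is well-formed XML, else (line, column, message).

    This is a well-formedness check only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


GOOD = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><p>Line one<br />Line two</p></body>
</html>"""

BAD = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><p>Line one<br>Line two</p></body>
</html>"""

print(first_xml_error(GOOD))  # None
print(first_xml_error(BAD))   # reports line 2: </p> arrives while <br> is still open
```

Running a check like this before publishing catches the errors that would otherwise only show up as a rejected page.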

In Summary

XHTML is held to stricter compliance rules by browsers than HTML is, and better tools are available for enforcing that compliance.

Common Objections to Refactoring (they're wrong - of course!)

"We don’t have the time to waste on cleaning up the code. We have to get this feature implemented now!"  Have you ever heard such a comment?  Have you ever made such a comment?  Chances are that you probably have both heard and made the comment!  Well this is probably one of the most common objections to Refactoring as it is explained in the book Refactoring HTML by Elliotte Rusty Harold and summarized in the following series of posts: The Cafes » Objections to Refactoring.

But let's start from the beginning. What is a refactoring? As stated on Wikipedia, in software engineering, "refactoring" a source code module means modifying it without changing its external behavior, and it is sometimes informally referred to as "cleaning it up". In other words, a refactoring is a source code transformation that does not alter the behavior of the program. Historically, refactorings have been applied to make code more maintainable and have been almost exclusively the realm of sophisticated programmers. In recent years, IDEs like Visual Studio .NET have started to support refactoring, and the term is becoming part of the standard programmer's vocabulary. However, the concept of refactoring has only been applied to small, localized changes in the source code: changes that somewhat improve code maintainability, or that automate some modification to help programmers be more productive while editing code. Of course refactorings are a good thing, and in general, if you can improve the maintainability of your code you should, especially when you know that your code has to continue to evolve.

When we were designing Aggiorno, the first way we described what it would do was "Refactoring for the Web". However, we soon realized that although aggiornings resemble refactorings in that they perform transformations on code, they differ in purpose, and this is why we coined the name.

Aggiornings are a form of encapsulated knowledge aimed at providing business value as directly as possible. For instance, an aggiorning will help you make your page XHTML compliant (web standards), and we already know from previous posts that making a page standards compliant helps you with SEO, accessibility, maintainability, etc. Another scenario where aggiornings help is when you want to separate content from style. The immediate business benefit is that maintenance costs go down, but it also becomes much easier to port the code to mobile browsers, and easier for Google to index your page without getting confused by cluttered markup. Refactorings always preserve equivalence in the code they transform; aggiornings might or might not. We preserve equivalence when it makes sense to do so, but we also make other changes that produce a big return on investment even if they do not preserve equivalence. Another big difference is that with aggiornings the user can always request an explanation, and Aggiorno will show exactly where the changes happened, so the user can ultimately decide whether or not to go through with them.

Going back to the title of this post, source code improvements aimed at improving maintainability are in general a very good thing to do. If you look at refactorings from the perspective of aggiornings, the code improvements are even more important: we are talking about improvements whose ROI can be measured immediately from the business perspective. There really should be no objections to using refactoring techniques or aggiornings on your code. If you have an objection, let me know!

Ah... and let me encourage you to download the Aggiorno Beta and explore aggiornings for yourself.

How Similar are HTML and XHTML?

This is the second installment of a series of blog posts to discuss HTML, XML and XHTML.  An often asked question is “What are the differences between HTML and XHTML?”  But actually, a much more interesting question is “What do HTML and XHTML have in common?”

It’s all about the content

HTML and XHTML are both content markup languages.  That is, in both cases, the most important thing is that they are describing some content being used for communication.  The content is marked up so that computer software – especially browsers – can determine the significance of the content and render it appropriately.

HTML and XHTML have the same rules about content, and the same markup constructions.  Because of its history, HTML allows some outdated markup to be accepted which XHTML does not.

They delegate styling to CSS

Both HTML and XHTML delegate the styling of pages to CSS. The same CSS: there isn’t one CSS for HTML and another for XHTML, although a few of the rules are interpreted slightly differently in the two cases.

They are W3C standards

Both HTML and XHTML are technical recommendations from the W3C – so called “Web Standards”.  Theoretically, any valid HTML or XHTML document should be accepted by all “user agents” out there – from browsers and screen readers to search engine bots.

They have similar appearance

Both HTML and XHTML have SGML as a reference base (HTML is defined as an SGML application, while XHTML is built on XML, a simplified subset of SGML), and as such they look very similar, especially when you consider that the valid markup tags in the two languages are the same. There are superficial syntactic differences, particularly HTML’s tolerance for tags written in different cases, and XHTML’s stricter rules on closing tags.
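
Those superficial differences show up clearly when the same two snippets are run through a tolerant HTML parser and a strict XML parser. This sketch uses Python's html.parser (which lowercases tag names and accepts the input without complaint) and xml.etree.ElementTree (which enforces XML well-formedness); the snippets and helper names are made-up examples, and neither parser reproduces a browser's full behavior.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start tags; html.parser lowercases tag names as it goes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_tags(markup: str):
    """Start tags seen by the tolerant HTML parser."""
    parser = TagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


def xml_accepts(markup: str) -> bool:
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


HTML_STYLE = "<P>First line<BR>Second line</p>"     # mixed case, unclosed <BR>
XHTML_STYLE = "<p>First line<br />Second line</p>"  # lowercase, <br /> closed

for snippet in (HTML_STYLE, XHTML_STYLE):
    print(snippet, html_tags(snippet), xml_accepts(snippet))
```

Both snippets mean the same thing to the HTML parser, but only the XHTML-style one survives the XML parser.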

In Summary

HTML and XHTML are very similar in purpose, intent, layout and appearance.  XHTML has a more consistent syntax, while HTML can be shorter and is more tolerant.

Accessibility Checklist and Web Standards

NorthTemple.com recently published a very comprehensive Accessibility Checklist that you need to follow for your web site.  The original post can be found here: NorthTemple.com : The Accessibility Checklist I V... .

The 30 items on the list (excluding the ones related to testing) can be classified into two categories:

1) Considerations that require visual appreciation or understanding of the content to be enforced (19 items).

2) Considerations that are a property of the source code and do not require understanding of the subject matter (11 items).

The first category can be summarized as: make sure we write good content, provide good descriptions for anything audiovisual, and do not use any inaccessible resource to guide the user through the navigation of the site. In general, just keep in mind who your intended audience is.

On the other hand, the second category revolves entirely around what I call properties of the source code: things that are coded in a way that either is or is not accessible. For instance, make sure that all pages have a title, that all images have an alternate description, that form fields are ordered the right way (tab index), that content and style are separated, etc. All of these considerations are automatically found AND fixed by Aggiorno, and believe me, finding and fixing these things manually IS a drag, which is why many web developers don't do it.
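
Two of these source-code properties, a page title and alternate text on images, can be checked mechanically in a few lines. This is a minimal sketch using Python's html.parser, not a description of how Aggiorno works; the class and function names are hypothetical, and other items on the list (tab order, separation of content and style) would need more analysis than this.

```python
from html.parser import HTMLParser


class AccessibilityChecker(HTMLParser):
    """Collects two source-level accessibility problems while parsing."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.title_text = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "img" and "alt" not in attributes:
            # alt="" is allowed (decorative image); a missing alt is not.
            self.problems.append("img without alt: " + str(attributes.get("src")))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text += data


def check_accessibility(markup: str):
    """Return a list of human-readable problems (empty if none were found)."""
    checker = AccessibilityChecker()
    checker.feed(markup)
    checker.close()
    problems = list(checker.problems)
    if not checker.title_text.strip():
        problems.insert(0, "page has no title (or an empty one)")
    return problems


PAGE = """<html><head></head><body>
<img src="logo.gif">
<img src="spacer.gif" alt="" />
</body></html>"""

print(check_accessibility(PAGE))
```

Finding problems like these is the easy, mechanical half; choosing a good title or alternate text still takes a human who understands the content.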

It is also quite interesting to see how one of the 4 testing considerations to certify that a page is accessible is to make sure that the page is web standards compliant.  Aha!  WEB STANDARDS!

Once again, it is clear that following web standards helps make a page accessible. I would venture to say that making a page web standards compliant is the FIRST step that needs to be done. In fact, if you do nothing else, I strongly suggest you make the page standards compliant and let Aggiorno help you in the process; you'll save tons of time. Download Aggiorno Beta now!

You've been aggiorned, Mr. Kothari!

Going around the web looking for sites to test, I came across the ever-useful blog of Nikhil Kothari, author of wonders such as Script# and Facebook.NET. I knew Nikhil was fond of things like CSS and Silverlight, so we assumed his page would be pristine in terms of markup.

See for yourself what I found when downloading his home page and putting it through a dose of Aggiorno:

 

I then showed the video to Nikhil, who found it really funny and told us that it had been quite a while since he last looked at the base source code (2003, he said!).

Like Nikhil's, there are tons of pages that were coded a while ago and never looked at again.

Does your markup validate? If not, you could be missing out on an audience by not being accessible and sucking at SEO!

As shown in the video, with Aggiorno you can take that legacy code and, after a couple of clicks, get on the bandwagon of web standards. 

Remember that Aggiorno works on XHTML, ASP.NET and PHP source code, so no excuses from you, sir: come download Aggiorno and give it a try!

Aggiorno Beta 2 officially released at TechEd

ArtinSoft launched this morning the Beta 2 version of Aggiorno during TechEd.  The full press release can be viewed at: http://www.microsoft.com/presspass/events/teched/docs/artinsoft2.doc

Here's an excerpt:

"ORLANDO, Fla. – June 3 2008 – Today Artinsoft Corp. announced the release of the second public beta of Aggiorno (www.aggiorno.com), an extension to both Microsoft Visual Studio 2005 and 2008, through the Visual Studio Integration Program (VSIP), that brings expert knowledge and productivity to web developers. With Aggiorno, web developers easily make their ASP.NET or HTML sites compliant with the latest web standards using the latest technology trends suggested by industry experts and immediately incorporating business value."

We are getting closer and closer to a final release, and every day we are more excited about the functionality of the product and its value for web developers.

 



About Aggiorno

Aggiorno - a plugin for Visual Studio - is your instant ticket to SEO friendly, XHTML compliant, CSS styled HTML and ASP.NET! Read more on What is Aggiorno?

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
