By David Alfaro
What do I mean by “shuffled tags”? It’s my shorthand for what some authors have called invalid or malformed markup, one cause of tag soup. More precisely, it refers to placing “end tags in the wrong order”, as stated in the HTML TIDY documentation. For example, consider this fragment:
<p>
here is a para
<b> bold <i> bold italic</b> bold?</i> normal?
</p>
At a glance, you can see that the end tag </b> appears where </i> should be and, conversely, the end tag </i> is where </b> should be.
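To make “end tags in the wrong order” concrete, here is a minimal Python sketch that scans a fragment with a stack of open elements and reports every end tag that does not match the innermost open element. It is only an illustration, not how HTML TIDY works; the name `find_shuffled_end_tags` is hypothetical, and the regex tokenizer ignores comments, void elements such as <br>, and attribute values that contain “>”.

```python
import re

# Matches a start or end tag; captures the optional slash and the tag name.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

def find_shuffled_end_tags(html):
    """Return (expected, found) pairs for end tags that close out of order."""
    open_tags = []   # stack of currently open element names
    problems = []
    for match in TAG.finditer(html):
        is_end = match.group(1) == "/"
        name = match.group(2).lower()
        if not is_end:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()                      # properly nested end tag
        elif name in open_tags:
            problems.append((open_tags[-1], name))
            # Drop the innermost matching open tag so scanning can continue.
            idx = len(open_tags) - 1 - open_tags[::-1].index(name)
            del open_tags[idx]
        # An end tag with no matching start tag is ignored in this sketch.
    return problems

fragment = "<p>here is a para <b> bold <i> bold italic</b> bold?</i> normal?</p>"
print(find_shuffled_end_tags(fragment))
```

For the fragment above, the sketch reports a single problem: an </i> was expected where </b> was found.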
Well, well, not so bad. But remember that anything between <i> and </i> is in italics and anything between <b> and </b> is in bold. That’s the way browsers render it, at least Internet Explorer 7 and Firefox 2. IT DOESN’T MATTER THAT <i>…</i> AND <b>…</b> OVERLAP! That is, I will take THIS specific behavior as consistent between IE7 and FF2. I’ll assume that THIS specific browser rendering is what the “web developer” understood and accepted.
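The rendering rule just described (“anything between <i> and </i> is in italics”) can be modeled in a few lines of Python. This is a simplified sketch of that rule only, not of how IE7 or FF2 actually parse pages; the function name `styled_runs` is hypothetical, and the set-based model deliberately ignores repeated nesting of the same tag.

```python
import re

# A token is either a tag (optional slash + name) or a run of text.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")

def styled_runs(html):
    """Split markup into (text, active_tags) runs under the 'between the tags' rule."""
    active = set()   # names of tags whose start has been seen but not their end
    runs = []
    for slash, name, text in TOKEN.findall(html):
        if text:
            runs.append((text, frozenset(active)))
        elif slash:
            active.discard(name.lower())   # end tag: formatting stops here
        else:
            active.add(name.lower())       # start tag: formatting begins here
    return runs

for text, tags in styled_runs("<b> bold <i> bold italic</b> bold?</i> normal?"):
    print(repr(text), sorted(tags))
```

Under this model, “bold” is bold, “bold italic” is both, “bold?” is italic only, and “normal?” carries no formatting, regardless of the order in which the end tags appear.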
In the aforementioned HTML TIDY documentation, you will find that the shuffled-tag example above is supposedly corrected to:
<p>
here
is a para
<b>bold <i>bold italic</i> bold?</b> normal?
</p>
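Hmmm, what do you think? Hmmm, wait a minute! Something is wrong. Let’s compare the behavior in Internet Explorer 7:
[Screenshots: the source fragment and the “corrected” fragment as rendered in IE7]
In the source, “bold?” renders in italics only; in the “corrected” version, it renders in bold only.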
I knew it! Remember today’s mantra: “anything between <i> and </i> is in italics and anything between <b> and </b> is in bold, so it doesn’t matter if they overlap”.
A more realistic solution is:
<p>
here
is a para
<b>bold
</b><i><b>bold italic </b>bold? </i>normal?
</p>
This solution is a little verbose, isn’t it? Well, yes, but it expresses the author’s intention using the semantics of HTML/XHTML. Thus, you rely on standards rather than on browser interpretation. Does it really matter? Undoubtedly! Search bots don’t necessarily read pages exactly as browsers do. In this context, relying on standards is always a safer bet. Ask Chris Maunder of CodeProject about his experiences with Google when it comes to reading and indexing pages related to HTML syntax and DOCTYPE.
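A repair that preserves the rendering can also be produced mechanically. The Python sketch below uses a “close and reopen” strategy: when an end tag arrives out of order, it closes the elements opened after it, closes the element itself, and then reopens the inner ones. This is an assumption-laden illustration, not the method used by HTML TIDY or by any browser; the name `renest` is hypothetical, and reopened tags lose any attributes the original start tag carried.

```python
import re

# A token is either a tag (optional slash + name) or a run of text.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")

def renest(html):
    """Rewrite overlapped end tags by closing and reopening the inner elements."""
    stack = []   # names of currently open elements, outermost first
    out = []
    for match in TOKEN.finditer(html):
        slash, name, text = match.groups()
        if text is not None:
            out.append(text)
        elif not slash:
            stack.append(name.lower())
            out.append(match.group(0))           # keep the start tag verbatim
        elif name.lower() in stack:
            name = name.lower()
            reopen = []
            # Close everything opened after the element being ended.
            while stack[-1] != name:
                inner = stack.pop()
                out.append(f"</{inner}>")
                reopen.append(inner)
            stack.pop()
            out.append(f"</{name}>")
            # Reopen the inner elements in their original order.
            for inner in reversed(reopen):
                stack.append(inner)
                out.append(f"<{inner}>")
        # An end tag with no open counterpart is dropped in this sketch.
    return "".join(out)

print(renest("<b> bold <i> bold italic</b> bold?</i> normal?"))
# prints: <b> bold <i> bold italic</i></b><i> bold?</i> normal?
```

The output nests the tags differently from the realistic solution above, but every piece of text keeps the same formatting: “bold?” stays italic only, which the HTML TIDY correction does not achieve.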
How many kinds of “shuffled tags” exist? Maybe a more useful question is: which kinds of “shuffled tags” are most common on the web, and what are their solutions? We are preparing a serious study of this and will post it very soon. Stay tuned. [Update: A new post about the top ten shuffled tags scenarios]
So far, my “realistic solution” is, for me, the best solution regarding HTML syntax. Do you agree? Do you have a better solution? Please let me know; we depend on your feedback to create a better web.
Your comments are welcome!