TAG SOUP: The 10 most common scenarios for shuffled tags

By Daniel Alvarez.

In a previous post about the basics of shuffled tags, we talked about different ways of looking at the same problem: placing “end tags in the wrong order”, invalid or malformed markup, or HTML “tag soup”.

Is this a frequent issue on the web? Take a look at some statistics: in a sample of 1,132 random pages, we found that 22.6% had at least one shuffled-tag problem.

Why are they so common? We humans are not good at closing tags. Also, if browsers don’t care about it, why should we? Our stand is that we should care.

Moving on, a closer look at the statistics reveals that the most frequent problems are:

· table structures: people are used to building very large HTML structures with tables;

· div and form structures: used for grouping and for organizing page layout; and

· style family: composed of tags like p, b, i, and font.

Let me show an illustration of each one of the top 10 scenarios for shuffled tags in descending order of frequency. For each scenario, I will also show an equivalent non-shuffled version.

Note: **** appear where there is (potentially complex) markup that is immaterial to the shuffling.

#1 table, tr, td

Description: the main problem is that tags that are already open get closed in the wrong place; in the most common pattern, <tr> and <table> are closed before <td> is closed.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 1 Example</title>
</head>
<body>
    ****
    <table border="true">
        <tr>
            <td>
            Texto1
        </tr>
        ****
    </table>
            </td>
    ****
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 1 Solution</title>
</head>
<body>
    ****
    <table border="true">
        <tr>
            <td>
              Texto1
            </td>
        </tr>
        ****
    </table>
    ****
</body>
</html>
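
The kind of shuffle shown in scenario #1 can be detected mechanically by keeping a stack of open tags and checking every end tag against the top of that stack. The following is only a minimal sketch in Python, built on the standard library's html.parser; the names ShuffleDetector and find_shuffled are hypothetical, introduced here for illustration, and are not part of any tool mentioned in this post. It assumes that every non-void element is closed explicitly, so it does not model optional end tags.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class ShuffleDetector(HTMLParser):
    """Hypothetical helper: records end tags that arrive out of order."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, innermost last
        self.problems = []  # (closed_tag, [tags still open inside it])

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # properly nested close
        elif tag in self.stack:
            # An outer element is being closed while inner ones are open.
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            self.problems.append((tag, self.stack[idx + 1:]))
            del self.stack[idx:]
        # Otherwise it is a stray end tag whose element was already
        # force-closed by an earlier shuffled close; nothing to record.


def find_shuffled(markup):
    detector = ShuffleDetector()
    detector.feed(markup)
    detector.close()
    return detector.problems


if __name__ == "__main__":
    # Scenario #1, reduced to its tags.
    print(find_shuffled("<table><tr><td>Texto1</tr></table></td>"))
```

Each reported pair names the end tag that came too early and the elements that were still open inside it at that point; the corrected markup from the solution above yields an empty list.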

#2 div, table, tr, td

Description: in this example, <tr> is closed before <td>, and <div> is closed before <table>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 2 Example</title>
</head>
<body>
    ****
    <div>
        ****
        <table border="true">
            ****
            <tr>
                <td>
                ****
            </tr>
            ****
                </td>
            ****
    </div>
        </table>
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 2 Solution</title>
</head>
<body>
    ****
    <div>
        ****
        <table border="true">
            ****
            <tr>
                <td>
                ****
                </td>
            </tr>
            ****
            ****
        </table>
    </div>
</body>
</html>

#3 font, p

Description: in this case the example shows that <font> was closed before closing <p>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 3 Example</title>
</head>
<body>
    <div>
        ****
        <font>
            ****
            <p>
                ****
        </font>
              </p>
        <font>
            ****
            <p>
                ****
        </font>
              </p>
    </div>
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 3 Solution</title>
</head>
<body>
    <div>
        ****
        <font>
            ****
            <p>
                ****
            </p>
        </font>
        <font>
            ****
            <p>
                ****
            </p>
        </font>
    </div>
</body>
</html>

Note that fixing the shuffling problem still leaves an illegal containment problem: p inside font. Illegal containment problems will be discussed in another post.

#4 td, table, tr, form

Description: in this example, <tr> and <table> are closed before <form>, and <form> is then closed inside another <table>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 4 Example</title>
</head>
<body>
    <table>
        <tr>
            <td>
                <table>
                    <tr>
                        <form>
                            ****
                    </tr>
                </table>
                <table>
                    ****
                        </form>
                </table>
            </td>
        </tr>
    </table>
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 4 Solution</title>
</head>
<body>
    <table>
        <tr>
            <td>
                ****
                <table>
                    <tr>
                        <form>
                        </form>
                    </tr>
                </table>
                ****
                <table>
                </table>
            </td>
        </tr>
    </table>
</body>
</html>

Note that fixing the shuffling problem still leaves an illegal containment problem: form inside tr. Illegal containment problems will be discussed in another post.

#5 ul, li

Description: in this case the example shows that <ul> was closed before closing <li>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <title>Top 5 Example</title>
</head>
<body>
    <div>
        <ul>
            <li>
                ****
                <ul>
                    <li>
                        ****
                </ul>
                    </li>
                ****
            </li>
        </ul>
    </div>
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
    <title>Top 5 Solution</title>
</head>
<body>
    <div>
        <ul>
            <li>
                ****
                <ul>
                    <li>
                        ****
                    </li>
                </ul>
                ****
            </li>
        </ul>
    </div>
</body>
</html>

#6 table, tr, td, form

Description: in this case the example shows that <tr> was closed before closing <table>, <form> and <td>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 6 Example</title>
</head>
<body>
    <table>
        ****
        <tr>
            <td>
                <form>
                    <table>
                        ****
        </tr>
        ****
                    </table>
                </form>
            </td>
        ****
    </table>
    ****
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 6 Solution</title>
</head>
<body>
    <table>
        ****
        <tr>
            <td>
                <form>
                    <table>
                        ****
                    </table>
                </form>
            </td>
            <td>
                <form>
                    ****
                </form>
            </td>
            ****
        </tr>
    </table>
    ****
</body>
</html>

#7 table, tr, td, div

Description: in the first example, <td> is closed before <div>; in the second example, <td>, <tr>, and <table> are closed before <div>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 7 Example</title>
</head>
<body>
    <table>
        <tr>
            <td>
                <div>
                ****
            </td>
            ****
                </div>
        </tr>
    </table>
</body>
</html>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 7 Example 2</title>
</head>
<body>
    <form>
        ****
        <table>
            <tr>
                ****
                <td>
                    <div>
                    ****
                </td>
            </tr>
        </table>
                    </div>
                    ****
    </form>
    ****
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 7 Solution 1</title>
</head>
<body>
    <table>
        <tr>
            <td>
                <div>
                ****
                </div>
            </td>
            ****
        </tr>
    </table>
</body>
</html>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 7 Solution 2</title>
</head>
<body>
    <form>
        ****
        <table>
            <tr>
                ****
                <td>
                    <div>
                    ****
                    </div>
                </td>
            </tr>
        </table>
        ****
    </form>
    ****
</body>
</html>

 

#8 form, table, tr, td

Description: in this case the example shows that <form> was closed before closing <td>, <tr> and <table>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 8 Example</title>
</head>
<body>
    ****
    <form>
        ****
        <table>
            <tr>
                ****
                <td>
    </form>
    ****
                </td>
    ****
            </tr>
        </table>
    ****
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 8 Solution</title>
</head>
<body>
    ****
    <form>
        ****
        <table>
            <tr>
                ****
                <td>
                 ****
                </td>
                 ****
            </tr>
        </table>
        ****
    </form>
</body>
</html>

#9 p, a, b

Description: in this example, <a> is closed before <b>, and <b> is then closed inside another <a>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 9 Example </title>
</head>
<body>
    <p>
        paragraph
        <a>
        link
            <b>
            bolded
        </a>
        paragraph normal
        <a>
        link not bolded
            </b>
        </a>
    </p>
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 9 Solution </title>
</head>
<body>
    <p>
        paragraph
        <a>
            link
        </a>
        <b>
            <a>
                bolded
            </a>
            paragraph normal
        </b>
        <a>
            <b>
                link not bolded
            </b>
        </a>
    </p>
</body>
</html>

#10 b, i

Description: in this case the example shows that <b> was closed before closing <i>.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 10 Example</title>
</head>
<body>
    <p>
        ****
        <b>
           <i>
            ****
        </b>
           </i>
    </p>
</body>
</html>

Solution:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Top 10 Solution</title>
</head>
<body>
    <p>
        ****
        <b>
           <i>
            ****
           </i>
        </b>
    </p>
</body>
</html> 
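
The simplest fixes above (scenarios #1, #5 and #10) follow one rule: when an end tag arrives while inner elements are still open, close the inner elements first, then drop the end tags that later arrive for them. The sketch below applies that rule in Python. It is an illustration under stated assumptions, not a description of how any particular tool works: repair_shuffled is a hypothetical name, the regular-expression tokenizer ignores comments, scripts and attribute values containing ">", and elements left open at the end of the input are not closed.

```python
import re

# Splits markup into alternating text and tag pieces (tags at odd indexes).
TAG = re.compile(r"(</?[A-Za-z][^>]*>)")
NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")

# Elements that never take an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


def repair_shuffled(markup):
    """Hypothetical helper: re-nest end tags so inner elements close first."""
    out, stack = [], []
    for i, piece in enumerate(TAG.split(markup)):
        if i % 2 == 0:
            out.append(piece)  # plain text, doctype, etc.: copy through
            continue
        name = NAME.match(piece).group(1).lower()
        if name in VOID or piece.endswith("/>"):
            out.append(piece)  # void or self-closed: no nesting effect
        elif not piece.startswith("</"):
            stack.append(name)  # start tag: remember it is open
            out.append(piece)
        elif name in stack:
            # Close every element still open inside the one being closed.
            while stack[-1] != name:
                out.append("</%s>" % stack.pop())
            stack.pop()
            out.append(piece)
        # Otherwise: stray end tag for an element already closed; drop it.
    return "".join(out)


if __name__ == "__main__":
    # Scenario #10, reduced to its tags.
    print(repair_shuffled("<p><b><i>x</b></i></p>"))
```

This rule never reopens an element, so it does not reproduce the solution shown for scenario #9, where <b> is split and reopened inside the second <a>; a repair of that kind needs more bookkeeping than this sketch attempts.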

Given that we want a world without shuffled tags, wouldn’t it be great to have a tool able to help us to fix all these cases across a whole file in one, automated operation? That’s the promise of Aggiorno.

Do you think there’s something I missed? Do you have another solution for any scenario of shuffled tags?

Let us know about your thoughts.
