ASP/ASP.NET preserving parsing challenges

by Aggiorno Team 7. March 2008 06:45

By Cesar Muñoz

As web developers, we have all used an HTML editor at some point. Some of these editors make extensive use of an HTML/XHTML parser. The more advanced the functionality an editor provides, the more complex and flexible its underlying parser needs to be. Adding ASP.NET to the equation increases the parsing complexity even further. This post discusses some of the main challenges we encountered when creating an ASP parser that allows for full control of the source code.

Parsing ASP/ASP.NET or HTML source code is necessary to perform tasks like the following:

  • Region coloring.
  • Document content analysis.
  • Executable code analysis.
  • Problem identification.
  • Code transformation.

Some of these tasks require the parser to work on fragments of source code and to preserve the following information, which is normally discarded by a parser whose only goal is to run the code:

  • Tag case.
  • Literal text and spaces.
  • Line breaks.
  • Comments.
  • ASP inline code blocks.
  • Attribute and element order.
  • Element location.
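
To make the preservation requirement concrete, here is a minimal sketch of a lossless tokenizer. It is not the Aggiorno parser; the names `TOKEN_RE` and `tokenize` and the token kinds are assumptions made for this illustration. Every character of the input ends up in some token, together with its offset, so tag case, spaces, line breaks, comments and ASP inline blocks all survive.

```python
import re

# Hypothetical token grammar for illustration only.
# Order matters: ASP blocks and comments are tried before generic tags.
TOKEN_RE = re.compile(
    r"(?P<asp><%.*?%>)"               # ASP inline code block, e.g. <%= ... %>
    r"|(?P<comment><!--.*?-->)"       # HTML comment
    r"|(?P<tag></?[A-Za-z][^<>]*>)"   # opening, closing or self-closing tag
    r"|(?P<text>[^<]+|<)",            # literal text; a stray '<' is kept as text
    re.DOTALL,
)

def tokenize(source):
    """Return (kind, offset, raw_text) triples that cover every character."""
    return [(m.lastgroup, m.start(), m.group()) for m in TOKEN_RE.finditer(source)]

sample = '<TD Class="x">\n  <!-- note -->\n  <%= Name %>\n</td>'
tokens = tokenize(sample)

# Nothing is discarded, so joining the raw text reproduces the input exactly.
round_trip = "".join(raw for _, _, raw in tokens)
```

The sketch does not handle an ASP block nested inside an attribute value or a `>` inside a quoted attribute; a real preserving parser needs a proper state machine for those cases.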

As if this were not enough, another complication comes from the lack of strict standards compliance in most HTML documents and, to some extent, in ASP/ASP.NET documents. A considerable percentage of these documents are syntactically incorrect: they have missing or wrong elements, missing or wrong attributes and attribute values, and structural problems such as overlapping or misplaced blocks of code.

All these considerations must be taken into account for a parser to be useful in an ASP/HTML development environment. Additional complications and incompatibilities between XML and HTML are described by Jeff Heaton at http://www.developer.com/net/csharp/article.php/10918_2230091_1. It is also important to remember that the correct parsing and interpretation of a document depends on its specified doctype (http://www.alistapart.com/articles/doctype/).

Now that we have listed some of the challenges, let’s explore a situation that will happen when parsing ASP/ASP.NET source code; it illustrates why a specifically developed parser is necessary.

ASP controls with mixed content

New ASP.NET control tags can define nested sections that contain normal HTML tags. This type of structure has the following pattern:

     <aspTag>

          <section1>

               HTMLContent1

          </section1>

          <section2>

               HTMLContent2

          </section2>

          …

          <sectionN>

               HTMLContentN

          </sectionN>

     </aspTag>

If HTMLContent1, HTMLContent2, etc. are independent, they can be parsed and processed without additional preprocessing. This is not the case when there are dependencies between the sections, for example:

     <asp:Repeater runat="server">

          <HeaderTemplate>

               <table>

          </HeaderTemplate>

          <ItemTemplate>

               <tr><td>

               <%# DataBinder.Eval(Container.DataItem, "Title") %>

               <hr>

               <%# DataBinder.Eval(Container.DataItem, "Abstract") %>

               </td></tr>

          </ItemTemplate>

          <FooterTemplate>

               </table>

          </FooterTemplate>

     </asp:Repeater>

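This type of dependency breaks XML well-formedness: the <table> opened in HeaderTemplate is only closed in FooterTemplate, so neither section is balanced on its own, and a normal XML or HTML parser will fail.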
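
The failure is easy to reproduce with any strict XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` purely as a stand-in for "a normal XML parser"; the helper name `is_well_formed` is an assumption for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if a strict XML parser accepts the fragment."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

# The header section taken on its own: <table> is opened but never closed.
header_ok = is_well_formed("<HeaderTemplate><table></HeaderTemplate>")

# A section whose content is balanced parses without trouble.
item_ok = is_well_formed(
    "<ItemTemplate><tr><td>Hello world!</td></tr></ItemTemplate>"
)
```

Here `header_ok` is `False` because of the mismatched closing tag, while `item_ok` is `True`.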

Solution strategy

The solution involves one preparation step and additional considerations when working with the parser output.

Preparation step

The section opening and closing tags are flattened: each one is converted to a special self-closing tag (UnknownTag) that users of the parser component can identify. In this way the different HTML content sections are reunited for parsing and transformation purposes.

Example:

[SOURCE]

     <asp:Repeater runat="server">

          <HeaderTemplate>

               <table>

          </HeaderTemplate>

          <ItemTemplate>

               <tr><td>Hello world!</td></tr>

          </ItemTemplate>

          <FooterTemplate>

               </table>

          </FooterTemplate>

     </asp:Repeater>

[PREPARED]

     <asp:Repeater runat="server">

          <UnknownTag "@AISHeaderTemplate" />

               <table>

          <UnknownTag "@AISHeaderTemplateClose" />

          <UnknownTag "@AISItemTemplate" />

               <tr><td>Hello world!</td></tr>

          <UnknownTag "@AISItemTemplateClose" />

          <UnknownTag "@AISFooterTemplate" />

               </table>

          <UnknownTag "@AISFooterTemplateClose" />

     </asp:Repeater>

This flattened structure can be parsed and the result can be the input for other processes.
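
The preparation step could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: the list `SECTION_NAMES`, the function `flatten_sections`, and the use of a regular expression are all hypothetical, and the placeholder is written with plain double quotes.

```python
import re

# Hypothetical list of section tags to flatten; a real tool would need the
# template names of every supported control.
SECTION_NAMES = ("HeaderTemplate", "ItemTemplate", "FooterTemplate")

# Matches <Name> or </Name> for the listed names (no attributes, case-sensitive).
SECTION_TAG_RE = re.compile(
    r"<(?P<slash>/?)(?P<name>" + "|".join(SECTION_NAMES) + r")\s*>"
)

def flatten_sections(source):
    """Replace section open/close tags with self-closing UnknownTag placeholders."""
    def to_placeholder(match):
        suffix = "Close" if match.group("slash") else ""
        return '<UnknownTag "@AIS{}{}" />'.format(match.group("name"), suffix)
    return SECTION_TAG_RE.sub(to_placeholder, source)

prepared = flatten_sections("<HeaderTemplate><table></HeaderTemplate>")
# prepared == '<UnknownTag "@AISHeaderTemplate" /><table><UnknownTag "@AISHeaderTemplateClose" />'
```

Because the placeholders are self-closing, the unbalanced `<table>` no longer sits inside an element that closes before it does.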

 

Transformation considerations

The tags generated in the preparation step must be ignored by all transformations and must remain in the final transformation result.
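
As an illustration of this rule, the sketch below applies a sample transformation (lowercasing tag names) while copying the generated tags through verbatim. The transformation itself and the names `PLACEHOLDER_RE`, `TAG_NAME_RE` and `lowercase_tag_names` are assumptions chosen for the example.

```python
import re

# The capture group makes re.split keep the placeholders in its result.
PLACEHOLDER_RE = re.compile(r'(<UnknownTag "@AIS\w+" />)')
TAG_NAME_RE = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")

def lowercase_tag_names(prepared):
    """Lowercase tag names, leaving UnknownTag placeholders untouched."""
    out = []
    for part in PLACEHOLDER_RE.split(prepared):
        if PLACEHOLDER_RE.fullmatch(part):
            out.append(part)  # generated tag: copy verbatim
        else:
            out.append(
                TAG_NAME_RE.sub(lambda m: m.group(1) + m.group(2).lower(), part)
            )
    return "".join(out)

prepared = (
    '<UnknownTag "@AISItemTemplate" />'
    "<TR><TD>Hello</TD></TR>"
    '<UnknownTag "@AISItemTemplateClose" />'
)
result = lowercase_tag_names(prepared)
```

In `result` the `TR` and `TD` tags are lowercased, and both placeholders appear exactly as they did in the input.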

Pretty-printing step

The tags generated in the preparation step will need to be restored to the original tag.
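
Restoring is the inverse of the preparation step. A minimal sketch, assuming the placeholder format shown earlier written with plain double quotes (the names `PLACEHOLDER_RE` and `restore_sections` are hypothetical):

```python
import re

# The lazy name group lets the optional "Close" suffix be recognised separately.
PLACEHOLDER_RE = re.compile(
    r'<UnknownTag "@AIS(?P<name>\w+?)(?P<close>Close)?" />'
)

def restore_sections(text):
    """Turn UnknownTag placeholders back into the original section tags."""
    def to_tag(match):
        slash = "/" if match.group("close") else ""
        return "<{}{}>".format(slash, match.group("name"))
    return PLACEHOLDER_RE.sub(to_tag, text)

prepared = (
    '<UnknownTag "@AISFooterTemplate" />'
    "</table>"
    '<UnknownTag "@AISFooterTemplateClose" />'
)
restored = restore_sections(prepared)
# restored == "<FooterTemplate></table></FooterTemplate>"
```

One limitation of this naming scheme: a section whose real name ends in "Close" would be read as a closing tag, so a real tool would need a less ambiguous encoding.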

We have seen the kinds of problems that need to be considered when writing and using a parser for real-life ASP and HTML documents, which are often incomplete and not XML-compliant. In future discussions we will consider more specific problems such as error recovery.

Happy parsing!




