Aggiorno stand alone?

We have heard you!  The Beta program for Aggiorno has been very successful and we have received lots of feedback on different aspects of the product.

Something that has come up over and over again has been a request to also make Aggiorno available as a stand alone tool and not only as a Visual Studio add in.

Well... we've heard you.  We are already at work preparing a stand alone version.  This version will be released on the Visual Studio Shell and will be available shortly after Aggiorno for Visual Studio 1.0 is released (by the way... this will happen soon, very soon!).

For all of you who are interested in how Aggiorno can help you simplify your daily work as web developers but who do not use Visual Studio, please stay tuned as we will be making announcements soon!

 

Get a Chance at Google with Aggiorno

As web users and web developers we are constantly attributing human qualities to the different actors of the Internet.  In my mind Google has always been the sexy, out of reach girl that we're constantly trying to impress.  When courting a girl there is a well defined protocol that needs to be respected.  You have to be polite, tactful, respectful (... it seems I am listening to my mom...).  In any case, there are rules that need to be followed when you are trying to impress a girl, and these rules go well beyond appearances.

As web developers we tend to forget some of the rules that need to be followed to make our sites more findable, more accessible, more secure, more maintainable... we typically only care about how our pages look in the common browsers without paying too much attention to the inner details.  You can have great content, but if your markup sucks you will have issues when trying to conquer important actors like Google.

Last week I wrote a post on how ignoring web standards can affect your SEO efforts.  There are lots of small details that can really turn Google off.

At Aggiorno, the team is on a death march towards the release of V1.0 (soon... very soon...) and we need to relieve some stress and at the same time try to educate more people about the importance of good markup and of following web standards in our daily work.  We came up with a video called "Get a Chance at Google" that enacts an encounter between Google and a very content intensive web site with ... some issues...

Take a look at the video and share it if you like it.  Also, let us know what you think and if you have more ideas so this can become its own series!

Enjoy!

Web Standards and Search Engine Optimization (SEO) -- Does Google care about the quality of your markup?

There are many discussions on the web regarding the merits of using web standards. 

The argument against using web standards can be summarized as: "who cares?!"  If the browsers render my code correctly then I am accomplishing my goal.

The arguments in favor of using web standards can be summarized as providing improved cross browser compatibility and  improved maintenance cost from using cleaner code.

I believe there is an important point that is only tangentially discussed and that should be addressed much more emphatically.  What does Google do when it encounters HTML that does not comply with standards?  Does it affect your search results?  We need to constantly remember that search engine bots are "users" of our sites and they are not necessarily as tolerant of our markup as normal browsers.  An SEO expert gave me the best trick to understand what Google sees and what it doesn't see while going through a page.  It goes like this:

  • Open the page you want to evaluate in your favorite browser
  • Click Select All
  • Click Copy
  • Open NotePad
  • Click Paste

Whatever gets printed in NotePad is what Google is indexing.
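The copy and paste trick above can be roughly approximated in code.  The following is only a sketch using Python's standard html.parser module; the class and function names are made up for this illustration, and it says nothing about how Google's crawler actually works.  It simply keeps the text nodes of a page and drops images, scripts and styles, which is more or less what ends up in NotePad.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text of a page the way a plain-text paste would show it."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Books</h1><img src='cover.jpg'>"
    "<script>var x = 1;</script><p>Cheap &amp; good</p></body></html>"
)
print(visible_text(page))
```

Note that the image contributes nothing to the result, which is exactly the point of the next section.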

But let's go into some examples of how the wrong HTML can affect your search results.

Missing alternate descriptions

Here is the first example of how following standards can improve your interaction with Google.  Google is "blind".  Google only sees the text that is embedded in your page; no images, no JavaScript, no animations...  One of the rules required by the standards is to provide an alternate description for non textual information (the alt attribute in XHTML).  If you do not include an alternate description you are missing an opportunity to provide information to Google.  The use of ALT descriptions is a best practice, enforced by web standards, that affects search results!
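As an illustration, here is a small sketch that lists the images in a page that have no alt attribute at all.  It uses Python's standard html.parser; the helper names are hypothetical and this is not how Aggiorno itself finds them.  An empty alt="" is deliberately not reported, since that is the usual way to mark purely decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Records the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


sample = (
    '<p><img src="logo.png" alt="Aggiorno logo">'
    '<img src="chart.png">'
    '<img src="spacer.gif" alt=""/></p>'
)
print(images_missing_alt(sample))
```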

Wrong / Missing DOCTYPE

The second example relates to the use of DOCTYPE.  The DOCTYPE tells the browser what kind of markup to expect.  Is it HTML?  Is it XHTML?  No DOCTYPE? (Google replies: well... let's guess).  The last thing you need is for Google (or any browser, for that matter) to guess how to interpret your source code.  Chris Maunder from The Code Project has an excellent example of how Google can get confused if you specify a certain DOCTYPE and then write code following a different standard.  In certain cases Google simply stops indexing the page and assumes it was a 404 Page Not Found error.  The example that Chris shows reflects how a simple incorrectly closed tag (ultimately a missing "/") can prevent a page from being indexed.  Syntax correctness, which is enforced by using web standards, is important when you want Google to index your page!  UPDATE: In general this is an example of the Tag Soup problem.  The right thing to do is to make sure your web site validates against a standard like XHTML 1.0 Transitional.
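To see how much difference a single missing "/" makes, you can run a fragment through a strict XML parser.  The sketch below uses Python's xml.etree.ElementTree; the function name is invented for this example.  It only checks well-formedness, not validity against a DOCTYPE, so it is no substitute for a real validator.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(markup):
    """Return the parser's error message, or None if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return str(error)
    return None


good = "<p>Line one<br/>Line two</p>"
bad = "<p>Line one<br>Line two</p>"  # the "/" is missing from <br>

print(well_formedness_error(good))
print(well_formedness_error(bad))
```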

Lack of / Incorrect use of Entities

Ahh... entities...  Isn't it painful having to follow the rule set by web standards and escape every special character?  Well, it might be painful, but Google reacts to non escaped characters in very peculiar ways.  Let's first look into the most obvious one.  If you write in a language that requires accented characters, such as Spanish, French or Italian, the search results may vary depending on how you encode those characters as entities.

Second, there are issues with escaped vs unescaped characters in URLs.  This webmasterworld article is an example of how wrong usage of entities can cause confusion.

Third, when you use scripting to generate markup, the way in which you write your script can also confuse Google as Chris Maunder also explains in his article.  If you try to generate code without escaping the right characters you can get in trouble.  Web Standards enforce the proper use of entities, another reason to follow them to avoid search engine confusion.
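Escaping does not have to be done by hand.  As a hedged sketch (the two helper names are mine, the library calls are from Python's standard library), html.escape takes care of the markup characters, and the "xmlcharrefreplace" error handler turns accented characters into numeric entities:

```python
import html


def escape_markup(text):
    """Escape &, <, > and quotes so the text is safe inside markup and attributes."""
    return html.escape(text, quote=True)


def to_numeric_entities(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(escape_markup("page.aspx?id=1&lang=es"))
print(escape_markup("a < b & c"))
print(to_numeric_entities("café"))
```

The first call shows the URL case mentioned above: an ampersand inside an href value has to be written as an entity too.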

Missing required page elements

There are a number of elements in a page that are either required or recommended by web standards and that can definitely increase or decrease your page rank.  One of the suggestions that many SEO experts have is to make sure a page contains at least the following elements:

h1: Every page should have one and only one h1.  This tag should be used to express the main idea described in the page.  In general heading tags should not be used only for styling but to semantically mark the content in the page.  Google pays special attention to h1 content when indexing.

title: Every page should have one and only one title.  The title should be related to h1.  Google looks at the relationship between h1 and title when indexing.

meta tags: Every page should have a number of meta tags (description, keywords, etc.).  These are taken into account by Google while indexing and they also provide semantic information about the page that, when properly used, can improve the user experience while surfing the web.

Again, web standards remind you of the proper usage of these elements and therefore can help you improve your search results.
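The checklist above is easy to automate.  Here is a minimal sketch, again with Python's html.parser and with made-up names, that counts h1 and title elements and looks for a meta description.  It checks only that the elements are present, not whether their content is any good.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Counts <h1> and <title> elements and collects the names of <meta> tags."""

    def __init__(self):
        super().__init__()
        self.counts = {"h1": 0, "title": 0}
        self.meta_names = set()

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1
        elif tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.meta_names.add(name.lower())


def audit_page(html):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    audit = PageAudit()
    audit.feed(html)
    audit.close()
    problems = []
    for tag, count in audit.counts.items():
        if count != 1:
            problems.append(f"expected exactly one <{tag}>, found {count}")
    if "description" not in audit.meta_names:
        problems.append('missing <meta name="description">')
    return problems


ok_page = (
    '<html><head><title>Books</title>'
    '<meta name="description" content="Children\'s books"></head>'
    '<body><h1>Books</h1></body></html>'
)
bad_page = (
    "<html><head><title>Books</title></head>"
    "<body><h1>Books</h1><h1>More books</h1></body></html>"
)
print(audit_page(ok_page))
print(audit_page(bad_page))
```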

Separation between Content and Style

Web standards teach you to separate content from style, which is an incredibly useful practice in itself for improving maintainability.  It also clearly has some advantages with respect to Google's behavior.  The first one is bandwidth savings.  If your styling information is in a separate CSS file then, since Google does not care about style, it will not crawl it and therefore you will not be spending bandwidth on it.  But in addition to bandwidth savings (which can be major for high-traffic sites), there is a limit to the size of a page that is indexed by search engines.  So, if your page is not "polluted" by styling then it can have more content!  Additionally, if your style contains syntax errors it can confuse Google, and keeping it in a separate file is a way to avoid that.  UPDATE: A very good practice is to avoid HTML tables as a mechanism to lay out the information on a page.  Layout should be done using style markup (CSS).

Web standards practices help you direct your efforts with respect to this separation.

Unmarked text: no semantics

Many times web developers simply copy and paste text into a web page.  The resulting markup is basically just text separated with BRs.  As of today I do not believe search engines penalize this behavior, but moving forward it will be more and more important to make sure every piece of text carries as much semantic information as possible.  For now, the minimum semantics that a piece of text should contain is basic HTML markup like P, UL, Hx, etc.  This information can help search engines understand the priority and context of the content.  Additionally, unmarked text is very hard to style and maintain, so marking it up is a good practice anyway.  UPDATE: There are some newer standards like microformats that can add semantic information to a page without affecting the rendering of the information.  Even if at this moment it is not clear how microformats will affect search results, presumably they will be important in the near future.
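As a small example of adding that minimum of semantics, the sketch below wraps pasted plain text in P elements.  It assumes that paragraphs in the pasted text are separated by blank lines; the function name is hypothetical and real content will usually need headings and lists as well.

```python
import html
import re


def paragraphs_from_text(text):
    """Wrap each blank-line-separated block of plain text in a <p> element."""
    blocks = re.split(r"\n\s*\n", text.strip())
    marked_up = []
    for block in blocks:
        if block.strip():
            # Collapse internal line breaks and escape markup characters.
            content = html.escape(" ".join(block.split()), quote=False)
            marked_up.append("<p>{}</p>".format(content))
    return "\n".join(marked_up)


pasted = "First idea.\n\nSecond idea\ncontinues here & ends."
print(paragraphs_from_text(pasted))
```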

Conclusions

It is clear from the examples above that not following web standards can have a huge impact on your search results!  The consequences range from not providing the best information to index a page to Google not indexing a page at all because of syntax errors in the markup, even if the page looks good in the browsers!  UPDATE: Aarron Walter just published a very good findability strategy checklist that has a complete section on markup and additional sections on server and client side code.

It is true that you can avoid most of the mistakes shown here without the need to completely follow web standards, but they are super useful as a guideline and as best practices to follow when programming web pages.  Next time you look at your page  you can have Aggiorno by your side helping you with all the time-consuming tasks necessary to make a page XHTML compliant.

With Aggiorno we are promoting web standards by eliminating a lot of the tedious work that is required to make a page validate.  By doing so we are helping pages improve their stance towards search engines.  In particular:

Aggiorno can help you find missing alternate descriptions  

Aggiorno can help you make your code structure XHTML compliant

Aggiorno can help you convert special characters into appropriate entities

Aggiorno can help you with content-style separation

Aggiorno can help you with text semantication

 

Aggiorno Release Candidate 0 (RC0) is now available

We just uploaded Aggiorno RC0 to our site. 

We are getting closer every day to a final polished release that will make us proud.

This version fixes a number of small imperfections.  Your feedback is invaluable, so please give Aggiorno a try and let us know what you think.

 

 


Final thoughts on XHTML and HTML

This is our seventh and last post on XHTML/HTML.  With this post we want to draw some conclusions with regards to all the information that was provided.

HTML and XHTML offer very similar functionality in terms of describing and marking up documents for the web.  XHTML has a number of advantages in terms of its interoperability with other markup documents, and its consistent syntax.

A few years ago, it seemed clear that XHTML would be the “future of the web”, but more recently, HTML has grown in popularity, as browser support for XHTML has often not kept pace with developments.  The question of which one will end up being more popular is still open, even if we believe XHTML has some definite advantages.

But if you’re ready to make the transition from HTML to XHTML, you’ll want to check out Aggiorno – a plug-in for Microsoft Visual Studio that has embedded knowledge about the differences between HTML and XHTML – to make your transition easier.  It automatically targets XHTML 1.0 Transitional documents and makes sure your pages are error-free and up-to-date before going on to offer additional improvements, such as improving accessibility, automatically upgrading table layouts to use CSS and extracting master pages from sites with similar formatting.

Inconsistencies between HTML and XHTML

This is the sixth installment of our XHTML/HTML series. So now for the bad news.  In addition to the syntactic differences, HTML and XHTML do not share the exact same semantics, and there are things you should watch out for.  In particular, HTML assumes that it is rendering to a browser, and “takes over” the browser window.  The XHTML model is to assume that it is rendering to a specific target area.  The consequence is that CSS styling (for example, backgrounds) for XHTML applies generally only to the area where there is content, while for HTML, the background applies to the entire window.

At the same time, there are differences in parsing the XHTML file.  Because XHTML is first and foremost an XML document, it is generally processed as such before any of the content is considered.  This has two significant consequences.  Firstly, anything placed within a comment block is likely to be completely ignored, as comments will be thrown away during parsing.  Since comments are often used in HTML to “hide” unwanted items such as stylesheets and JavaScript documents, this can have serious consequences.  Secondly, those same elements will be “parsed” by the XHTML parser if they are not within comment blocks, and that may lead to parsing errors.  Since it is a requirement of all XHTML documents that they be well-formed XML documents, or they will be rejected by the browser, any such problems will cause the whole page to be rejected.  To solve this problem, it is necessary to place stylesheets and JavaScript within CDATA sections in your XHTML file.
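The effect of CDATA is easy to demonstrate with any XML parser.  In this sketch (Python's xml.etree.ElementTree standing in for a browser's XHTML parser, which is an assumption made for illustration), the same script body is rejected when it is bare and accepted when it is wrapped in a CDATA section:

```python
import xml.etree.ElementTree as ET

# The "<" and "&&" in the script are markup characters as far as XML is concerned.
BARE = "<script>if (a < b && b < c) { run(); }</script>"
WRAPPED = "<script><![CDATA[if (a < b && b < c) { run(); }]]></script>"


def parses_as_xml(markup):
    """Return True if the markup is accepted by a strict XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses_as_xml(BARE))
print(parses_as_xml(WRAPPED))
print(ET.fromstring(WRAPPED).text)
```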

There are a number of other inconsistencies relating to the functioning of JavaScript within XHTML, for example document.write() does not work in the same way, since the document has been fully parsed by the time the call is made: instead, it is necessary to directly manipulate the DOM of the page.
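In a browser you would make these DOM calls from JavaScript.  To keep the examples in this series in one language, the sketch below makes the equivalent W3C DOM calls (createElement, createTextNode, appendChild) through Python's xml.dom.minidom; the document and the helper name are invented for the illustration.

```python
from xml.dom import minidom

# A tiny, already-parsed document with a placeholder element.
doc = minidom.parseString('<body><div id="log"/></body>')


def append_paragraph(document, parent, text):
    """Add <p>text</p> under parent by building nodes instead of writing markup."""
    paragraph = document.createElement("p")
    paragraph.appendChild(document.createTextNode(text))
    parent.appendChild(paragraph)
    return paragraph


target = doc.getElementsByTagName("div")[0]
append_paragraph(doc, target, "Added through the DOM")
print(doc.documentElement.toxml())
```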

In summary, XHTML has a stricter format that can be checked by development tools and fixed before it is sent out to a browser.  HTML is a bit more forgiving: browsers accept ill-formed HTML.  This is a good thing from the perspective of tolerance of programmers' mistakes, but it is a bad thing because browsers interpret the ill-formed code in different ways.

XHTML and Domain Specific Languages (DSLs) and Stylesheets.

This is the fifth installment of our XHTML/HTML series.  Extending the notion of using other W3C-approved XML extension languages along with XHTML is the idea of completely replacing the XHTML content with simpler XML content.  Domain Specific Languages (http://en.wikipedia.org/wiki/Domain_Specific_Language) are generally computer languages which have been specifically designed for solving problems in a specific domain.  The notion is that frequently-expressed concepts work their way into the statement of the problem, rather than being expressed in the solution to the problem.

The same approach can be used with XHTML.  Because it is possible to use a stylesheet to transform any tree within an XML document, and because so much of what makes a web page is boilerplate, it is possible to extract the boilerplate into a stylesheet and then use that to define a domain-specific language.

A Simple Example

A simple example should suffice to demonstrate this approach. Imagine that you are running a bookstore, and you have information about your books stored in XML somewhat as follows:

<book>
      <title>Arthur's Tree House</title>
      <author>Marc Brown</author>
      <coverImage>8037.jpg</coverImage>
      <price currency="USD">399</price>
</book>

By simply adding the following two lines to the top of this file (and providing all the referenced stylesheet information), you can automatically transform this into a complete XHTML page:

 

<?xml version="1.0" encoding="utf-8"?>

<?xml-stylesheet type="text/xsl" href="BookClub.xslt"?>

This works because, when the browser downloads an XML document that contains the xml-stylesheet processing instruction, it knows to automatically apply the specified stylesheet before attempting to display the page.  The stylesheet generates XHTML output, which the rendering engine knows how to display in the browser.  The CSS styling (attached within the XSL stylesheet) provides the style, completely separated from the content.

Did I lose you?

Apologies if that didn’t seem as simple as the title suggested.  The principle is that creating the content pages should be simple – because much of the complexity has been sucked up into the XSL stylesheet.  And the mechanism for triggering this – identifying the stylesheet within the file – is also simple.

Writing XSL transformations themselves, however, is not so simple.  But it does hold great advantages when you have truly domain-specific information, as in this case, which you might want to use in numerous different ways in a single site.  For example, you might want to display books in each of the following contexts:

  • On their own page, where all the information about the book is shown;
  • On a recommendations page, with just the cover and the title;
  • In a list (such as the shopping cart) without the cover image.

And in each context, exactly the same XML file could be used, but with a different XSL stylesheet on the page to make a different transformation to different XHTML.
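To make the idea concrete without writing any XSL, here is a sketch that plays the role of the three stylesheets in plain Python.  It is not XSLT and not the BookClub.xslt referenced above; the function, the context names and the output markup are all assumptions chosen for the illustration.  The point is only that one book document can feed several different XHTML views.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
      <title>Arthur's Tree House</title>
      <author>Marc Brown</author>
      <coverImage>8037.jpg</coverImage>
      <price currency="USD">399</price>
</book>"""


def render_book(book, context):
    """Render one <book> element for 'detail', 'recommendation' or 'list'."""
    title = book.findtext("title")
    item = ET.Element("div", {"class": "book " + context})
    if context != "list":
        # The shopping-cart style list leaves the cover image out.
        cover = book.findtext("coverImage")
        ET.SubElement(item, "img", {"src": cover, "alt": "Cover of " + title})
    ET.SubElement(item, "h2").text = title
    if context != "recommendation":
        # Recommendations show only the cover and the title.
        price = book.find("price")
        ET.SubElement(item, "p", {"class": "author"}).text = book.findtext("author")
        ET.SubElement(item, "p", {"class": "price"}).text = (
            price.text + " " + price.get("currency")
        )
    return ET.tostring(item, encoding="unicode")


book = ET.fromstring(BOOK_XML)
for context in ("detail", "recommendation", "list"):
    print(render_book(book, context))
```

The price is printed exactly as it is stored, since the source XML does not say which unit the number 399 is expressed in.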

In Summary

The power of XML, specifically with regard to XSL stylesheets, can be leveraged within XHTML to reduce the complexity of making individual documents and to promote reuse.  The value of this technique depends on the other elements that collaborate to form your complete system.

XHTML is one language family

This is the fourth installment of our XHTML/HTML series.  One of the advantages of XML as a markup language is that it is extensible: that is, it is possible to define new markup within the context of the overall markup language.  XHTML is one example of an XML-based markup language, but there are others, like the ones we describe below.  HTML offers none of these facilities.

MathML

Mathematical notation is a particularly challenging task for typesetting in a meaningful way.  Especially difficult is to be able to simultaneously describe the semantics of the mathematical expression while providing sufficient information about how the author wants it to be laid out on the page.  It is the same challenge of content and style that exists in HTML.

MathML addresses this challenge by allowing an author to be very precise about the symbols to be used, the interpretation that should be applied to each symbol, and the structure that builds them up into a precise mathematical language.

MathML documents can be embedded directly into an XHTML document using an appropriate stylesheet and namespace.  See for example http://www.w3.org/Math/testsuite/mml2-testsuite/index.html from the MathML test suite.
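Namespaces are what keep the two vocabularies apart inside one file.  The sketch below parses a small hand-written XHTML document with an embedded MathML expression and picks out elements by namespace.  The two namespace URIs are the standard ones; the sample document itself is made up for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>The area of a circle:</p>
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <mrow>
        <mi>A</mi><mo>=</mo><mi>&#960;</mi>
        <msup><mi>r</mi><mn>2</mn></msup>
      </mrow>
    </math>
  </body>
</html>"""

root = ET.fromstring(DOCUMENT)

# Elements are addressed as {namespace}localname, so XHTML and MathML never clash.
paragraphs = root.findall(".//{%s}p" % XHTML_NS)
identifiers = [mi.text for mi in root.iter("{%s}mi" % MATHML_NS)]

print(len(paragraphs), identifiers)
```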

SVG

SVG is a language for describing scalable vector-based graphics from within an XML document.  This allows pictures, drawings and schematics to be directly nested within an XHTML document without requiring additional resources such as large image files to be downloaded from the server, although it is possible to include smaller images within an SVG file.

Uses of SVG include maps sent to mobile phones, schematics of web sites, and, ultimately, any web-based graphical experience.  Because the SVG forms part of the document structure, it can be modified interactively using JavaScript, thus updating the image locally without needing to interact with the server.
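Because an SVG drawing is just elements and attributes, "updating the image" means changing attribute values.  In the browser that would be done from JavaScript; the sketch below shows the same idea with Python's ElementTree on a made-up drawing with a single circle, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

svg = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle id="marker" cx="20" cy="20" r="5"/>'
    "</svg>"
)


def move_marker(svg_root, x, y):
    """Move the first circle in the drawing by rewriting its cx/cy attributes."""
    circle = svg_root.find("{%s}circle" % SVG_NS)
    circle.set("cx", str(x))
    circle.set("cy", str(y))
    return circle


marker = move_marker(svg, 60, 45)
print(marker.get("cx"), marker.get("cy"))
```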

In Summary

XHTML provides more than just the opportunity to write HTML documents.  Because it is based on an extensible infrastructure, it is possible to extend the supported languages and create multipurpose files.

Strong Interpretation of XHTML

This is the third installment of a series of blog posts to discuss HTML, XML and XHTML.  

A major difference between HTML and XHTML is the manner in which they are interpreted by the browsers.  Because HTML and the browsers grew up together, browsers tend to be very tolerant of poorly written or outdated HTML constructs.  XHTML, on the other hand, is treated very strictly by browsers.

How serious are you about Web standards?

Although it’s just as possible to write high quality HTML as it is to write high quality XHTML, it can be harder to know that you’ve written high quality HTML.  Because the browsers “gloss over” many of the problems in HTML code, it often seems that what you’ve written is good HTML.  With XHTML this is not the case – browsers reject invalid XHTML without question.
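The difference in attitude can be imitated with two parsers from Python's standard library.  This is only an analogy: html.parser is a tolerant tokenizer, not a browser, and ElementTree stands in for a strict XHTML parser.  The helper names are invented for this sketch.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Unclosed <p> and <b> elements: typical markup that "looks fine" in a browser.
SLOPPY = "<div><p>First paragraph<p>Second <b>paragraph</div>"


class TagLogger(HTMLParser):
    """Records every start tag without complaining about missing end tags."""

    def __init__(self):
        super().__init__()
        self.opened = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)


def lenient_tags(markup):
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.opened


def strict_accepts(markup):
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(lenient_tags(SLOPPY))
print(strict_accepts(SLOPPY))
```

The tolerant parser happily reports four start tags and no problem at all, while the strict one refuses the whole fragment.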

Getting Help to Get There

Of course, in either HTML or XHTML there are tools that can help you.  The W3C validator (http://validator.w3.org/) will check any of the six possible schemas of HTML or XHTML.  Microsoft’s Visual Studio environment will include warning messages about invalid usages of your selected schema.  And Aggiorno (http://www.aggiorno.com) will not only check your documents, it will automatically fix structural and deprecation problems and bring your files up to full XHTML 1.0 compliance.

In Summary

XHTML offers stronger compliance rules with browsers than HTML, along with better tool availability for enforcing that compliance.

Common Objections to Refactoring (they're wrong - of course!)

"We don’t have the time to waste on cleaning up the code. We have to get this feature implemented now!"  Have you ever heard such a comment?  Have you ever made such a comment?  Chances are that you probably have both heard and made the comment!  Well this is probably one of the most common objections to Refactoring as it is explained in the book Refactoring HTML by Elliotte Rusty Harold and summarized in the following series of posts: The Cafes » Objections to Refactoring.

But let's start from the beginning.  What is a Refactor?  As stated on Wikipedia, in software engineering, "refactoring" a source code module often means modifying it without changing its external behavior, and is sometimes informally referred to as "cleaning it up".  A refactor is a source code transformation that does not alter the behavior of the program.  Historically refactorings have been applied to make the code more maintainable and have been almost exclusively within the realm of sophisticated programmers.  In recent years IDEs like Visual Studio.NET have started to support refactoring and the term is starting to become part of the standard programmer's language.  However, the concept of a refactoring has only been applied to small localized changes in the source code: changes that somewhat improve code maintainability or that perform some modification in an automated manner to help programmers increase their productivity while editing code.  Of course refactorings are a good thing, and in general if you can improve the maintainability of your code you should, especially when you know that your code has to continue to evolve.
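A tiny example may help.  The two functions below are invented for this post; the second is a refactored version of the first.  The names are clearer and the loop is simpler, but for the same input both return the same result, which is what makes the change a refactoring.

```python
def total_price_before(items):
    # Original version: index-based loop and cryptic names.
    t = 0
    for i in range(len(items)):
        t = t + items[i][0] * items[i][1]
    return t


def total_price_after(line_items):
    # Refactored version: same external behavior, easier to read and maintain.
    return sum(unit_price * quantity for unit_price, quantity in line_items)


cart = [(399, 2), (1250, 1)]  # (unit price, quantity) pairs
print(total_price_before(cart), total_price_after(cart))
```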

When we were designing Aggiorno, the first way of referring to what Aggiorno would do was: "Refactoring for the Web".  However, we soon realized that even if Aggiornings are similar to refactorings in the sense that they perform transformations on code, they are different in their purpose, and this is why we coined the name.

Aggiornings are a form of encapsulated knowledge aimed at providing business value as directly as possible.  For instance, an aggiorning will help you make your page XHTML compliant (web standards).  And we already know from previous posts that making a page standards compliant helps you with SEO, accessibility, maintainability, etc.  Another scenario where aggiornings help is when you want to separate content from style.  The immediate business benefit is that maintainability costs go down, but also, it is much easier to port the code to mobile browsers and it is also easier for Google to index your page and not get confused by cluttered markup.  Aggiornings might or might not preserve equivalence in the code they transform (refactorings always do), and even if we preserve equivalence when it makes sense to do so, we also make other changes that produce a big return on investment even if they do not preserve equivalence.  Another big difference is that with aggiornings the user can always request an explanation, and Aggiorno will show exactly where the changes happened so the user can ultimately decide whether to go through with them or not.

Going back to the title of this post, source code improvements that are aimed at improving maintainability are in general a very good thing to do.  If you look at refactorings from the perspective of aggiornings the code improvements are even more important.  We are talking about improvements whose ROI can immediately be measured from the business perspective.  There really should be no objections to using refactoring techniques or aggiornings on your code.  If you have an objection, let me know!

Ahh... and let me encourage you to download the Aggiorno Beta and explore aggiornings for yourself.




About Aggiorno

Aggiorno - a plugin for Visual Studio - is your instant ticket to SEO friendly, XHTML compliant, CSS styled HTML and ASP.NET! Read more on What is Aggiorno?


Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
