Get a Chance at Google with Aggiorno

As web users and web developers we are constantly attributing human qualities to the different actors of the Internet. In my mind Google has always been the sexy, out-of-reach girl that we're constantly trying to impress. When courting a girl there is a well-defined protocol that needs to be respected. You have to be polite, tactful, respectful (... it seems I am listening to my mom...). In any case, there are rules that need to be followed when you are trying to impress a girl, and these rules go well beyond appearances.

As web developers we tend to forget some of the rules that need to be followed to make our sites more findable, more accessible, more secure, more maintainable... We typically only care about how our pages look in the common browsers, without paying too much attention to the inner details. You can have great content, but if your markup sucks you will have issues when trying to conquer important actors like Google.

Last week I wrote a post on how the lack of use of web standards can affect your SEO efforts. Lots of small details can really turn Google off.

At Aggiorno, the team is on a death march towards the release of V1.0 (soon... very soon...), and we need to relieve some stress while also educating people about the importance of good markup and of following web standards in our daily work. We came up with a video called "Get a Chance at Google" that enacts an encounter between Google and a very content-intensive web site with... some issues...

Take a look at the video and share it if you like it.  Also, let us know what you think and if you have more ideas so this can become its own series!

Enjoy!

Web Standards and Search Engine Optimization (SEO) -- Does Google care about the quality of your markup?

There are many discussions on the web regarding the merits of using web standards. 

The argument against using web standards can be summarized as: "who cares?!"  If the browsers render my code correctly then I am accomplishing my goal.

The arguments in favor of using web standards can be summarized as better cross-browser compatibility and lower maintenance costs from using cleaner code.

I believe there is an important point that is only tangentially discussed and that should be addressed much more emphatically. What does Google do when it encounters non-standards-compliant HTML? Does it affect your search results? We need to constantly remember that search engine bots are "users" of our sites, and they are not necessarily as tolerant of our markup as normal browsers. An SEO expert gave me the best trick to understand what Google sees and what it doesn't see while going through a page. It goes like this:

  • Open the page you want to evaluate in your favorite browser
  • Click Select All
  • Click Copy
  • Open Notepad
  • Click Paste

Whatever ends up in Notepad is what Google is indexing.
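The copy-and-paste trick above can be approximated in a few lines of code. The following is only a rough sketch using Python's standard html.parser module; the class and function names are made up for illustration, and it says nothing about how Google's crawler actually works. It simply keeps the text and drops scripts and styles.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text found outside <script> and <style> elements."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a skipped element
        self.chunks = []  # pieces of text kept so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Hypothetical helper: return the text a plain-text paste would roughly keep."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = "<body><h1>Hello</h1><script>var x = 1;</script><p>World</p></body>"
    print(visible_text(page))
```

Note that images contribute nothing to the output, which is exactly the point of the trick.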

But let's go into some examples of how the wrong HTML can affect your search results.

Missing alternate descriptions

The first example of how following standards can improve your interaction with Google: Google is "blind". Google only sees the text that is embedded in your page; no images, no JavaScript, no animations... One of the rules required by the standards is to provide an alternate description for non-textual information (the alt attribute in XHTML). If you do not include an alternate description, you are missing an opportunity to provide information to Google. The use of ALT descriptions is a best practice, enforced by web standards, that affects search results!
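Finding images without an alternate description is easy to automate. Here is a minimal sketch, again with Python's standard html.parser; the helper name images_missing_alt is hypothetical, and this is not how Aggiorno does it. It reports the src of every img element that has no alt attribute at all.

```python
from html.parser import HTMLParser


class MissingAlt(HTMLParser):
    """Records the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img ... /> also arrive here by default.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src"))


def images_missing_alt(html):
    """Hypothetical helper: list the images that lack an alt attribute."""
    parser = MissingAlt()
    parser.feed(html)
    parser.close()
    return parser.missing


if __name__ == "__main__":
    page = '<img src="logo.png"><img src="chart.png" alt="Sales chart" />'
    print(images_missing_alt(page))
```

An empty alt="" is deliberately not flagged here, since it is commonly used for purely decorative images.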

Wrong / Missing DOCTYPE

The second example relates to the use of DOCTYPE. The DOCTYPE is used to tell the browser what kind of markup to expect. Is it HTML? Is it XHTML? No DOCTYPE? (Google replies: well... let's guess). And the last thing you need is for Google (or any browser for that matter) to guess how to interpret your source code. Chris Maunder from The Code Project has an excellent example of how Google can get confused if you specify a certain DOCTYPE and then write code following a different standard. In certain cases Google simply stops indexing the page and assumes it was a 404 Page Not Found error. The example that Chris shows reflects how a simple mis-closed tag (ultimately a missing "/") can prevent the indexing of a page. Syntax correctness, which is enforced by using web standards, is important when you want Google to index your page! UPDATE: In general this is an example of the Tag Soup problem. The right thing to do is to make sure your web site validates according to a standard like XHTML Transitional.
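Checking whether a page declares a DOCTYPE at all is a one-minute job. The sketch below relies on the handle_decl hook of Python's standard html.parser; find_doctype is a made-up helper name. It only reports what the page declares. Checking that the markup actually matches the declaration is a job for a real validator.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Remembers the first DOCTYPE declaration seen in the page."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the text between "<!" and ">", e.g. 'DOCTYPE html PUBLIC ...'
        if self.doctype is None and decl.lower().startswith("doctype"):
            self.doctype = decl


def find_doctype(html):
    """Hypothetical helper: return the DOCTYPE declaration, or None if missing."""
    parser = DoctypeFinder()
    parser.feed(html)
    parser.close()
    return parser.doctype


if __name__ == "__main__":
    page = "<html><body><p>No doctype here</p></body></html>"
    if find_doctype(page) is None:
        print("No DOCTYPE: whoever reads this page has to guess")
```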

Lack of / Incorrect use of Entities

Ahh... entities... Isn't it painful having to follow the rule set by web standards and escape every special character? Well, it might be painful, but Google reacts to non-escaped characters in very peculiar ways. Let's first look at the most obvious one. If you write in a language that requires accented characters, such as Spanish, French or Italian, the search results may vary depending on how you code those characters with entities.

Second, there are issues with escaped vs. unescaped characters in URLs. This WebmasterWorld article is an example of how misused entities can cause confusion.

Third, when you use scripting to generate markup, the way in which you write your script can also confuse Google, as Chris Maunder also explains in his article. If you try to generate code without escaping the right characters you can get into trouble. Web standards enforce the proper use of entities, which is another reason to follow them and avoid search engine confusion.
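To make the escaping rules concrete, here is a small sketch in Python. It uses the standard html.escape function for the markup characters and numeric character references for anything outside ASCII; the to_entities name is an assumption for illustration. Whether you need the numeric references depends on how your pages are encoded and served, so treat this as one possible approach.

```python
import html


def to_entities(text):
    """Hypothetical helper: escape markup characters and non-ASCII characters.

    html.escape handles &, <, > and quotes; the ASCII round trip turns every
    remaining non-ASCII character into a numeric reference such as &#233;.
    """
    escaped = html.escape(text)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # An accented word: the e-acute (U+00E9) becomes &#233;
    print(to_entities("caf\u00e9"))
    # An ampersand inside a URL written into an href becomes &amp;
    print(to_entities("page.aspx?a=1&b=2"))
```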

Missing required page elements

There are a number of elements in a page that are either required or recommended by web standards and that can definitely increase or decrease your page rank. One of the suggestions that many SEO experts make is to ensure a page contains at least the following elements:

h1: Every page should have one and only one h1.   This tag should be used to express the main idea described in the page.  In general heading tags should not be used only for styling but to semantically mark the content in the page.  Google pays special attention to h1 content when indexing.

title: Every page should have one and only one title.  The title should be related to h1.  Google looks at the relationship between h1 and title when indexing.

meta tags: Every page should have a number of meta tags (description, keywords, etc.). These are taken into account by Google while indexing, and they also provide semantic information about the page that, when properly used, can improve the user experience while surfing the web.

Again, web standards remind you of the proper usage of these elements and therefore can help you improve your search results.
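The checklist above (one h1, one title, description and keywords meta tags) translates naturally into a small script. This is just a sketch with Python's standard html.parser; check_page and the wording of its messages are invented for this example.

```python
from html.parser import HTMLParser


class PageElements(HTMLParser):
    """Counts h1 and title elements and collects the names of meta tags."""

    def __init__(self):
        super().__init__()
        self.h1 = 0
        self.title = 0
        self.meta = set()

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1 += 1
        elif tag == "title":
            self.title += 1
        elif tag == "meta":
            name = dict(attrs).get("name")
            if name:
                self.meta.add(name.lower())


def check_page(html):
    """Hypothetical helper: return a list of problems (empty if none found)."""
    parser = PageElements()
    parser.feed(html)
    parser.close()
    problems = []
    if parser.h1 != 1:
        problems.append(f"expected exactly one h1, found {parser.h1}")
    if parser.title != 1:
        problems.append(f"expected exactly one title, found {parser.title}")
    for name in ("description", "keywords"):
        if name not in parser.meta:
            problems.append(f"missing meta {name}")
    return problems


if __name__ == "__main__":
    for problem in check_page("<html><body><h1>a</h1><h1>b</h1></body></html>"):
        print(problem)
```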

Separation between Content and Style

Web standards teach you about separation between content and style, which is an incredibly useful practice per se with regard to improving maintainability. It also clearly has some advantages with respect to Google's behavior. The first one is bandwidth savings. If your styling information is in a separate CSS file, then Google, which does not care about style, will not crawl it, and you will not be spending bandwidth on it. But in addition to bandwidth savings (which can be major for high-traffic sites), there is a limit to the size of a page that is indexed by search engines. So, if your page is not "polluted" by styling, it can have more content! Additionally, if your style contains syntax errors it can confuse Google, and keeping it out of the page is a way to avoid that. UPDATE: A very good practice is to avoid HTML tables as a mechanism to lay out the information on a page. This should be done using style markup (CSS).

Web standards practices help you direct your efforts with respect to this separation.
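A quick way to see how much styling is mixed into your content is to look for inline style attributes and old presentational tags. The sketch below does that with Python's standard html.parser. The find_inline_styling name and the short list of tags it flags (font, center) are assumptions made for the example, not a complete rule set.

```python
from html.parser import HTMLParser


class InlineStyling(HTMLParser):
    """Flags inline style attributes and a few presentational tags."""

    PRESENTATIONAL = {"font", "center"}  # illustrative, not exhaustive

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag in self.PRESENTATIONAL:
            self.findings.append((tag, "presentational tag"))
        if any(name == "style" for name, _ in attrs):
            self.findings.append((tag, "inline style attribute"))


def find_inline_styling(html):
    """Hypothetical helper: return (tag, reason) pairs for styling found in markup."""
    parser = InlineStyling()
    parser.feed(html)
    parser.close()
    return parser.findings


if __name__ == "__main__":
    page = '<p style="color:red">Hi <font size="2">there</font></p>'
    for tag, reason in find_inline_styling(page):
        print(tag, reason)
```

Everything it reports is a candidate to move into a separate CSS file.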

Unmarked text: no semantics

Many times web developers simply copy and paste text into a web page. The resulting markup is basically just text separated with BRs. As of today I do not believe search engines penalize this behavior, but moving forward it will be more and more important to make sure every piece of text carries as much semantics as possible. For now, the minimum semantics that a piece of text should carry is basic HTML markup like P, UL, Hx, etc. This information can help search engines understand the priority and context of the content. Additionally, unmarked text is very hard to style and maintain, so marking it up is a good practice anyway. UPDATE: There are some newer standards like microformats that can add semantic information to a page without affecting the rendering of the information. Even if at this moment it is not clear how microformats will affect search results, presumably they will be important in the near future.
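As an illustration of the minimum markup described above, here is a tiny sketch that turns text separated by BRs into paragraphs. It is a naive regular-expression approach in Python with a made-up function name, good enough for pasted plain text but not for arbitrary HTML, and it is not meant to describe how any particular tool does this.

```python
import re

# One or more <br>, <br/> or <BR /> tags, with optional whitespace around them.
BR_RUN = re.compile(r"(?:\s*<br\s*/?>\s*)+", re.IGNORECASE)


def br_text_to_paragraphs(raw):
    """Hypothetical helper: wrap each BR-separated chunk of text in a <p>."""
    parts = BR_RUN.split(raw.strip())
    return "\n".join(f"<p>{part}</p>" for part in parts if part)


if __name__ == "__main__":
    print(br_text_to_paragraphs("First line<br>Second line<br /><br />Third line"))
```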

Conclusions

It is clear from the examples above that not following web standards can have a huge impact on your search results! The problems range from not providing the best information to index a page to Google not indexing a page at all because of syntax errors in the markup, even if the page looks good in the browsers! UPDATE: Aarron Walter just published a very good findability strategy checklist that has a complete section on markup and additional sections on server and client side code.

It is true that you can avoid most of the mistakes shown here without completely following web standards, but the standards are super useful as a guideline and as best practices to follow when programming web pages. Next time you look at your page, you can have Aggiorno by your side helping you with all the time-consuming tasks necessary to make a page XHTML compliant.

With Aggiorno we are promoting web standards by eliminating a lot of the tedious work that is required to make a page validate.  By doing so we are helping pages improve their stance towards search engines.  In particular:

Aggiorno can help you find missing alternate descriptions  

Aggiorno can help you make your code structure XHTML compliant

Aggiorno can help you convert special characters into appropriate entities

Aggiorno can help you with content-style separation

Aggiorno can help you with text semantication

 

Aggiorno Release Candidate 0 (RC0) is now available

We just uploaded Aggiorno RC0 to our site. 

We are getting closer every day to a final polished release that will make us proud.

This version fixes a number of small imperfections. Your feedback is invaluable; please give Aggiorno a try and let us know what you think.

 

 


About Aggiorno

Aggiorno - a plugin for Visual Studio - is your instant ticket to SEO friendly, XHTML compliant, CSS styled HTML and ASP.NET! Read more on What is Aggiorno?

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
