When building a website or application with a CMS or framework, you often get the chance to set a suffix (like .html) for your clean URLs. Most people leave the default setting alone, but that choice can get you into trouble once your website or application has to grow.
When you’re not using suffixes and you have a URL that looks something like http://www.domain.com/about/history, you should always check that you also have content at http://www.domain.com/about, because Google tends to spider at the directory level and to look one level higher in the hierarchy. I can’t find any hard figures or facts on this; the only evidence I have is the crawler logs of the domains I manage. Also note that if you serve the same content on both URLs, you really should include a rel="canonical" link on those pages, because you don’t want to offer duplicate content and lose ‘grip’ on which URL search engines include.
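To make that check less of a manual chore, here is a minimal Python sketch that lists the parent URLs of a suffix-less URL, so you can verify that each one serves content. It also builds the canonical link element for a page. The function names parent_urls and canonical_link are hypothetical helpers for illustration; they are not part of any CMS or framework.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def parent_urls(url: str) -> list[str]:
    """Return the ancestor URLs of `url`, nearest first, excluding the site root."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    parents = []
    # Drop one trailing segment at a time: /about/history -> /about
    for depth in range(len(segments) - 1, 0, -1):
        path = "/" + "/".join(segments[:depth])
        parents.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return parents


def canonical_link(preferred_url: str) -> str:
    """Build the <link> element that tells search engines which URL to index."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


if __name__ == "__main__":
    for parent in parent_urls("http://www.domain.com/about/history"):
        print("check that this serves content:", parent)
    print(canonical_link("http://www.domain.com/about/history"))
```

Whether you then request each parent URL and inspect the response is up to you; the sketch only produces the list of URLs worth checking.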
My personal preference is not to use suffixes, because I like my URLs as clean as they get. But when I’m working on an application I often have to reconsider this. Here’s why. Modern web applications often need to serve data in more than one format. For instance, if I were to build an application in which you can manage your grocery lists, I might also want to offer the data in XML or JSON so you could easily access it beyond the boundaries of your browser. In those cases http://www.domain.com/groceries/list/1.html and http://www.domain.com/groceries/list/1.json clearly state the output intent of your application, and you can easily route this as /controller/action/id.format!
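As a rough illustration of the /controller/action/id.format idea, here is a small Python sketch that splits a path into those four parts. The names Route, ROUTE_PATTERN and parse_route are made up for this example, and falling back to html when no suffix is given is my own assumption; your framework’s router will have its own way of doing this.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    controller: str
    action: str
    id: str
    format: str


# Three path segments, optionally followed by ".format" on the last one.
ROUTE_PATTERN = re.compile(
    r"^/(?P<controller>[^/.]+)"
    r"/(?P<action>[^/.]+)"
    r"/(?P<id>[^/.]+)"
    r"(?:\.(?P<format>[a-z0-9]+))?$"
)


def parse_route(path: str, default_format: str = "html") -> Optional[Route]:
    """Split /controller/action/id.format into its parts, or return None."""
    match = ROUTE_PATTERN.match(path)
    if match is None:
        return None
    return Route(
        controller=match["controller"],
        action=match["action"],
        id=match["id"],
        # Assumption: a URL without a suffix is served as HTML.
        format=match["format"] or default_format,
    )


if __name__ == "__main__":
    print(parse_route("/groceries/list/1.json"))
    print(parse_route("/groceries/list/1"))
```

With the format captured as its own field, the same controller and action can hand the grocery list to an HTML template, a JSON serializer or an XML serializer without needing a separate URL scheme for each.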