Sometimes you end up with pages on your site that are accessible from several different URLs. It might not seem like a big deal, but from an SEO perspective it’s bad. Each page should have only one URL; otherwise Google and other search engines may treat your pages as duplicate content, which can result in a lower page rank, among other things.
In ASP.NET, URLs are mostly treated as case insensitive, but the path part of a URL is actually case sensitive (see w3.org). This means you may accidentally refer to a page using different casing and thereby have duplicate content on your site.
In addition, Sitecore lets the same page be reached through different URL formats. You may, for example, include or omit a language indicator in the URL: /mypage and /en/mypage usually refer to the same page (depending on your setup, of course).
Another situation that results in the same problem is using aliases in Sitecore. The whole purpose of that feature is to add a second address to a page. That is good in many ways, but not from an SEO perspective.
Fortunately, there is a simple way to solve this: use the canonical link tag. This tag should be used whenever a page is rendered using any URL other than its primary URL:
<link rel="canonical" href="http://full.url/to/my/page" />
We can easily render this tag from the code-behind of our .master page or equivalent. Below is a simple method to accomplish this:
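using System;
using System.Web;

public static string GetCanonicalUrl()
{
    var request = HttpContext.Current.Request;
    var uri = new Uri(request.Url, request.RawUrl);

    // Requested path without the query string.
    string url = uri.AbsolutePath;
    // Scheme and host; the port is included only when it is not the default one.
    string baseUrl = uri.GetLeftPart(UriPartial.Authority);

    // If the request came in through an alias, point to the URL of the alias target.
    if (Sitecore.Configuration.Settings.AliasesActive && Sitecore.Context.Database.Aliases.Exists(url))
    {
        Sitecore.Data.Items.Item targetItem = Sitecore.Context.Database.GetItem(Sitecore.Context.Database.Aliases.GetTargetID(url));
        // The target may be missing (for example, not published), so guard against null.
        if (targetItem != null)
        {
            return string.Format("{0}{1}", baseUrl, Sitecore.Links.LinkManager.GetItemUrl(targetItem));
        }
    }

    // Otherwise compare the requested path with the URL Sitecore generates for the item.
    string itemUrl = Sitecore.Links.LinkManager.GetItemUrl(Sitecore.Context.Item);
    if (url != itemUrl)
    {
        return string.Format("{0}{1}", baseUrl, itemUrl);
    }

    // The request already uses the primary URL, so no tag is needed.
    return null;
}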
This code isn’t perfect, though. My experience with multi-language sites is that you usually want to enforce the language parameter in the URL. This is fine for most pages, but the domain start page shouldn’t really be on “/en/”, but rather on “/”. So that page needs a small workaround, so that the “/en/” version points its canonical URL to “/” instead of the other way around.
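A minimal sketch of that start page workaround, assuming the item URL has already been generated and the language prefix is known. The class and method names are hypothetical and not part of Sitecore.

```csharp
using System;

public static class StartPageCanonical
{
    // Hypothetical helper: maps the language-prefixed start page ("/en" or "/en/")
    // to "/" and leaves every other path untouched.
    public static string Normalize(string itemPath, string languagePrefix)
    {
        if (string.IsNullOrEmpty(itemPath))
        {
            return "/";
        }

        string startPage = "/" + languagePrefix;
        string trimmed = itemPath.TrimEnd('/');

        return string.Equals(trimmed, startPage, StringComparison.OrdinalIgnoreCase)
            ? "/"
            : itemPath;
    }
}
```

The value returned by LinkManager.GetItemUrl would be passed through this helper before it is written to the tag, so only the start page is affected.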
For purposes of a canonical URL, isn’t it sufficient to simply look up the URL of the context item, set URL options to include language (if the site is multi-lingual), and output the tag with the resulting URL? As I understand it, the canonical tag should be placed even when the current URL is the canonical URL. Canonical URLs can be root-relative as well.
Yes, I think you’re right! I actually had some problems with this a couple of years ago. Now, looking at RFC 6596, I also think it should be OK to just write the full link on all requests. That’ll make the code much simpler. Thanks!
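A rough sketch of that simpler approach, where the tag is written on every request. It assumes the item path is a root-relative URL obtained from LinkManager.GetItemUrl for the context item, with URL options set to include the language. The CanonicalLink class and its method names are hypothetical, and only the string handling is shown so that the block has no Sitecore dependency.

```csharp
using System;
using System.Net;

public static class CanonicalLink
{
    // Combines scheme and host of the current request with the item path.
    // Assumption: itemPath is root-relative (starts with "/"), for example the
    // result of Sitecore.Links.LinkManager.GetItemUrl for the context item.
    public static string BuildUrl(Uri requestUrl, string itemPath)
    {
        string baseUrl = requestUrl.GetLeftPart(UriPartial.Authority);
        return baseUrl + itemPath;
    }

    // Renders the tag; the URL is HTML-encoded because it ends up in an attribute.
    public static string Render(string canonicalUrl)
    {
        return string.Format("<link rel=\"canonical\" href=\"{0}\" />",
            WebUtility.HtmlEncode(canonicalUrl));
    }
}
```

In the master page this could be called as CanonicalLink.Render(CanonicalLink.BuildUrl(Request.Url, itemUrl)), with no comparison against the requested URL at all.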
You can now download the module source code or a binary package from http://marketplace.sitecore.net/Modules/DeadUrls.aspx .
Workaround: To fix this issue, replace “./Website/sitecore/shell/Controls/Sitecore.js” with this file.
For example, say you have two languages set up in Sitecore, English (en) and German (de-DE), and the filename configured in /App_Config/Include/SitemapXML.config for your website is “Sitemap.xml”. After publishing, you’ll find two sitemap files: “Sitemap.xml”, which contains the links to the English version, and “Sitemap_de-DE.xml”, which contains the German URLs.
URL canonicalization is quite a new topic to me, being a newbie. I have read some guides, but I am still confused about one thing: do I need to add the canonical tag to every page of my site?
No, you don’t have to add the canonical tag, but it can help SEO. If the page is accessible through multiple URLs, you should add the tag. I’d say there are no downsides to adding it, except a few extra bytes of download.
How would you go about fixing the core issue, rather than the rel=canonical workaround?
i.e. set up 301 redirects to ensure only one URL exists for each item?
Well, I’d guess you could simply add a processor to the httpRequestBegin pipeline that compares the inbound URL with the expected canonical URL and performs a redirect. However, one has to ensure this doesn’t interfere with the CMS editing functions, POST actions etc.
But to be honest, I haven’t really seen a need for that, since all properly CMS-generated inbound links to the page will be correct.
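If you want to try it anyway, the decision such a processor has to make could be sketched as below. This is only the comparison logic, under the assumption that the canonical path and the editing state are supplied by the caller; the CanonicalRedirect class is hypothetical and the pipeline wiring is left out.

```csharp
using System;

public static class CanonicalRedirect
{
    // Hypothetical decision helper for a request pipeline processor.
    // Returns true only for GET/HEAD requests outside editing mode whose path
    // differs from the canonical path (case-sensitive comparison).
    public static bool ShouldRedirect(string httpMethod, string requestPath,
                                      string canonicalPath, bool isEditing)
    {
        if (isEditing || string.IsNullOrEmpty(canonicalPath))
        {
            return false;
        }

        bool isSafeMethod =
            string.Equals(httpMethod, "GET", StringComparison.OrdinalIgnoreCase) ||
            string.Equals(httpMethod, "HEAD", StringComparison.OrdinalIgnoreCase);

        return isSafeMethod &&
               !string.Equals(requestPath, canonicalPath, StringComparison.Ordinal);
    }
}
```

When this returns true, the processor would answer with a 301 to the canonical URL. POST requests and editing sessions are left alone, which covers the two concerns mentioned above.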
Thanks a lot!
It’s just fine that you shared this solution with us, thank you for your cooperation.