It’s not uncommon for multiple URLs to lead to the same page on a website, for instance the www and non-www versions of the same address. However, not all of these URLs are created equal, and some may be considered “non-canonical” (duplicates). To avoid confusing search engine crawlers, website owners should specify which URL is canonical (the preferred version) and should be crawled and indexed.
💡 Recommended Reading: Canonical URL: What it is and How to use it
What is a Non-Canonical URL in sitemap?
A sitemap should list all the pages of a website that you want search engines to crawl and index. If you include non-canonical URLs in your sitemap, you are sending misleading signals to the search engines: you are asking them to index a URL that itself declares another URL as the canonical version.
Why is it important?
If you send misleading signals to the search engines, they might ignore your sitemaps, leading to indexability issues for your website. So for correct indexation, ensure that you only include canonical URLs in the sitemap.
How to check for non-canonical URLs in sitemap?
To identify non-canonical URLs in the sitemap, you can either use paid tools like Ahrefs and Semrush or use the free Non-Canonical Pages in Sitemap Checker Google Sheet that I created.
Here’s how you can use it:
Step 1: Make a copy of the Google Sheet
Open the Non-Canonical Pages in Sitemap Checker and click on create a copy.
Step 2: Enter your Sitemap and Click on Extract URLs
Enter your sitemap URL in cell B2. Make sure it is a regular sitemap and not a sitemap index. Once you’ve done that, click on Extract URLs under the Canonical Links in Sitemap option. The first time you run it, you might be asked to authorize the Apps Script.
Once you authorize, just run the tool again.
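If you’d rather script this step, here is a minimal Python sketch of pulling the URLs out of a sitemap. It is not the code behind the Google Sheet; the function name `extract_sitemap_urls` and the sample sitemap are my own illustrations, and it assumes a standard sitemap that uses the sitemaps.org namespace. Like the sheet, it rejects a sitemap index.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps (sitemaps.org protocol).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_sitemap_urls(xml_text):
    """Return the <loc> values of every <url> entry in a sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # A sitemap index lists other sitemaps, not pages, so refuse it.
    if root.tag == SITEMAP_NS + "sitemapindex":
        raise ValueError("Got a sitemap index; pass one of its child sitemaps instead.")
    locs = root.findall(SITEMAP_NS + "url/" + SITEMAP_NS + "loc")
    return [loc.text.strip() for loc in locs if loc.text]


# Illustrative sitemap, not taken from a real site.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/blog/</loc></url>
</urlset>"""

print(extract_sitemap_urls(SAMPLE_SITEMAP))
# ['https://example.com/', 'https://example.com/blog/']
```

In practice you would download the sitemap first (for example with `urllib.request.urlopen`) and pass the response text to the function.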
Step 3: Identify the non-canonical URLs in sitemap
Any non-canonical URL in the sitemap will show a red cell in the Match column.
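The same kind of match can be approximated in Python. This is only a sketch under my own assumptions, not the sheet’s actual logic: `CanonicalFinder`, `find_canonical`, and `is_non_canonical` are hypothetical names, the canonical is read from a `<link rel="canonical">` tag in the HTML only (not from HTTP headers), and URLs are compared as exact strings, so even a trailing-slash difference counts as a mismatch.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonical = attr["href"].strip()


def find_canonical(page_url, html):
    """Return the absolute canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve relative hrefs against the page URL.
    return urljoin(page_url, finder.canonical)


def is_non_canonical(page_url, html):
    """True when the page declares a canonical URL different from its own URL."""
    canonical = find_canonical(page_url, html)
    return canonical is not None and canonical != page_url


# Illustrative page, not taken from a real site.
SAMPLE_HTML = '<html><head><link rel="canonical" href="https://example.com/shoes"></head></html>'

print(is_non_canonical("https://example.com/shoes?ref=nav", SAMPLE_HTML))  # True
print(is_non_canonical("https://example.com/shoes", SAMPLE_HTML))          # False
```

Running `is_non_canonical` over every URL taken from the sitemap, with the HTML fetched for each one, gives the list of entries that would need fixing. Pages with no canonical tag are treated as fine here, which is a design choice you may want to change.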
Once you have identified the non-canonical URLs in the sitemap, you can either:
- Replace the non-canonical URLs with canonical URLs in the sitemap, or
- Change the page’s declared canonical URL to the one submitted in the sitemap.