An XML sitemap is a document that helps search engines such as Google and Bing understand a website's content. It allows webmasters to supply additional information about each page, such as its priority, when its content was last updated, and how often it changes.
Search engines do not have the capacity to index your entire website on each visit. Typically, pages they consider important are visited more often than those that aren't. If you have a website with 100,000 pages, chances are that a certain proportion of those pages are updated more often than others.
As a theoretical example, let's assume only 1% of your content is refreshed by search engines on a visit. Without an XML sitemap, you are leaving it up to each search engine to decide which pages to update in its index. It may revisit 1,000 unimportant pages, working top-down through your site hierarchy, and never reach important pages that sit several layers deep.
With a properly implemented sitemap, you can tell search engines which pages on your site matter most and which are updated more often, so that the theoretical 1% is spent on the pages you care about.
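To make the fields mentioned above concrete, here is a minimal Python sketch that builds a small sitemap using the standard sitemaps.org elements (`loc`, `lastmod`, `changefreq`, `priority`). The `build_sitemap` helper and the example.com URLs and values are illustrative assumptions, not output from any particular sitemap generator.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a string) for a list of page dicts.

    Each dict needs a "loc" key and may also carry "lastmod",
    "changefreq" and "priority". This helper is hypothetical.
    """
    # Serialise the sitemap namespace as the default (unprefixed) one.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                child = ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}")
                child.text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


# Made-up pages: a frequently updated home page and a rarely updated one.
sitemap_xml = build_sitemap([
    {"loc": "https://www.example.com/", "lastmod": "2024-01-15",
     "changefreq": "daily", "priority": "1.0"},
    {"loc": "https://www.example.com/about/", "changefreq": "yearly",
     "priority": "0.3"},
])
print(sitemap_xml)
```

Search engines treat these values as hints rather than instructions, so the sketch only shows where the information lives in the file.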
There are a number of different tools and methods for generating an XML sitemap. We assume you already have one implemented and want to check it.
You will need a tool to crawl your website and produce a comprehensive list of its pages. Again, there are a number of tools available, such as DeepCrawl or Screaming Frog, which can do this for you.
If you need help crawling your website, contact us and we can run a crawl for you.
Download your XML sitemaps. This can be done with a browser download manager such as Turbo Download Manager for Firefox. Alternatively, if you are comfortable with HTML, you can create a simple HTML page that links to each sitemap and save it locally.
For example:
Once you've opened the page, you can download the XML sitemap(s) locally.
In some cases there is a single XML file, but more typically there are multiple XML sitemaps. You will need to download each one locally and import each one.
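If you would rather script this step, the sketch below shows one way to pull the URLs out of sitemap files you have already downloaded. It assumes the files follow the sitemaps.org format, where a `sitemapindex` file lists child sitemaps and a `urlset` file lists pages. The `extract_locs` helper and the sample XML are illustrative only.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the sitemaps.org namespace, used in the search below.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return (kind, urls) for one sitemap document.

    kind is "index" when the file lists other sitemaps, and "urlset"
    when it lists pages. Hypothetical helper for illustration.
    """
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    # Only <loc> elements in the sitemap namespace are collected.
    urls = [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]
    return kind, urls


# Made-up sample documents standing in for downloaded files.
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://www.example.com/page-sitemap.xml</loc></sitemap>
</sitemapindex>"""

urlset_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc> https://www.example.com/contact/ </loc></url>
</urlset>"""

print(extract_locs(index_xml))
print(extract_locs(urlset_xml))
```

When the result is an index, each listed child sitemap still needs to be downloaded and passed through the same function to collect the page URLs.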
If you are using a plugin such as Yoast SEO to generate your XML sitemap, you can open the URL in a browser and download it directly, thanks to its XSLT (eXtensible Stylesheet Language Transformations) stylesheet.
Once you have saved the document locally, fire up Excel and start a blank worksheet.
Now that you have collated the data, you can use Excel's built-in tools to do some basic comparison. First, we need to highlight duplicate values and exclude them. The duplicates are the URLs that appear both in the XML sitemap and in the crawl. We aren't really interested in those; we're interested in the URLs that were crawled but are missing from the XML sitemap, or vice versa.
Important: save your document now. The next set of instructions can be processor intensive, so it is worth saving in case Excel crashes or becomes unresponsive.
Now we need to filter out the duplicate values, as these exist in both the XML sitemap and the website crawl.
In order to see what remains we want to hide any values that have been highlighted in Excel as duplicates.
If you have a lot of URLs this may take a few seconds to process, so please be patient.
The remaining cells can be described as follows:
This data gives you a basic picture of your website as seen by a search engine crawler versus the XML sitemap (the handrail). It can help identify issues with XML sitemap generation, crawlability, URL canonicalization, and duplicate content.
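The same comparison can also be sketched outside Excel with set operations, which may be easier on very large sites. The `compare_urls` helper and the sample URL lists below are assumptions for illustration. URLs are compared as exact strings after trimming whitespace, so variants such as a trailing slash count as different URLs.

```python
def compare_urls(sitemap_urls, crawl_urls):
    """Split two URL lists into sitemap-only, crawl-only and shared URLs.

    Hypothetical helper: matching is exact after stripping whitespace.
    """
    sitemap_set = {u.strip() for u in sitemap_urls if u.strip()}
    crawl_set = {u.strip() for u in crawl_urls if u.strip()}
    return {
        # Listed in the sitemap but never reached by the crawler.
        "only_in_sitemap": sorted(sitemap_set - crawl_set),
        # Found by the crawler but missing from the sitemap.
        "only_in_crawl": sorted(crawl_set - sitemap_set),
        # Present in both: the "duplicates" hidden in the Excel steps.
        "in_both": sorted(sitemap_set & crawl_set),
    }


# Made-up sample data standing in for the sitemap and crawl exports.
sitemap_urls = [
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/old-page/",
]
crawl_urls = [
    "https://www.example.com/",
    "https://www.example.com/services/",
    "https://www.example.com/blog/new-post/",
]

result = compare_urls(sitemap_urls, crawl_urls)
for group, urls in result.items():
    print(group, urls)
```

In this sample, `/old-page/` appears only in the sitemap and `/blog/new-post/` only in the crawl, which are the two kinds of URL the Excel filtering is meant to surface.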
(The sales bit) If you need any help with SEO on your website, please feel free to get in touch.