Are Your Sitemaps Creating Issues You Don’t Know About (and What To Do Instead)

For a long time, Google and other search engines have promoted submitting a sitemap as the way to get every post and page on your website indexed.

While this was great advice in the mid-2000s, when Google introduced the Sitemaps protocol and search technology was far less mature, it’s a whole different ballgame now. And as online marketers, we need to keep up with the times.

Today, it’s time to get rid of all your sitemaps, especially if you use a WordPress plugin to auto-generate them. To explain why, here’s a short history lesson:

The History of Sitemaps

When search engines were new and the web was simply too big to crawl efficiently and quickly, search engines relied on 3 basic techniques to find websites and pages on the internet:

Links

When a search engine found a new website (or new content on a website), it would crawl it and follow its links to discover websites that were new to the search engine. This was one of the ways search engines found new websites.

Sitemaps

Following links alone could not guarantee that every new or not-yet-indexed website would be linked from a site the search engine already knew about. So search engines asked webmasters to submit sitemaps telling them which pages to crawl, which made crawling more efficient.

Sitemaps listed the new content published on a website and also let webmasters specify how often search engines should revisit existing pages to look for changes.
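
For the curious, here is roughly what such a file looks like. This minimal Python sketch builds a one-page sitemap in the sitemaps.org XML format using only the standard library. The page URL and date are made up, and changefreq is the field webmasters used to say how often a page changes (Google’s own documentation now says it ignores this value).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap XML string from (url, lastmod, changefreq) tuples."""
    # Register the sitemap namespace as the default so the output has no "ns0:" prefixes.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        # lastmod uses the W3C Datetime format, e.g. YYYY-MM-DD.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
        # changefreq: always, hourly, daily, weekly, monthly, yearly or never.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = changefreq
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page, used only for illustration.
xml_text = build_sitemap([("https://example.com/my-new-post/", "2016-05-01", "weekly")])
print(xml_text)
```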

This gave webmasters a lot of control over how crawlers crawled their websites, but it wasn’t long before search engines realized that webmasters aren’t the right people to decide how often their website should be crawled. So they changed the way they handle sitemaps and took most of that control back.

Matt Cutts, former head of Google’s Web Spam team

... with XML sitemaps we don't guarantee that we'll crawl the pages from those ...

(So even in 2009, Google didn't give full weight to XML sitemaps. Watch the full video below.)

(In 2015, Google appointed a new head of Web Spam)

Matt Cutts on Sitemaps (Oct 2009)

Today, even Google says that many sites don’t need a sitemap!

Pings

Early WordPress websites also relied on pings to notify various update services across the internet whenever a new post was published. These services shared the information with search engines, allowing them to find new websites without the webmaster having to tell them.
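
Under the hood, a WordPress ping is a small XML-RPC call, traditionally the weblogUpdates.ping method with the blog’s name and URL, sent to an update service such as Ping-o-Matic. The sketch below only builds the request body with Python’s standard xmlrpc.client module, so nothing goes over the network; the blog name and URL are placeholders.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    # When methodname is given, dumps() wraps the parameters in a <methodCall>.
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")


payload = build_ping("My Blog", "https://example.com/")
print(payload)

# Actually sending it would need network access, e.g.:
# server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")
# server.weblogUpdates.ping("My Blog", "https://example.com/")
```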

Once again, this method was crude, and it wasn’t long before ping services were being abused and search engines had to stop relying on them.

Fast forward to today …

… and search engines, especially Google, have become very advanced in how they find information on the web and add it to their index. Now they find websites without being told they exist. And the great news for WordPress websites is that Google knows more than 25% of all websites run on WordPress and has developed its algorithm to crawl WordPress websites efficiently.

The problem with sitemaps

Google can now crawl the web more efficiently than ever, which costs it less time, effort and money. That should mean new content on your website gets indexed quickly.

But that’s not the case.

Sitemaps are slow

When you submit new content to Google through a sitemap, it can take days, sometimes even weeks, for it to be indexed. So even if your sitemap is updated the moment you publish a new blog post, Google may not show the post in search results for quite a while.

And when you submit an auto-generated sitemap to Google (usually the case for WordPress websites), it includes pages you would rather keep hidden from Google, typically pages along the lines of "Thank you for confirming your email, here’s the download you signed up for".

"What are the other types of pages which should be hidden from Google?"

Share your thoughts by leaving a comment below.

These auto-generated sitemaps are easy to find and usually live at a predictable address such as www.yourwebsite.com/sitemap.xml

Sitemaps make it easy for your competitors to spy on you

The trouble with auto-generated sitemaps is that you often can’t choose which pages to include or exclude. As a result, all your pages, including the ones you don’t want indexed, end up in your sitemap for Google to index.
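
If you build a sitemap yourself instead of relying on an auto-generator, choosing what goes in is just a filter. A minimal Python sketch, assuming your private pages share recognizable URL fragments; the fragments and URLs below are hypothetical.

```python
# Hypothetical URL fragments for pages that should never appear in a sitemap.
EXCLUDED_FRAGMENTS = ("/thank-you", "/download", "/confirm")


def public_urls(all_urls, excluded=EXCLUDED_FRAGMENTS):
    """Keep only the URLs that contain none of the excluded fragments."""
    return [url for url in all_urls if not any(frag in url for frag in excluded)]


pages = [
    "https://example.com/",
    "https://example.com/blog/sitemaps/",
    "https://example.com/thank-you-for-subscribing/",
    "https://example.com/download-checklist/",
]
print(public_urls(pages))
# ['https://example.com/', 'https://example.com/blog/sitemaps/']
```

Matching on URL fragments is crude, but it keeps the rule in one place; a real setup might instead read a per-page "exclude" flag.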

Fortunately, there is an easy fix: add a noindex robots meta tag to the page, which tells Google not to index it.
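
The noindex tag is a robots meta tag in the page’s head, for example <meta name="robots" content="noindex">. The sketch below uses Python’s built-in html.parser to check whether a page’s HTML carries it. It only looks at meta tags and ignores other ways of sending the same signal, such as the X-Robots-Tag HTTP header; the sample page is made up.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names; values may be None.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("name", "").lower() == "robots":
            # content can hold several comma-separated directives, e.g. "noindex, follow".
            content = attributes.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def is_noindexed(html):
    """Return True if the HTML contains a robots meta tag with a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = '<html><head><meta name="robots" content="noindex, follow"></head><body>Thanks!</body></html>'
print(is_noindexed(page))  # True
```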

Still, your sitemap discloses the URLs of your sensitive "download your lead magnet here" pages to anyone willing to visit www.yourwebsite.com/sitemap.xml, giving people one more way to download your lead magnets without signing up for your newsletter.
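
You can see what your own sitemap gives away by reading it the way a competitor would. This Python sketch parses sitemap XML passed in as a string (so it needs no network access) and flags URLs that look like lead-magnet delivery pages. The keyword list and sample URLs are hypothetical; to check a live site you would first download the file yourself.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Hypothetical keywords that suggest a page meant only for subscribers.
SENSITIVE_WORDS = ("thank-you", "download", "lead-magnet")


def exposed_urls(sitemap_xml):
    """Return the sitemap URLs whose address contains a sensitive keyword."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
    return [u for u in urls if any(word in u.lower() for word in SENSITIVE_WORDS)]


sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/first-post/</loc></url>
  <url><loc>https://example.com/thank-you-here-is-your-download/</loc></url>
</urlset>"""
print(exposed_urls(sample))
# ['https://example.com/thank-you-here-is-your-download/']
```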

And consider this: who would really want to see what you offer as lead magnets?

If you guessed your competitors, you guessed right!

Sure, a few savvy visitors could also grab your lead magnets without signing up, or even without visiting your website, but it’s your competitors who stand to gain the most by finding out what you offer as lead magnets and coming up with creative ways to outsmart you.

By allowing auto-generated sitemaps to include all your pages, you are making this process much easier for them!

And now that you know this, you can pull this trick on them first, and when they try it on you, they’ll find your defenses already up 🙂

Hiding lead magnets requires more than just removing the download pages from your sitemaps

Possibility of Duplicate Content Penalty

Another issue arises if you split test your landing pages: your sitemap will present both versions of a near-identical page to Google, and if you aren’t noindexing one of them, you risk a duplicate content penalty.

I use the SEO Ultimate plugin to easily noindex posts and pages on my WordPress website.

SEO Ultimate is, IMHO, the best SEO plugin for WordPress. It has tons of useful options!

Sitemap generation slows down your website

Each sitemap file is also limited to 50,000 URLs (and 50 MB uncompressed), so you’ll have to split your sitemap into several files if you want to submit more URLs. This isn’t a big concern for small and medium businesses, but it can be for larger organizations.
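
Splitting is mechanical: chunk the URL list into groups of at most 50,000 and list each resulting file in a sitemap index. A minimal Python sketch; the sitemap-N.xml file names and the base URL are hypothetical, and the 50 MB size limit is not checked here.

```python
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive chunks of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def sitemap_index(base_url, chunk_count):
    """Build a sitemap index pointing at hypothetical sitemap-1.xml ... sitemap-N.xml files."""
    # Assumes base_url needs no XML escaping (contains no &, < or >).
    entries = "".join(
        f"<sitemap><loc>{base_url}/sitemap-{n}.xml</loc></sitemap>"
        for n in range(1, chunk_count + 1)
    )
    return ('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{entries}</sitemapindex>")


urls = [f"https://example.com/product/{n}/" for n in range(120_000)]
chunks = chunk_urls(urls)
print([len(c) for c in chunks])  # [50000, 50000, 20000]
print(sitemap_index("https://example.com", len(chunks)))
```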

More importantly, if your sitemap files are large, generating them can slow down your website, because sitemap generation is a resource-intensive process. Depending on your sitemap plugin’s configuration, it could run once a day or every few hours, disrupting your visitors each time. Shared hosting users would be affected most.

Indexing by Googlebot

What I use and recommend is submitting your website pages to Google manually using the Fetch as Google option in Google Search Console.

Here’s how to do it:

1. Log in to Google Search Console and select your website.

2. On the left, click Crawl to open its drop-down menu, then select Fetch as Google.

3. Enter the URL you want to be indexed and click Fetch. Wait for processing to be completed.

4. Click Submit to index.

5. Complete the human verification. Then click Crawl only this URL if you only want that one URL indexed, or Crawl this URL and its direct links if you are doing this for the first time and want more pages indexed with each submission.

Google will show you the limits it places on these options; at 500 URLs a month, they are generous enough for small and medium businesses.

6. Click Go.

Google will now crawl the URL and its direct links (if you chose that option) and add them to its index.

The Feed Method

If adding pages to the index manually doesn’t appeal to you, and you don’t want to use sitemaps either, another option is to let Google find your content by submitting your website’s RSS (or Atom) feed. This should take care of the uneasy feeling some webmasters get when they log in to Google Search Console and see Google’s recommendation to submit a sitemap 🙂

The easiest way to find the RSS feed of a self-hosted WordPress website is to add /feed/ to the end of your homepage URL. The comments feed is at your homepage URL followed by /comments/feed/.

For example: http://example.com/feed/ and http://example.com/comments/feed/
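
Deriving both addresses from a homepage URL takes one call each. This small Python sketch uses urllib.parse.urljoin and assumes the /feed/ and /comments/feed/ layout described above; sites set up differently may use other feed URLs.

```python
from urllib.parse import urljoin


def wordpress_feed_urls(homepage):
    """Return the post feed and comments feed URLs for a WordPress homepage URL."""
    # A trailing slash makes urljoin append to the path instead of replacing its last segment.
    base = homepage if homepage.endswith("/") else homepage + "/"
    return urljoin(base, "feed/"), urljoin(base, "comments/feed/")


print(wordpress_feed_urls("http://example.com"))
# ('http://example.com/feed/', 'http://example.com/comments/feed/')
```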

Since WordPress RSS feeds contain only your blog posts by default (not pages such as your download pages), and are publicly accessible anyway, submitting them exposes nothing new, so you avoid the privacy issues presented by sitemaps.

Google does treat RSS feeds as sitemaps, though, so just as with sitemaps, it won’t add new URLs to its index quickly: the indexing delay is the same as with XML sitemaps. Other than that, there’s no reason not to submit your RSS feed in Google Search Console.
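
An RSS 2.0 feed carries much the same information a sitemap does for your recent posts: each item has a link and usually a publication date. This Python sketch reads those pairs from a feed string with the standard library; the sample feed is made up.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_entries(rss_xml):
    """Return (link, published datetime or None) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(rss_xml)
    entries = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        pub = item.findtext("pubDate")
        # pubDate uses the RFC 822 date format, which email.utils can parse.
        entries.append((link, parsedate_to_datetime(pub) if pub else None))
    return entries


sample = """<rss version="2.0"><channel><title>Example</title>
<item><title>New post</title><link>https://example.com/new-post/</link>
<pubDate>Mon, 02 May 2016 09:30:00 +0000</pubDate></item>
</channel></rss>"""
for link, published in feed_entries(sample):
    print(link, published.date())  # https://example.com/new-post/ 2016-05-02
```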

Conclusion

So there you have it: the main methods for getting your website indexed by Google. The best one, in my opinion, is to manually Fetch as Google every time you publish a new page or blog post. It should also work for most small and medium-sized websites.

To learn more techniques for making Google work for you, check out the SEO section.

Cheers,

Pullkit Gera

PS: Coldplay is just too good ... in Swahili 🙂
