Should you let Google index your PDFs?

Google and other search engines can easily access, index and make your online PDFs searchable for anyone who looks for them (or the keywords they contain). This is bad news if you offer your lead magnets or ebooks in PDF format because anyone can access them if they aren’t protected.

Simply put, you can lose leads and sales if you don’t know how to protect PDFs on your site.

In this article you will learn:

  • How to hide your PDF files on your sites (and any other file formats like MP3, AVI, MP4 etc)
  • Why you should create a Download Page for your free downloads
  • What other methods you may try (and why they won’t work)

So you've put all this hard work into creating the PDF and someone can simply access it without buying anything or even giving you their email address?

But if you're like most people, you've simply uploaded your PDFs and didn’t really bother hiding them. Because you figured, who can find them without knowing the exact URL, right?
And to be fair, you did nofollow all links leading to the download page and to the actual PDF itself, so it should be ok, right?
Nope.

Why you should hide the PDFs on your website

Google indexes and displays PDF files in search results because your PDF contains keywords that Google might find useful in search results. This means that even people who aren’t really looking to download your PDF without signing up can do so.

When they do, they will bypass your email signup and will get the benefits of reading your lead magnet.

It’s even worse if the PDF is paid. This means that people are getting access to what you are selling, for free!

And it’s not even difficult to search for PDFs on the internet:

Websites like PDFsearchengine provide an easy way to search for PDF files on the web.

You can even use Google’s own advanced search. All you have to do is enter the domain name of the website and select PDF as the file type. Voila, you can find all PDF files indexed by Google. Try it right now and see how many of your PDFs are visible to anyone willing to run a simple Google search.

It’s not really ethical, but you can also run such a search for your competitors’ PDFs to learn how to gain a competitive advantage!

(They might have read your PDFs already!)

Google's Advanced Search makes it easy

Or you can search the web for PDF files on topics you are interested in, using a query like:

filetype:pdf how to get a student pilot licence

These PDFs aren't for sale but it's ridiculously easy to find paid content and lead magnets too
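If you want to run this check on your own site regularly, here is a rough Python sketch that builds the search URL for you. It assumes Google's `filetype:` and `site:` search operators; the function name `pdf_search_url` and the domain `example.com` are my own placeholders, not anything Google provides.

```python
from urllib.parse import urlencode


def pdf_search_url(domain=None, keywords=""):
    """Build a Google search URL that is restricted to PDF results.

    domain   -- optional site to restrict the search to (placeholder value below)
    keywords -- optional free-text search terms
    """
    terms = ["filetype:pdf"]
    if domain:
        terms.append(f"site:{domain}")
    if keywords:
        terms.append(keywords)
    # urlencode takes care of escaping spaces and colons in the query
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


# All indexed PDFs on one site (example.com is a placeholder)
print(pdf_search_url(domain="example.com"))

# PDFs about a topic anywhere on the web
print(pdf_search_url(keywords="how to get a student pilot licence"))
```

Paste the printed URL into your browser to see what a stranger would find.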


What if the PDFs aren’t used for lead generation and aren’t a paid product?

Should I still hide them?

Short answer: Yes.

I understand that sometimes you’ll want to give away a PDF file to your website visitors without requiring anything from them.

In that case too you should hide your PDFs, because they are likely to contain more or less the same information you have on other parts of your website. If you don’t, Google can treat the PDF as duplicate content, and the PDF may end up competing with (or showing up instead of) your own pages in the search results.

And you know it’s not easy to go through all your uploaded PDFs and look for duplicate content.

Why Use Download Pages

When someone signs up to your newsletter, you ask them to go to their inbox and click the link in your email. Right?

The best thing to do here is to give them a link which gets them back to the download page on your site where they can download the PDF.

Why you ask?

Because it allows you to set up a download page where (apart from providing the PDF download link) you can do some pretty cool stuff like:

  • Display related posts to keep your new subscriber engaged with your content, so they browse more of your website.
  • Include social sharing buttons so they can share your lead magnet with others. Make sure you change the URL being shared so people don’t reach your download page directly. Most social media sharing plugins don’t allow this so I use the social sharing buttons in Thrive Architect which allow me to set any URL I like.
  • Offer a subscriber discount triggered by a visit to the download page (you can even set the discount to end X days after the signup and also display a countdown timer using Thrive Ultimatum). You can also Up-Sell, Down-Sell and Cross-Sell related products.
  • Place a cookie on the subscriber’s device which allows you to show them different opt in forms. I use Smartlinks to do this.
  • Split test your lead magnet funnels to see how many people who sign up end up visiting your download page (use this data to optimize your funnels).
  • Advanced bloggers can also use tracking scripts on the download page. Example: Add/Remove someone from a Facebook Custom Audience based on whether they visited your download page (which means the subscriber's email has been confirmed).

And if you don't have a download page set up for each of your lead magnets, you lose this opportunity.

PSST: Download pages on this site are customized too 🙂

How (not) to hide?

Let's take a look at some solutions suggested elsewhere on the internet and see if they work:

1. Send as attachment

You set up your lead magnet or purchased PDF to be delivered through your email marketing service like ActiveCampaign or GetResponse. This way you completely avoid hosting the file on your server.

It seems like a good idea if you don't want to get the visitor back to your site after they sign up.

The problems with this approach are:

  • Email marketing services have file size limits so if you have a large PDF document, it can’t be delivered as an attachment.
  • There are a number of things you can do on your download page, none of which are possible if the subscriber simply downloads the attachment.

2. Host PDFs on third party services

You could host your PDFs on a third party service like Dropbox and simply provide the link in your email or on the lead magnet download page.

And no, that doesn’t work either. Unless you password protect your lead magnets or the folders they are in, Google will still index them. And they will probably benefit from the high PageRank of services like Dropbox and feature more prominently in the search results.

Hosting lead magnets on Dropbox isn't a good idea either!

That would be more harmful than just leaving the PDFs on your server. Don’t do it!

So now we’ve established that offsite storage of your PDF files is not an option.

3. Password protect PDFs on your website

This will stop Google from indexing your PDF files. Of course, Google has probably already indexed the PDF files on your website before you apply this, and there’s no telling when it will update its index to reflect the change. It could take weeks or even months.

This also means that everyone who downloads the newly protected PDF file will be asked for a password, and if they can’t open the file they’ll be left with a bad impression of you and your website. They might never return. I’ll explain this in a bit …

See, when I sign up to a mailing list to download a lead magnet, I pretty much know what to expect.

  1. I sign up to the list and am automatically redirected to a page which says something like: “Thank you for signing up, now check your inbox for the email which contains the link to the download file.”
  2. I check my inbox and find the email. I click the link in it and I’m sent to a page from where I can download the lead magnet.

I’m sure you must have been through this process so often that you don’t even think about the steps anymore. You just do it automatically. I don’t even read the instructions anymore because they are so standard.

Nobody expects to see this!

Guess what, it’s the same for your website visitors. They don’t expect the lead magnets to be password protected. So even if you provide them with the password, they are likely not to find it.

The whole password issue can be dealt with by providing the password in big bold fonts on your download page and in your email. Great, right?

No wait, there’s another issue!

The whole point of downloading a PDF is that I can access it later. So 2 months later when I open your PDF lying on my desktop, what are the chances I will remember the password?

You see how password protecting your PDF lead magnets is not a good idea?

4. Block access through robots.txt

If you already use robots.txt then you might think it’s a good idea to use it to block access to your PDF file using the code:

User-agent: *
Disallow: /*.pdf$ # Block PDF files. Wildcards work for the major search engines.
Disallow: /pdfs/ # Block the /pdfs/ directory if you keep all your PDFs in one folder

The problem is that this only prevents Google from accessing your PDF files but does nothing to remove them from its index or being listed in search results.
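To see what a robots.txt rule actually controls, here is a small sketch using Python's built-in `urllib.robotparser`. It only answers one question, "may a crawler fetch this URL?", and has nothing to say about what is already in a search index. Keep in mind that this standard-library parser does simple prefix matching and does not understand wildcard rules, so the sketch only uses the folder rule. The domain, paths and the `may_crawl` helper are placeholders I made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks the (hypothetical) /pdfs/ folder for all crawlers
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""


def may_crawl(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given robots.txt allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Crawling the PDF is disallowed ...
print(may_crawl("https://example.com/pdfs/lead-magnet.pdf"))

# ... while the rest of the site can still be crawled
print(may_crawl("https://example.com/blog/some-post/"))
```

Notice there is no "remove from index" answer anywhere in there. That's exactly the gap.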

While looking for resources to help me create this post, I came across many websites which claimed this to be the perfect solution. A few are even listed in the top 5 results on Google. Obviously they didn’t do their research, and everyone who implemented their advice now thinks that their PDF files are protected from Google search.

In a nutshell, this will only work for new PDFs that you upload and won’t do anything to remove the old PDFs from any search engine’s database.

Given that this solution is being touted as the best, all I want to say to the tech & marketing bloggers is that they should do their research before posting such stuff. They aren’t just relying on wrong info themselves but also misleading anyone who reads that crappy advice.

Anyways, rant over, let’s move on …

5. Nofollow and noindex tags on your website

The idea here is to noindex the “Thank You for Downloading” page (basically the page with the lead magnet download button) and nofollow all the links to that page. But that doesn’t noindex the PDF file itself, and Google might still index it, for example if another site links to the file directly.

So that doesn’t work either.

If only there was a way where you didn’t have to make too much of an effort and this problem could go away in just a couple of steps?

Fortunately there is 🙂

Here’s what you need to do:

[sociallocker id="795"]

Noindex the PDF files themselves.

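This means that Google will get the instruction not to index any PDF files on your website, and to drop the ones it has already indexed once it crawls them again and sees the noindex instruction. You should expect Google to do this fairly quickly because it takes noindex directives quite seriously. One caveat: Google has to be able to fetch a file to see this instruction, so make sure your PDFs aren’t also blocked in robots.txt.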

To do this you need to modify the .htaccess file on your server. Don’t worry if it sounds too technical, I’m going to walk you through it right now:

Step 1: Log in to your website using cPanel.

The username and password here are the ones for your cPanel. If you are not sure what these are, ask your hosting company. If your host is not using cPanel, the rest of these steps don’t apply to you (and you should probably choose a host which provides cPanel, which is a pretty good website management backend system).

Simply log in via cPanel

If you need help choosing a good host, I recommend SiteGround.

Anyways, if you aren’t using cPanel you should probably copy the URL of this page from your browser’s address bar and send it to your hosting company asking tech support to do the steps for you.

Moving on …

Step 2: Navigate to the public_html folder and locate the .htaccess file

In your cPanel, locate File Manager and click on it.

Make sure you are in public_html folder

Select .htaccess from the file list, click on edit and add the following code to the top or bottom of your .htaccess file

<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>

Click Save

If you don't have a .htaccess file in the public_html folder, create it. If you have difficulty doing this, ask for help in the comments below.
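Once the file is saved, it's worth checking that your server really sends the header. Here is a rough Python sketch, assuming the rule adds a plain `X-Robots-Tag` header with comma-separated directives. The helper names `is_noindexed` and `fetch_headers` are my own, and the sketch does not handle the more advanced form where the header names a specific crawler.

```python
from urllib.request import Request, urlopen


def is_noindexed(headers):
    """Return True if the response headers carry an X-Robots-Tag with 'noindex'."""
    directives = []
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase
        if name.lower() == "x-robots-tag":
            directives += [d.strip().lower() for d in value.split(",")]
    return "noindex" in directives


def fetch_headers(url):
    """Send a HEAD request (no file download) and return the headers as a dict."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return dict(response.headers.items())


# Offline check with the header the .htaccess rule is expected to add
print(is_noindexed({"X-Robots-Tag": "noindex, nofollow"}))

# A PDF response without the header is still indexable
print(is_noindexed({"Content-Type": "application/pdf"}))
```

To test a live file, call `is_noindexed(fetch_headers(url))` with the full URL of one of your own PDFs. If it comes back `False`, the rule isn't being applied (some servers need the headers module switched on, so ask your host).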

To block other file types: You can repeat this code as many times as you like and all you need to do is replace the .pdf part with another file type.

Blocking .mp4 files as an example:

<Files ~ "\.mp4$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
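If you have a long list of file types, you can generate the blocks instead of typing each one. This is just a convenience sketch: `noindex_rule` is a helper name I made up, and it escapes the dot (`\.`) so the pattern matches a literal period before the extension. Copy the printed output into your .htaccess file.

```python
import re


def noindex_rule(extension):
    """Return an .htaccess block that adds a noindex header for one file type."""
    ext = extension.lower().lstrip(".")
    # Only allow plain extensions like "pdf" or "mp4" to keep the pattern safe
    if not re.fullmatch(r"[a-z0-9]+", ext):
        raise ValueError(f"not a simple file extension: {extension!r}")
    return (
        f'<Files ~ "\\.{ext}$">\n'
        '  Header set X-Robots-Tag "noindex, nofollow"\n'
        "</Files>"
    )


# One block per file type, separated by a blank line
print("\n\n".join(noindex_rule(ext) for ext in ["pdf", "mp3", "mp4"]))
```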

Some hosts, like SiteGround, hide .htaccess from being viewed or edited in the method I just described. It's usually done for security reasons. If that's the case, just copy the code above and email the hosting support team to add it to your .htaccess file.

Step 3: Log out of cPanel

[/sociallocker]

Take a deep breath and relax. It's done.

There’s no need to inform Google or any other search engines of the change. They will catch on fairly quickly.

Of course, this still means that your PDFs can be downloaded without signing up for your email list if someone knows the exact URL of the PDF file or the “Thank You for Downloading” page. For this reason I always recommend you noindex the “Thank You for Downloading” page. This will prevent the download page from being indexed by Google.

But if you use a plugin which auto generates the sitemaps for pages on your WordPress website, your download pages can still be found by anyone willing to visit yourwebsite.com/sitemap.xml and pull up a list of ALL your download pages.
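To check how exposed you are, you can scan your own sitemap for download pages. Here is a minimal Python sketch, assuming a standard sitemap in the sitemaps.org format. The URL markers ("download", "thank-you"), the sample URLs and the function name `find_exposed_pages` are assumptions for illustration, so swap in whatever your download page URLs actually contain. If your plugin produces a sitemap index, the `loc` entries will be other sitemaps that you'd need to scan in turn.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_exposed_pages(sitemap_xml, markers=("download", "thank-you")):
    """Return sitemap URLs that contain any of the given marker strings."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if any(m in url.lower() for m in markers)]


# A tiny made-up sitemap with one ordinary post and one download page
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://yourwebsite.com/blog/first-post/</loc></url>
  <url><loc>https://yourwebsite.com/ebook-download/</loc></url>
</urlset>"""

print(find_exposed_pages(SAMPLE))
```

Anything this prints is a page a visitor could find without ever signing up.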

This can be prevented by using SEO Ultimate Plus which allows you to manually remove pages from your auto-generated sitemaps.

Regardless, sitemaps are an outdated method for getting Google to index your site and I use a better method 🙂

Why sitemaps are useless and I don't use them anymore

Other than that, your PDFs can still be downloaded if someone shares the direct links to the PDFs or the download link pages.

But the possibility of that happening is quite slim and the recovery is fairly easy. You’ll just have to change the URLs of your download link pages and the file names of your PDFs.

But again, this would be a rare case. If you have been affected by someone sharing links in this manner, I’d like to help you out. Let me know by leaving a comment below.

So there you have it. This is how you protect your PDF files from showing up in Google’s index and why you should not use auto generated sitemaps on your website.

Cheers,

Pullkit Gera

  • Filippo Malvezzi says:

    Oh now THIS is useful!
    And pretty clear!

    Good job!
