Should you let Google index your PDFs?

PDF file lead magnets are a great way to get more leads and subscribers for your email list. PDF files are also great for selling books and other content on your blog.

Most people, however, just upload their PDFs to their server without bothering to hide them from the prying eyes of Google and other search engines.

In this article, you’ll find out why you should hide the PDFs on your website and how to do it.

Why should you hide the PDFs on your website?

Google indexes PDF files and displays them in search results, because the text inside your PDF contains keywords that match what people search for. This means that anyone who comes across your PDF in the results can download it without signing up, even if they weren’t looking for a way around your signup form.

When they do, they bypass your email signup and still get all the benefits of your lead magnet.

It’s even worse if the PDF is a paid product: people get access to what you are selling, for free!

And it’s not even difficult to search for PDFs on the internet:

Websites like PDFsearchengine provide an easy way to search for PDF files on the web.

You can even use Google’s own advanced search. All you have to do is enter the domain name of the website and select PDF as the file type. Voila, you can see every PDF file Google has indexed for that site. (Typing site:yourdomain.com filetype:pdf straight into the Google search box does the same thing.) Try it right now and see how many of your PDFs are visible to anyone willing to run a simple Google search.

It’s not really ethical, but you can also run such a search for your competitors’ PDFs to learn how to gain a competitive advantage!

(Be a good sport and delete them once you have read them.)

Google's Advanced Search makes it easy

Or you can search the web for PDF files on any topic you are interested in, using a query like:

filetype:pdf how to get a student pilot licence

These PDFs aren't for sale, but it's ridiculously easy to find paid content and lead magnets too.

What if the PDFs aren’t used for lead generation and aren’t a paid product?

Should I still hide them?

Short answer: Yes.

I understand that sometimes you’ll want to give away a PDF file to your website visitors without requiring anything from them.

In that case, too, you should hide your PDFs, because they are likely to contain more or less the same information you have elsewhere on your website. If you don’t, you risk duplicate content problems: Google may show your PDF in the search results instead of the page you actually want visitors to land on.

And going through all your uploaded PDFs to check them for duplicate content is no easy task.

How (not) to hide your PDFs

Let's take a look at some solutions suggested elsewhere on the internet and see if they work:


Send as attachment

You set up your lead magnet or purchased PDF to be delivered as an attachment through your email marketing service, such as ActiveCampaign or GetResponse. This way you avoid hosting the file on your server completely.

The problems with this approach are:

  • Email marketing services have file size limits so if you have a large PDF document, it can’t be delivered as an attachment.
  • Even if file size is not an issue, the subscriber never reaches your “click here to download the file” page, where you could upsell a product, engage the subscriber by listing your most popular related blog posts, or win more social media shares and followers. In fact, there are a number of things you can do on your download page, and all of them are rendered ineffective if the subscriber can just download the attachment.
  • If you are using the SmartLinks feature in Thrive Leads, you lose the opportunity to cookie the subscriber so that your website can intelligently show them different content from what non-subscribers see.

Host PDFs on third party services

You could host your PDFs on a third party service like Dropbox and simply provide the link in your email or on the lead magnet download page.

And no, that doesn’t work either. Unless you password protect your lead magnets or the folders they are in, Google can still index them. Worse, they will probably benefit from the high PageRank of services like Dropbox and rank more prominently in the search results.

Hosting lead magnets on Dropbox isn't a good idea either!

That would be more harmful than just leaving the PDFs on your server. Don’t do it!

So now we’ve established that offsite storage of your PDF files is not an option.


Password protect PDFs on your website

This prevents Google from indexing the contents of your PDF files. Of course, Google has already indexed the PDFs on your website, and there’s no telling when it will update its index to reflect your changes. It could take weeks or even months.

It also means that everyone who downloads the newly protected PDF will run into a password prompt, and anyone who can’t open the file will be left with a bad impression of you and your website. They might never return. Let me explain …

See, when I sign up for a mailing list to download a lead magnet, I pretty much know what to expect.

  1. I sign up for the list and am automatically redirected to a page that says something like: “Thank you for signing up, now check your inbox for the email which contains the link to the download file.”
  2. I check my inbox, find the email and click the link in it, which takes me to a page where I can download the lead magnet.

I’m sure you must have been through this process so often that you don’t even think about the steps anymore. You just do it automatically. I don’t even read the instructions anymore because they are so standard.

Nobody expects to see this!

Guess what? It’s the same for your website visitors. They don’t expect lead magnets to be password protected, so even if you give them the password, they are likely to miss it.

You could deal with that by displaying the password in big, bold type on your download page and in your email. Great, right?

No wait, there’s another issue!

The whole point of downloading a PDF is that I can access it later. So two months later, when I open your PDF lying on my desktop, what are the chances I will remember the password?

See why password protecting your PDF lead magnets is not a good idea?

Block access through robots.txt

If you already use robots.txt, you might think it’s a good idea to use it to block access to your PDF files with rules like these:

User-agent: *
Disallow: /*.pdf$   # Block any URL ending in .pdf (major search engines support the * and $ wildcards)
Disallow: /pdfs/    # Block the /pdfs/ directory, if you keep all your PDFs in one folder

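The problem is that this only stops Google from crawling your PDF files; it does nothing to remove them from its index or keep them out of the search results. A blocked URL can still be listed if other pages link to it, and because Google can no longer fetch the file, it will never see a noindex instruction on it either.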
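To see what robots.txt actually controls, here is a minimal Python sketch using the standard library’s urllib.robotparser. It answers exactly one question: may a given crawler fetch a given URL? There is no rule in robots.txt that means “don’t index this”. Two assumptions to keep in mind: example.com and the /pdfs/ folder are placeholders, and Python’s parser follows the original robots.txt convention, so it does not understand the * and $ wildcards. That is why the sketch only uses the /pdfs/ directory rule.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def is_crawlable(robots_lines: Iterable[str], user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url.

    This only answers "may the crawler download this URL?".
    It says nothing about whether the URL may appear in search results.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory instead of fetching robots.txt
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: block a /pdfs/ folder for every crawler.
RULES = [
    "User-agent: *",
    "Disallow: /pdfs/",
]

for url in ("https://example.com/pdfs/guide.pdf",      # placeholder PDF URL
            "https://example.com/blog/some-post/"):    # placeholder page URL
    verdict = "crawlable" if is_crawlable(RULES, "Googlebot", url) else "blocked"
    print(url, "->", verdict)
```

The PDF comes back as “blocked” and the blog post as “crawlable”, but “blocked” only means Googlebot won’t download the file. A link to it elsewhere on the web can still get the bare URL listed.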

While looking for resources to help me create this post, I came across many websites that claimed this was the perfect solution. A few of them even rank in Google’s top five results. Clearly they didn’t do their research, and everyone who implemented their advice now believes their PDF files are hidden from Google search.

"Blog responsibly. If you don’t know what you are writing about, then don’t write about it!" is what I’d like to say to all these “solution providers”. They have given a false sense of security to their loyal fans.

If you have ever been misled by a blogger, I wanna hear about it and, if possible, offer a solution. Let me know by leaving a comment below.

Anyway, rant over. Let’s move on …


Nofollow and noindex tags on your website

Another common suggestion is to noindex the “Thank You for Downloading” page (basically the page with the lead magnet download button) and nofollow all the links to it. But that doesn’t noindex the PDF file itself, so Google may still index it.

So that doesn’t work either.

Wouldn’t it be great if there were a way to make this problem go away in just a couple of steps, without too much effort?

Fortunately there is 🙂

Here’s what you need to do:

Noindex the PDF files themselves.

This instructs Google not to index any PDF files on your website, and to drop PDFs it has already indexed once it recrawls them and sees the noindex instruction. Google treats noindex as a rule rather than a suggestion, so you can expect this to happen fairly quickly.

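To do this, you need to modify the .htaccess file on your server (a configuration file read by Apache and Apache-compatible web servers, which is what most cPanel hosts run). Don’t worry if it sounds too technical; I’m going to walk you through it right now: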

Step 1: Log in to your website’s cPanel.

The username and password here are the ones for your cPanel account. If you are not sure what they are, ask your hosting company. If your host doesn’t use cPanel, the rest of these steps don’t apply to you (and you should probably choose a host that provides cPanel, which is a pretty good website management back end).

Simply log in via cPanel

If you need help choosing a good host, I recommend SiteGround.

Anyway, if you aren’t using cPanel, copy the URL of this page from your browser’s address bar and send it to your hosting company, asking tech support to do the steps for you.

Moving on …

Step 2: Navigate to the public_html folder and locate the .htaccess file

In your cPanel, locate File Manager and click on it.

Make sure you are in the public_html folder

Select .htaccess from the file list, click Edit and add the following code to the top or bottom of the file:

# Requires Apache's mod_headers module
<Files ~ "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>

Click Save

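If you don't see a .htaccess file in the public_html folder, first make sure hidden files are shown (in File Manager, open Settings and enable “Show Hidden Files (dotfiles)”). If it really isn't there, create it. If you have difficulty doing this, ask for help in the comments below.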

Some hosts, like SiteGround, hide .htaccess from being viewed or edited in the method I just described. It's usually done for security reasons. If that's the case, just copy the code above and email the hosting support team to add it to your .htaccess file.

Step 3: Log out of cPanel

Take a deep breath and relax. It's done.

There’s no need to inform Google or any other search engines of the change. They will catch on fairly quickly.
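Before you sit back and wait for Google, it’s worth checking that your server really sends the header. Here is a minimal Python sketch (standard library only) that makes a HEAD request for one of your PDFs and reports whether the X-Robots-Tag header contains noindex. The function names are my own, and any URL you pass in is a placeholder for one of your own PDFs. Some servers answer HEAD requests differently from normal downloads, so if the result looks wrong, double-check the response headers in your browser’s developer tools.

```python
from typing import Optional
import urllib.request


def fetch_x_robots_tag(url: str, timeout: float = 10.0) -> Optional[str]:
    """Send a HEAD request and return all X-Robots-Tag values, comma-joined."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        values = response.headers.get_all("X-Robots-Tag") or []
    return ", ".join(values) if values else None


def has_noindex(header_value: Optional[str]) -> bool:
    """Return True if an X-Robots-Tag value tells crawlers not to index.

    Accepts plain values ("noindex, nofollow") and the optional
    user-agent prefix form ("googlebot: noindex").
    """
    if not header_value:
        return False
    for part in header_value.split(","):
        directive = part.strip().lower()
        # Drop an optional "botname:" prefix in front of the directive.
        if ":" in directive:
            directive = directive.split(":", 1)[1].strip()
        # "none" is shorthand for "noindex, nofollow".
        if directive in ("noindex", "none"):
            return True
    return False
```

Usage, with a placeholder URL: has_noindex(fetch_x_robots_tag("https://example.com/wp-content/uploads/my-lead-magnet.pdf")) should return True once the .htaccess change is live.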

Of course, your PDFs can still be downloaded without signing up for your email list if someone knows the exact URL of the PDF file or of the “Thank You for Downloading” page. For this reason, I always recommend that you noindex the “Thank You for Downloading” page too, so that the download page itself doesn’t get indexed by Google.

But if you use a plugin that auto-generates sitemaps for your WordPress website, your download pages can still be found by anyone willing to open your sitemap and pull up a list of ALL your pages, download pages included.
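If you want to see what such a visitor would find, you can audit your own sitemap. The Python sketch below parses a standard sitemap (a urlset of url/loc entries, as defined by the sitemaps.org protocol) and lists the URLs that contain words typically used for download pages. The keywords “thank-you” and “download” and the example sitemap are hypothetical; use whatever slugs your download pages actually have. Many WordPress plugins publish a sitemap index that points to several child sitemaps, so you may need to run this on each child sitemap.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List

# Namespace used by standard sitemaps (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_download_pages(sitemap_xml: str,
                        keywords: Iterable[str] = ("thank-you", "download")) -> List[str]:
    """Return sitemap URLs whose address contains any of the given keywords."""
    root = ET.fromstring(sitemap_xml)
    needles = [k.lower() for k in keywords]
    matches = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if any(needle in url.lower() for needle in needles):
            matches.append(url)
    return matches


# A tiny, made-up sitemap for illustration.
EXAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/some-post/</loc></url>
  <url><loc>https://example.com/thank-you-for-downloading/</loc></url>
</urlset>"""

print(find_download_pages(EXAMPLE_SITEMAP))
# prints ['https://example.com/thank-you-for-downloading/']
```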

Read why sitemaps are useless and I don't use them anymore

Other than that, your PDFs can still be downloaded if someone shares direct links to the PDFs or to your download pages.

The chances of that happening are quite slim, and recovery is fairly easy: just change the URLs of your download pages and the file names of your PDFs.

But again, this would be a rare case. If you have been affected by someone sharing links in this manner, I’d like to help you out. Let me know by leaving a comment below.

So there you have it. This is how you keep your PDF files out of Google’s index, and why you should not use auto-generated sitemaps on your website. To learn my solution for sitemaps, check out this post.


Pullkit Gera

PS: This is an AWESOME movie!
