PDF file lead magnets are a great way to get more leads and subscribers for your email list. PDF files are also great for selling books and other content on your blog.
Most people, however, just upload the PDFs to their server and don’t bother hiding them from the prying eyes of Google and other search engines.
In this article you’ll find out why you should hide the PDFs on your website and how to do it.
Google indexes PDF files and displays them in search results because your PDF contains keywords that Google considers useful to searchers. This means that people can find and download your PDF straight from the search results without ever signing up.
When they do, they will bypass your email signup and will get the benefits of reading your lead magnet.
It’s even worse if the PDF is paid. This means that people are getting access to what you are selling, for free!
And it’s not even difficult to search for PDFs on the internet:
Websites like PDFsearchengine provide an easy way to search for PDF files on the web.
You can even use Google’s own advanced search. All you have to do is enter the domain name of the website and select PDF as the file type. Voila, you can find all the PDF files indexed by Google. Try it right now and see how many of your PDFs are visible to anyone willing to run a simple Google search.
It’s not really ethical, but you can also run such a search for your competitors’ PDFs to learn how to gain a competitive advantage!
(Be a good sport and delete them once you have read them)
Google's Advanced Search makes it easy
Or you can search the web for PDF files on topics you are interested in, using a query like:
filetype:pdf how to get a student pilot licence
These PDFs aren't for sale but it's ridiculously easy to find paid content and lead magnets too
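If you would rather script this check than click through the Advanced Search form, both searches described above boil down to the site: and filetype: search operators. Here is a minimal Python sketch that builds the search URLs. The function name and example.com are placeholders of mine for illustration, not anything Google provides.

```python
from urllib.parse import urlencode


def pdf_search_url(domain="", topic=""):
    """Build a Google search URL that only returns PDF files.

    domain: limit results to one website (placeholder example: "example.com").
    topic:  optional keywords, e.g. "how to get a student pilot licence".
    """
    parts = ["filetype:pdf"]
    if domain:
        # site: restricts the results to a single domain
        parts.insert(0, f"site:{domain}")
    if topic:
        parts.append(topic)
    query = " ".join(parts)
    return "https://www.google.com/search?" + urlencode({"q": query})


# Every PDF Google has indexed for one site
print(pdf_search_url(domain="example.com"))
# PDFs about a topic anywhere on the web
print(pdf_search_url(topic="how to get a student pilot licence"))
```

Swap example.com for your own domain and open the first URL in your browser to see which of your PDFs are currently exposed.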
Should I still hide them?
Short answer: Yes.
I understand that sometimes you’ll want to give away a PDF file to your website visitors without requiring anything from them.
In that case too you should hide your PDFs, because they are likely to contain more or less the same information you have on other parts of your website. If you don’t, Google may treat the PDF as duplicate content of your own pages, and the PDF can end up competing with the pages you actually want to rank.
And you know it’s not easy to go through all your uploaded PDFs and look for duplicate content.
Let's take a look at some solutions suggested elsewhere on the internet and see if they work:
You could host your PDFs on a third-party service like Dropbox and simply provide the link in your email or on the lead magnet download page.
The problem with this approach is that, unless you password protect your lead magnets or the folders they are in, Google can still index them. They may even benefit from the high PageRank of services like Dropbox and feature more prominently in the search results.
Hosting lead magnets on Dropbox isn't a good idea either!
That would be more harmful than just leaving the PDFs on your server. Don’t do it!
So now we’ve established that offsite storage of your PDF files is not an option.
Another suggestion is to password protect your PDF files. This removes the possibility for Google to index them. Of course, Google has already indexed the PDF files on your website before you apply this procedure, and there’s no telling when it will update its index to reflect the changes you made. It could take weeks or even months.
This also means that everyone who downloads the newly protected PDF file will be asked for a password. If they are unable to open the file, it will leave them with a bad impression of you and your website. They might never return. I’ll explain this in a bit …
See, when I sign up for a mailing list to download a lead magnet, I pretty much know what to expect.
I’m sure you must have been through this process so often that you don’t even think about the steps anymore. You just do it automatically. I don’t even read the instructions anymore because they are so standard.
Nobody expects to see this!
Guess what, it’s the same for your website visitors. They don’t expect the lead magnets to be password protected. So even if you provide them with the password, they are likely not to find it.
The whole password issue can be dealt with by providing the password in big bold fonts on your download page and in your email. Great, right?
No wait, there’s another issue!
The whole point of downloading a PDF is that I can access it later. So 2 months later when I open your PDF lying on my desktop, what are the chances I will remember the password?
You see how password protecting your PDF lead magnets is not a good idea?
If you already use robots.txt, you might think it’s a good idea to use it to block access to your PDF files with rules like these:
User-agent: *
Disallow: /*.pdf$ # Block all PDF files. Wildcard patterns are supported by the major search engines.
Disallow: /pdfs/ # Block the /pdfs/ directory if you keep all your PDFs in one folder
The problem is that this only stops Google from crawling your PDF files. It does nothing to remove them from its index, so they can still be listed in search results.
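To see how narrow the job of robots.txt really is, here is a small Python sketch using the standard library’s robots.txt parser. The robots.txt content and the example.com URLs are made up for illustration. Notice that the only question the file can answer is “may this URL be fetched?”. There is nothing in it that tells a search engine to drop a file it already knows about.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the folder rule discussed above.
# (Python's built-in parser does not handle wildcard rules such as
# /*.pdf$, so this sketch sticks to the folder rule.)
ROBOTS_TXT = """\
User-agent: *
Disallow: /pdfs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, user_agent="Googlebot"):
    """Answer the only question robots.txt covers: may this URL be fetched?"""
    return parser.can_fetch(user_agent, url)


print(may_crawl("https://example.com/pdfs/lead-magnet.pdf"))  # False
print(may_crawl("https://example.com/blog/"))  # True
```

A False here only means a well-behaved crawler will not fetch the file. A PDF that was indexed earlier, or that other sites link to, can still appear in search results.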
While looking for resources to help me create this post, I came across many websites which claimed this to be the perfect solution. A few are even listed in the top 5 results on Google. Obviously they didn’t do their research, and everyone who implemented their advice now thinks that their PDF files are protected from Google search.
"Blog responsibly. If you don’t know what you are writing about, then don’t write about it!" is what I’d like to say to all these “solution providers”. They have given a false sense of security to their loyal fans.
If you have ever been misled by a blogger, I wanna hear about it and, if possible, offer a solution. Let me know by leaving a comment below.
Anyways, rant over, let’s move on …
Another suggestion is to noindex the “Thank You for Downloading” page (basically the page with the lead magnet download button) and nofollow all the links to that page. But that doesn’t noindex your PDF file, and Google might still be able to index it.
So that doesn’t work either.
If only there were a way to make this problem go away in just a couple of steps, without too much effort?
Fortunately there is 🙂
Here’s what you need to do:
Have your server send a noindex instruction in the HTTP headers of every PDF file it delivers. This means that Google will get the instruction not to index any PDF files on your website, and also to drop them from its index once it recrawls the files and finds they have been noindexed. Google honors noindex instructions, so you don’t need to do anything else to get the files removed.
To do this you need to modify the .htaccess file on your server. Don’t worry if it sounds too technical, I’m going to walk you through it right now:
First, log in to your cPanel. The username and password here are the ones for your cPanel. If you are not sure what these are, ask your hosting company. If your host is not using cPanel, the rest of these steps don’t apply to you (and you should probably choose a host which provides cPanel, which is a pretty good website management backend system).
Simply log in via cPanel
If you need help choosing a good host, I recommend SiteGround.
Anyways, if you aren’t using cPanel, you should probably copy the URL of this page from your browser’s address bar and send it to your hosting company, asking tech support to do the steps for you.
Moving on …
In your cPanel, locate File Manager and click on it.
Make sure you are in the public_html folder.
Select .htaccess from the file list, click on Edit and add the following code to the top or bottom of your .htaccess file:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
If you don't have a .htaccess file in the public_html folder, create it. If you have difficulty doing this, ask for help in the comments below.
Some hosts, like SiteGround, hide .htaccess from being viewed or edited in the method I just described. It's usually done for security reasons. If that's the case, just copy the code above and email the hosting support team to add it to your .htaccess file.
Take a deep breath and relax. It's done.
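There’s no need to inform Google or any other search engines of the change. They will pick it up the next time they crawl your PDF files. Just make sure your robots.txt does not block the PDFs, because a crawler that is not allowed to fetch a file never gets to see the noindex header.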
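If you want to confirm the change is live before Google comes around, you can check the response headers of one of your PDFs yourself. Below is a minimal Python sketch of such a check. The function names and the example.com URL are my own placeholders, and the check is deliberately simple: it ignores directives that are addressed to one specific crawler (for example “googlebot: noindex”).

```python
from urllib.request import Request, urlopen


def has_noindex(headers):
    """Return True if an X-Robots-Tag header carries a noindex directive.

    headers: a mapping of header names to values.
    "none" counts too, since it is shorthand for "noindex, nofollow".
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives or "none" in directives:
                return True
    return False


def pdf_is_noindexed(url):
    """Send a HEAD request (headers only, no download) and inspect the reply."""
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=10) as response:
        return has_noindex(dict(response.headers.items()))


# Offline demonstration with the header set by the .htaccess rule above
print(has_noindex({"X-Robots-Tag": "noindex, nofollow"}))  # True
print(has_noindex({"Content-Type": "application/pdf"}))  # False
```

To test your own site, call pdf_is_noindexed("https://example.com/your-file.pdf") with the real address of one of your PDFs. If it returns False, the .htaccess rule is not being applied, and it is worth asking your host whether Apache’s headers module is enabled.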
Of course, this still means that your PDFs can be downloaded without signing up for your email list if someone knows the exact URL of the PDF file or the “Thank You for Downloading” page. For this reason I always recommend you noindex the “Thank You for Downloading” page. This will prevent the download page from being indexed by Google.
But if you use a plugin which auto-generates sitemaps for the pages on your WordPress website, your download pages can still be found by anyone willing to visit yourwebsite.com/sitemap.xml and pull up a list of ALL your download pages.
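To get a feel for how little effort that takes, here is a short Python sketch that reads a sitemap and picks out pages that look like download pages. The sample sitemap, the function name and the keywords are assumptions of mine for illustration. Your own download pages may use different words in their URLs.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def exposed_pages(sitemap_xml, keywords=("download", "thank")):
    """Return sitemap URLs whose address contains any of the keywords."""
    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [u for u in urls if any(k in u.lower() for k in keywords)]


# Made-up sitemap with one ordinary page and one download page
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/</loc></url>
  <url><loc>https://example.com/thank-you-for-downloading/</loc></url>
</urlset>"""

print(exposed_pages(SAMPLE))
```

Run it against the contents of your own sitemap.xml to see which of your download pages are listed there.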
Other than that, your PDFs can still be downloaded if someone shares the direct links to the PDFs or the download link pages.
The possibility of that happening is quite slim, and the recovery is fairly easy. You’ll just have to change the URLs of your download link pages and the file names of your PDFs.
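If you ever need to rename a leaked PDF, one option is to add a random token to the file name so the new address cannot be guessed from the old one. This is only a sketch of that idea, not a required step. The function name and the example file name are hypothetical.

```python
import secrets
from pathlib import PurePosixPath


def rotated_name(filename, token_bytes=8):
    """Return the file name with a fresh random token before the extension."""
    path = PurePosixPath(filename)
    # token_urlsafe only uses characters that are safe to put in a URL
    token = secrets.token_urlsafe(token_bytes)
    return f"{path.stem}-{token}{path.suffix}"


# Prints something like lead-magnet-<random characters>.pdf
print(rotated_name("lead-magnet.pdf"))
```

After renaming the file on your server, remember to update the link on your download page and in your emails so new subscribers get the working address.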
But again, this would be a rare case. If you have been affected by someone sharing links in this manner, I’d like to help you out. Let me know by leaving a comment below.
So there you have it. This is how you protect your PDF files from showing up in Google’s index and why you should not use auto-generated sitemaps on your website. To learn about my solution for sitemaps, check out this post.