Google and other search engines can easily access, index and make your online PDFs searchable for anyone who looks for them (or the keywords they contain). This is bad news if you offer your lead magnets or ebooks in PDF format because anyone can access them if they aren’t protected.
Simply put, you can lose leads and sales if you don’t know how to protect PDFs on your site.
Today you will learn:
So you've put all this hard work into creating the PDF and someone can simply access it without buying anything or even giving you their email address?
But if you're like most people, you've simply uploaded the PDFs and didn’t really bother hiding them. Because you figured, who can find them without knowing the exact URL, right?
And to be fair, you did nofollow all links leading to the download page and to the actual PDF itself, so it should be ok, right?
Google indexes and displays PDF files in search results because your PDF contains keywords that Google might find useful in search results. This means that even people who aren’t really looking to download your PDF without signing up can do so.
When they do, they will bypass your email signup and will get the benefits of reading your lead magnet.
It’s even worse if the PDF is paid. This means that people are getting access to what you are selling, for free!
And it’s not even difficult to search for PDFs on the internet:
Websites like PDFsearchengine provide an easy way to search for PDF files on the web.
You can even use Google’s own advanced search. All you have to do is enter the domain name of the website and select PDF as the file type. Voila, you can find all the PDF files Google has indexed for that site. Try it right now and see how many of your PDFs are visible to anyone willing to run a simple Google search.
It’s not really ethical, but you can run such a search for your competitors’ PDFs to learn how to gain a competitive advantage!
(They might have read your PDFs already!)
OR you can even search the web for PDF files of topics you are interested in using a query like:
Download this post as, guess what?
A PDF file!
Should I still hide them?
Short answer: Yes.
I understand that sometimes you’ll want to give away a PDF file to your website visitors without requiring anything from them.
In that case too, you should hide your PDFs, because they are likely to contain more or less the same information you have on other parts of your website. If you don’t, Google may treat the PDF as duplicate content and show it in search results instead of the page you actually want to rank.
And you know it’s not easy to go through all your uploaded PDFs and look for duplicate content.
When someone signs up to your newsletter, you ask them to go to their inbox and click the link in your email. Right?
The best thing to do here is to give them a link which gets them back to the download page on your site where they can download the PDF.
Why you ask?
Because it allows you to set up a download page where (apart from providing the PDF download link) you can do some pretty cool stuff like:
And if you don't have a download page set up for each of your lead magnets, you lose this opportunity.
PSST: Download pages on this site are customized too 🙂
Let's take a look at some solutions suggested elsewhere on the internet and see if they work:
It seems like a good idea if you don't want to get the visitor back to your site after they sign up.
The problems with this approach are:
You could host your PDFs on a third party service like Dropbox and simply provide the link in your email or on the lead magnet download page.
And no, that doesn’t work either. Unless you password protect your lead magnets or the folders they are in, Google will still index them. And they will probably benefit from the high PageRank of services like Dropbox and feature more prominently in the search results.
Hosting lead magnets on Dropbox isn't a good idea either!
That would be more harmful than just leaving the PDFs on your server. Don’t do it!
So now we’ve established that offsite storage of your PDF files is not an option.
This will remove the possibility for Google to index your PDF files. Of course, Google has already indexed the PDF files that were on your website before you applied this procedure, and there’s no telling when it will update its index to reflect the changes you made. It could take weeks or even months.
This also means that everyone who downloads the newly protected PDF file will be asked for a password. If they are unable to open the file, it will leave a bad impression of you and your website. They might never return. I’ll explain this in a bit …
See, when I sign up for a mailing list to download a lead magnet, I pretty much know what to expect.
I’m sure you must have been through this process so often that you don’t even think about the steps anymore. You just do it automatically. I don’t even read the instructions anymore because they are so standard.
Guess what, it’s the same for your website visitors. They don’t expect the lead magnets to be password protected. So even if you provide them with the password, they are likely not to find it.
The whole password issue can be dealt with by providing the password in big bold fonts on your download page and in your email. Great, right?
No wait, there’s another issue!
The whole point of downloading a PDF is that I can access it later. So 2 months later when I open your PDF lying on my desktop, what are the chances I will remember the password?
You see how password protecting your PDF lead magnets is not a good idea?
If you already use robots.txt then you might think it’s a good idea to use it to block access to your PDF file using the code:
User-agent: * # Apply the rules below to all crawlers
Disallow: /*.pdf$ # Block PDF files. Wildcards are honored by the major search engines.
Disallow: /pdfs/ # Block the /pdfs/ directory if you keep all your PDFs in one folder
The problem is that this only prevents Google from crawling your PDF files. It does nothing to remove them from its index or to stop them from being listed in search results.
While looking for resources to help me create this post, I came across many websites which claimed this to be the perfect solution. A few are even listed in the top 5 results on Google. Obviously they didn’t do their research, and everyone who implemented their advice now thinks that their PDF files are protected from Google search.
In a nutshell, this will only work for new PDFs that you upload and won’t do anything to remove the old PDFs from any search engine’s database.
Given that this solution is being touted as the best, all I want to say to the tech and marketing bloggers is that they should do their research before posting such stuff. They aren’t just relying on wrong info themselves but also misleading anyone who reads that crappy advice.
Anyways, rant over, let’s move on …
Noindex the “Thank You for Downloading” (basically the lead magnet download button page) and nofollow all the links to that page. But that doesn’t noindex your PDF file and Google still might be able to index it.
So that doesn’t work either.
If only there was a way where you didn’t have to make too much of an effort and this problem could go away in just a couple of steps?
Fortunately there is 🙂
Here’s what you need to do:
This means that Google will get the instruction not to index any PDF files on your website, and to drop the ones it has already indexed once it recrawls them and sees the noindex instruction. Google generally honors noindex, but it can only see the instruction if it is allowed to crawl the files, so don't block the same PDFs in robots.txt at the same time.
To do this you need to modify the .htaccess file on your server. Don’t worry if it sounds too technical, I’m going to walk you through it right now:
The username and password here are the ones for your cPanel. If you are not sure what these are, ask your hosting company. If your host is not using cPanel, the rest of these steps don’t apply to you (and you should probably choose a host which provides cPanel, which is a pretty good website management backend system).
If you need help choosing a good host, I recommend SiteGround.
Anyways, if you aren’t using cPanel you should probably copy the URL of this page from your browser’s address bar and send it to your hosting company asking tech support to do the steps for you.
Moving on …
In your cPanel, locate File Manager and click on it.
Select .htaccess from the file list, click on edit and add the following code to the top or bottom of your .htaccess file:
<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
If you don't have a .htaccess file in the public_html folder, create it. If you have difficulty doing this, ask for help in the comments below.
To block other file types: You can repeat this code as many times as you like and all you need to do is replace the .pdf part with another file type.
Blocking .mp4 files as an example:
<Files ~ "\.mp4$">
Header set X-Robots-Tag "noindex, nofollow"
</Files>
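If you end up blocking several file types, repeating the block for each one gets messy. Here's a rough sketch of how the same rule could be written once for a list of extensions. It assumes your server runs Apache with the mod_headers module enabled (most cPanel hosts do, but check with your host if you aren't sure), and the extensions listed are only examples:

```apache
# Sketch only: assumes Apache with mod_headers enabled.
# The IfModule wrapper makes Apache skip the rule instead of
# throwing an error if mod_headers is not available.
<IfModule mod_headers.c>
  # One rule for several file types; "pdf" and "mp4" are example extensions.
  <FilesMatch "\.(pdf|mp4)$">
    Header set X-Robots-Tag "noindex, nofollow"
  </FilesMatch>
</IfModule>
```

To cover another file type, add its extension inside the brackets, separated by a | character.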
Some hosts prevent .htaccess from being viewed or edited using the method I just described. It's usually done for security reasons. If that's the case, just copy the code above and email the hosting support team to add it to your .htaccess file.
Take a deep breath and relax. It's done.
There’s no need to inform Google or any other search engines of the change. They will catch on fairly quickly.
Of course, this still means that your PDFs can be downloaded without signing up for your email list if someone knows the exact URL of the PDF file or the “Thank You for Downloading” page. For this reason I always recommend you noindex the “Thank You for Downloading” page. This will prevent the download page from being indexed by Google.
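If your “Thank You for Downloading” page is a plain file on your server, one way to noindex it is with the same kind of .htaccess rule. Treat this as a sketch: the file name thank-you.html is made up for the example, and it assumes Apache with mod_headers enabled. It will not work for pages generated by a CMS such as WordPress, because those aren't real files on disk. For those pages, use a robots meta tag or your SEO plugin's noindex setting instead.

```apache
# Sketch only: assumes Apache with mod_headers enabled.
<IfModule mod_headers.c>
  # "thank-you.html" is a placeholder; use your own download page's file name.
  <Files "thank-you.html">
    Header set X-Robots-Tag "noindex"
  </Files>
</IfModule>
```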
So there you have it. This is how you protect your PDF files from showing up in Google’s index and why you should not use auto-generated sitemaps on your website.