Everyone in the inbound marketing industry understands that Google does not like duplicate content and it seems like a simple situation to avoid. Just do not copy someone else’s content onto your website or Google will find it and you will be penalized. But there are certain situations where it’s not so black and white.
Take for example this week’s question on Ask an SEO Expert. This user would like to know how an ecommerce site could include a PDF of product information from the manufacturer, which could be the same PDF for several different products across several different sites, and not get dinged for duplicate content. This type of content happens to actually be very useful for the customer, so how can an ecommerce site communicate that to Google and avoid being penalized?
In this week’s Ask an SEO Expert feature, Jesse Laffen offers several tips on how you can provide relevant and useful information for the customer and avoid hurting your search engine optimization efforts.
This question is a really good one, actually. I like this one a lot. Can adding a PDF instructional manual – that the manufacturer provides – adding that PDF to a product page, even possibly across multiple products of the same brand, have a significant effect on your SEO for that page or pages?
For example, many websites that sell the same product have the same PDFs from the manufacturer. Does it count as duplicate content even though it helps customers? So, there are two great considerations here. The first one is it helps customers, right? A lot of this information is really, really good. The second one is it could kind of be seen as duplicate content, right?
Here’s the exact scenario that we’re talking about here. You’re a reseller, and this is the manufacturer and that’s their warehouse as you can see. Great artistic warehouse. They have a PDF and they want to hand it out to all the people who are reselling their products. So, I take this PDF right here, and I walk over to my website, and I slot it in right here under one of these pages.
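Now, all of a sudden Google walks by and it says, “Oh, okay. I see that you have that PDF here. I also see it from the manufacturer site here. Then I see it on all of these multiple other competitors who are selling the same product over here. I don’t like duplicate content because I’m Google and that’s just my stance on duplicate content.” So how can I actually mitigate that, or how can I show this really useful information? There are actually a couple ways. The first one is that you can replicate it in HTML on another part of your website. So, the benefit there is that it’s crawlable, right? You can use a canonical tag to kind of point it to another page on your site or to the actual manufacturer’s, assuming they are not really selling the product. They’re just providing it to you to sell.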
That can create additional problems, though, because obviously we’re trying to avoid duplicate content. Now, I’ve taken this PDF here and I’ve said, “Oh, not only is it right here on my site, but that text also lives over here.” So that’s not entirely the most useful way to do that.
The other way that you can do this is you can go into your robots.txt file and you can actually disallow the crawling of that PDF. Again, the problem with that, though, is that you want the search engine to understand that this is a valuable resource, right? “Whenever you see this product anywhere else on the web, you’re seeing this PDF, but we’re blocking it from you.” That doesn’t really make any sense.
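To make the robots.txt option concrete, here is a minimal sketch in Python. The rules, the /manuals/ folder, and the shop.example URLs are all made up for illustration, and Python's built-in parser only does simple prefix matching, so treat it as a rough stand-in for how a real crawler reads the file rather than a copy of Google's behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: keep every crawler out of the /manuals/ folder,
# which is where this imaginary store keeps the manufacturer PDFs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /manuals/
"""


def is_crawlable(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# The PDF is blocked, while the product page itself stays crawlable.
pdf_allowed = is_crawlable("https://shop.example/manuals/widget.pdf")
page_allowed = is_crawlable("https://shop.example/products/widget")
print(pdf_allowed, page_allowed)
```

The product page can still be crawled, but the crawler never gets to see the PDF at all, which is exactly the trade-off described above.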
So, I did a little digging, and I think the solution that we would prefer is to actually use a canonical tag on the PDF file itself. “Wait a minute,” you might be saying. “I thought canonical tags were an HTML element. This is a PDF file, how do I do that?” Turns out, Google does support using a rel="canonical" Link header in the HTTP response for the PDF.
So if you have control over your own servers, and if you can kind of write some of those rules and can control how your server responds to those HTTP requests coming in, then you can actually write something that says, “Hey, this PDF file,” or even this Word document, this Excel spreadsheet, almost any file type, images even. “This is the original source for that.”
In the case where this PDF is coming from the manufacturer, I’d actually recommend saying, “Yeah, the canonical source of this content is over here and it belongs with this manufacturer.” Or if it’s something that you produce, maybe you can point it back to another page on your site, too, if that’s a problem.
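Here is a rough sketch of what that could look like, using only Python's standard library. The file path, the manufacturer.example URL, the CANONICAL_SOURCES mapping, and the placeholder PDF bytes are all assumptions for illustration. On a real site you would more likely set this header in your web server or CDN configuration, but the response that goes out is the same idea.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping: PDF path on our store -> canonical copy on the manufacturer's site.
CANONICAL_SOURCES = {
    "/manuals/widget.pdf": "https://manufacturer.example/docs/widget.pdf",
}


def canonical_link_header(canonical_url: str) -> str:
    """Build the Link header value that names canonical_url as the canonical copy."""
    return f'<{canonical_url}>; rel="canonical"'


class PdfHandler(BaseHTTPRequestHandler):
    """Serves known PDFs and attaches a rel="canonical" Link header to each response."""

    def do_GET(self):
        canonical_url = CANONICAL_SOURCES.get(self.path)
        if canonical_url is None:
            self.send_error(404)
            return
        # Placeholder bytes standing in for the real manual.
        body = b"%PDF-1.4\n% placeholder for the manufacturer's manual\n"
        self.send_response(200)
        self.send_header("Content-Type", "application/pdf")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Link", canonical_link_header(canonical_url))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8000) -> None:
    """Start the demo server on localhost; call this yourself to try it out."""
    HTTPServer(("127.0.0.1", port), PdfHandler).serve_forever()
```

If you call run() and request /manuals/widget.pdf, the response headers should include a Link line pointing at the manufacturer's copy, and the PDF itself is still served to the customer as usual.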
But I think that the HTTP link header canonicalizing this file right here is the right thing to do, just to show a search engine that, “This is useful to our end user. That’s why it’s here. But we’re really not interested in trying to go out and kind of steal somebody else’s copy and put it on our own page.”