How to prevent your pages from being indexed

27 January 2010


No, this is not a tutorial on the robots.txt file. (In fact, for this purpose you may not need it at all.)

While most of us know how to edit the robots.txt file and use it to prevent certain pages on a site from being indexed, what we don’t realize is that those pages can still show up in search results. How? Because there is a difference between “being indexed” and “being listed.”

When Google or any other search engine “indexes” a page, it means that the content of the page is downloaded by the search engine and added to its index (something like the index of a book, with a pointer to a particular page number). While this would eventually lead to the page being listed in searches, your page might still be “listed” even though it is not “indexed.” If other sites link to a page that robots.txt blocks, the search engine knows the URL exists and can show it in results without ever having downloaded its content.

The video below from Matt Cutts explains this further:

Why do you want your pages to be de-listed or not indexed?

Well, one reason is to prevent duplicate content from being indexed, such as the printer-friendly versions you might maintain for large articles. There could also be pages available in PDF or Word document formats, whose content search engines can index nowadays. And if you use custom error pages for visitors who run into an error on the site, you don’t want those to be indexed either. There may be other reasons too, which vary from person to person and site to site.

How to achieve this?

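One way to do this is to LET the search engine crawl (download) the page, but tell it NOT to list the page. This can be done by adding a robots meta tag with the “noindex” value to the page, as below. Note that the page must not be blocked in robots.txt for this to work, because otherwise the search engine never fetches the page and never sees the tag.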

<meta name="robots" content="noindex,nofollow"/>

There is an easier way to do this if you would rather not add the tag above to each and every page. (You could add it to your blog’s header template, but be very careful with that, as it can hide your entire site from searches.) On an Apache server, you can set the X-Robots-Tag HTTP response header from the .htaccess file in a folder to de-list everything in that folder. If mod_headers is enabled for your site (which you can check with your host), add the following line to the .htaccess file in that folder:

Header set X-Robots-Tag "noindex, nofollow"
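
The same header can also be limited to particular file types rather than a whole folder, which suits the PDF and Word documents mentioned earlier. What follows is only a minimal sketch, assuming an Apache server where .htaccess overrides are allowed. The file extensions in the pattern are an illustration and should be adjusted to your own site:

```apache
# Only emit the header if mod_headers is loaded, so that a missing
# module does not turn the Header directive into a server error.
<IfModule mod_headers.c>
    # Match files ending in .pdf, .doc or .docx (illustrative pattern)
    <FilesMatch "\.(pdf|docx?)$">
        Header set X-Robots-Tag "noindex, nofollow"
    </FilesMatch>
</IfModule>
```

To confirm the header is actually being sent, request one of the matching files and inspect the response headers, for example in your browser's developer tools.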

More ways of doing this are covered in a good post from Antezeta.
