Controlling Spiders With Your Robots.txt File 
 


When optimizing a web site, many webmasters overlook the robots.txt file. It is a very important file for your site: it tells spiders and crawlers what they can and cannot index, which is helpful for keeping them out of folders that you do not want indexed, such as an admin or stats folder. The robots.txt file goes in the root directory of your website.
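For example, if your site lives at http://www.example.com, spiders will request the file from http://www.example.com/robots.txt (example.com is used here only as an illustration).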

Here is a list of the directives you can include in a robots.txt file and their meanings:
  • User-agent: In this field you specify the robot the rules apply to, or a “*” to match all robots. See the examples below.

  • Disallow: In this field you specify the files and folders to exclude from the crawl.

  • The # character marks a comment; spiders ignore everything after it on that line (see the example after this list).
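For example, a commented rule might look like this (the folder name is only an illustration):

Code :

# Keep all spiders out of the temp folder
User-agent: *
Disallow: /temp/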

Here are some examples of a robots.txt file:

Code :

User-agent: *
Disallow:

The above would let all spiders index all content; an empty Disallow field blocks nothing. Here is another:
Code :

User-agent: *
Disallow: /cgi-bin/

The above would block all spiders from indexing the cgi-bin directory.
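A related pattern, not shown in the examples above but part of the same syntax: disallowing the root path blocks a spider from the entire site.

Code :

User-agent: *
Disallow: /

The above would block all spiders from indexing anything on the site.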
Code :

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/

In the above example, googlebot can index everything, while all other spiders are blocked from admin.php and from the cgi-bin, admin, and stats directories. A robot obeys the most specific User-agent group that matches it, so googlebot follows only its own (empty) rule set and ignores the “*” block. Notice that you can block single files such as admin.php, not just folders.
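One more point worth knowing, from the original robots exclusion standard: Disallow matches by path prefix, not by exact name. The rule below (the path is only an illustration) would block /admin/, /admin.php, and anything else whose URL path begins with /admin:

Code :

User-agent: *
Disallow: /admin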




