Robots.txt

robots.txt is a file that you can place in the root of your web server to tell visiting search-engine robots certain things about your site: chiefly which URLs they should not crawl, and, via non-standard extensions, how often they should revisit the site to keep their search indexes up to date.

Usage

 * Create a file named robots.txt
 * This file must be accessible via HTTP at the URL "/robots.txt" in the root of the site; a robots.txt placed in a subdirectory is ignored

Format


User-agent:    # which robot(s) the following rules apply to; "*" matches all
Disallow:      # URL prefix that must not be visited; an empty value allows everything
Request-rate:  # (non-standard) maximum request rate, e.g. 1/5 = one page every 5 seconds
Visit-time:    # (non-standard) time window in which visits are allowed, in UTC
Crawl-delay:   # (non-standard) seconds to wait between successive requests
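One way to experiment with these directives is Python's standard-library urllib.robotparser, which parses robots.txt rules and answers queries about them. A minimal sketch (the path /private/ is a made-up example):

```python
import urllib.robotparser

# A minimal robots.txt (hypothetical paths, for illustration only)
rules = """\
User-agent: *
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/private/index.html"))  # False: the path is disallowed
print(rp.can_fetch("*", "/public/index.html"))   # True: everything else is allowed
```

In a real crawler you would normally call rp.set_url("http://example.com/robots.txt") followed by rp.read() instead of parsing a string.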

Examples
The examples below are taken from robotstxt.org

The following example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/" or "/tmp/", or the URL "/foo.html":


# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/   # This is an infinite virtual URL space
Disallow: /tmp/              # these will soon disappear
Disallow: /foo.html
Request-rate: 1/5            # maximum rate is one page every 5 seconds
Visit-time: 1000-1200        # only visit between 10:00 and 12:00 UTC (GMT)
Crawl-delay: 10              # 10 seconds to wait between successive requests to the same server
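Assuming Python's urllib.robotparser (which understands Crawl-delay and Request-rate, though Visit-time is not exposed), the rules of this example can be queried like so; the inline comments are stripped here because they are not needed for parsing:

```python
import urllib.robotparser

# The example rules from above, without the explanatory comments
rules = """\
User-agent: *
Disallow: /cyberworld/map/
Disallow: /tmp/
Disallow: /foo.html
Request-rate: 1/5
Crawl-delay: 10
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/tmp/scratch.html"))  # False: under a disallowed prefix
print(rp.can_fetch("*", "/index.html"))        # True: not matched by any rule
print(rp.crawl_delay("*"))                     # 10
rate = rp.request_rate("*")
print(rate.requests, rate.seconds)             # 1 5
```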

This example "/robots.txt" file specifies that no robots should visit any URL starting with "/cyberworld/map/", except the robot called "cybermapper":


# robots.txt for http://www.example.com/

User-agent: *
Disallow: /cyberworld/map/   # This is an infinite virtual URL space

# Cybermapper knows where to go.
User-agent: cybermapper
Disallow:

This example indicates that no robots should visit this site further:

# go away
User-agent: *
Disallow: /

Effectiveness

 * The protocol is purely advisory: well-behaved robots honour these rules, but there is no enforcement, so a misbehaving robot can simply ignore the file.